sherlock-review
About
Sherlock Review is an evidence-based code review skill that uses deductive reasoning to systematically verify implementation claims, investigate bugs, and perform root cause analysis. It guides developers through a process of observation, deduction, and elimination to expose the gap between what was claimed and what actually happened. It is ideal for validating fixes, conducting security audits, and verifying performance claims.
Quick Install
Claude Code
Recommended:

```
/plugin add https://github.com/proffesor-for-testing/agentic-qe
```

Or clone the repository directly:

```
git clone https://github.com/proffesor-for-testing/agentic-qe.git ~/.claude/skills/sherlock-review
```

Copy and paste the command into Claude Code to install the skill.
Documentation
Sherlock Review
<default_to_action> When investigating code claims:
- OBSERVE: Gather all evidence (code, tests, history, behavior)
- DEDUCE: What does evidence actually show vs. what was claimed?
- ELIMINATE: Rule out what cannot be true
- CONCLUDE: Does evidence support the claim?
- DOCUMENT: Findings with proof, not assumptions
The 3-Step Investigation:
```bash
# 1. OBSERVE: Gather evidence
git diff <commit>
npm test -- --coverage

# 2. DEDUCE: Compare claim vs reality
# Does code match description?
# Do tests prove the fix/feature?

# 3. CONCLUDE: Verdict with evidence
# TRUE / PARTIALLY TRUE / FALSE / NONSENSICAL (see Verdict Levels below)
```
Holmesian Principles:
- "Data! Data! Data!" - Collect before concluding
- "Eliminate the impossible" - What cannot be true?
- "You see, but do not observe" - Run code, don't just read
- Trust only reproducible evidence </default_to_action>
Quick Reference Card
Evidence Collection Checklist
| Category | What to Check | How |
|---|---|---|
| Claim | PR description, commit messages | Read thoroughly |
| Code | Actual file changes | git diff |
| Tests | Coverage, assertions | Run independently |
| Behavior | Runtime output | Execute locally |
| Timeline | When things happened | git log, git blame |
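Most of this checklist can be gathered in a single pass before any deduction begins. A minimal sketch, assuming a Node.js environment with an `npm test` script; `BASE` and `HEAD` are placeholders for the commit range under investigation:

```javascript
// Evidence-gathering sketch. BASE and HEAD are placeholder refs;
// substitute the commits under investigation.
const { execSync } = require("node:child_process");

const BASE = "main";
const HEAD = "HEAD";
const sh = (cmd) => execSync(cmd, { encoding: "utf8" });

// Code: what actually changed
console.log(sh(`git diff --stat ${BASE}..${HEAD}`));

// Timeline: when things happened, and in what order
console.log(sh(`git log --oneline ${BASE}..${HEAD}`));

// Tests: run them yourself rather than trusting the PR description
try {
  console.log(sh("npm test -- --coverage"));
} catch (err) {
  // A failing suite is evidence too - record it, never swallow it
  console.error("Test run failed:", err.stdout ?? err.message);
}
```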
Verdict Levels
| Verdict | Meaning |
|---|---|
| ✓ TRUE | Evidence fully supports claim |
| ⚠ PARTIALLY TRUE | Claim accurate but incomplete |
| ✗ FALSE | Evidence contradicts claim |
| ? NONSENSICAL | Claim doesn't apply to context |
Investigation Template
```markdown
## Sherlock Investigation: [Claim]

### The Claim
"[What PR/commit claims to do]"

### Evidence Examined
- Code changes: [files, lines]
- Tests added: [count, coverage]
- Behavior observed: [what actually happens]

### Deductive Analysis
**Claim**: [specific assertion]
**Evidence**: [what you found]
**Deduction**: [logical conclusion]
**Verdict**: ✓/⚠/✗

### Findings
- What works: [with evidence]
- What doesn't: [with evidence]
- What's missing: [gaps in implementation/testing]

### Recommendations
1. [Action based on findings]
```
Investigation Scenarios
Scenario 1: "This Fixed the Bug"
Steps (see the sketch after this scenario):
- Reproduce bug on commit before fix
- Verify bug is gone on commit with fix
- Check if fix addresses root cause or symptom
- Test edge cases not in original report
Red Flags:
- Fix that just removes error logging
- Works only for specific test case
- Workarounds instead of root cause fix
- No regression test added
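The reproduce-then-verify steps above can be scripted so the result is binary rather than anecdotal. A sketch, assuming a repro command that exits non-zero while the bug is present; the commit refs and script path are hypothetical:

```javascript
// Verify a claimed fix: reproduce on the commit before the fix, then
// confirm the bug is gone on the fix commit. All refs are hypothetical.
const { execSync } = require("node:child_process");

const PRE_FIX = "abc123~1";             // hypothetical: commit before the fix
const FIX = "abc123";                   // hypothetical: the fix commit
const REPRO = "node scripts/repro.js";  // hypothetical repro script

function reproPasses(ref) {
  execSync(`git checkout --quiet ${ref}`);
  try {
    execSync(REPRO, { stdio: "pipe" });
    return true;   // exit 0: bug not triggered
  } catch {
    return false;  // non-zero exit: bug reproduced
  }
}

const bugBefore = !reproPasses(PRE_FIX);
const bugAfter = !reproPasses(FIX);

if (!bugBefore) console.log("? Bug does not reproduce before the fix - claim untestable");
else if (bugAfter) console.log("✗ Bug still reproduces after the fix");
else console.log("✓ Bug reproduces before the fix and is gone after it");
```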
Scenario 2: "Improved Performance by 50%"
Steps (see the benchmark sketch below):
- Run benchmark on baseline commit
- Run same benchmark on optimized commit
- Compare in identical conditions
- Verify measurement methodology
Red Flags:
- Tested only on toy data
- Different comparison conditions
- Trade-offs not mentioned
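A sketch of the baseline-vs-optimized comparison; the refs, benchmark command, and sample count are placeholders. Keeping machine, data, and harness identical across both runs is the whole point:

```javascript
// Compare the same benchmark across two commits under identical
// conditions. Refs and the benchmark command are placeholders.
const { execSync } = require("node:child_process");

const BASELINE = "main";            // hypothetical baseline ref
const OPTIMIZED = "perf-branch";    // hypothetical optimized ref
const BENCH = "node bench/run.js";  // hypothetical benchmark script

function timeAt(ref) {
  execSync(`git checkout --quiet ${ref}`);
  const start = process.hrtime.bigint();
  execSync(BENCH, { stdio: "pipe" });
  return Number(process.hrtime.bigint() - start) / 1e6; // wall time in ms
}

// One sample proves nothing; take several and compare medians.
const samples = (ref) => Array.from({ length: 5 }, () => timeAt(ref));
const median = (xs) => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];

const before = median(samples(BASELINE));
const after = median(samples(OPTIMIZED));
console.log(`baseline ${before.toFixed(1)}ms, optimized ${after.toFixed(1)}ms`);
console.log(`measured change: ${((100 * (before - after)) / before).toFixed(1)}%`);
```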
Scenario 3: "Handles All Edge Cases"
Steps (see the assertion sketch below):
- List all edge cases in code path
- Check each has test coverage
- Test boundary conditions
- Verify error handling paths
Red Flags:
- `catch {}` swallowing errors
- Generic error messages
- No logging of critical errors
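One way to make an "all edge cases" claim falsifiable is to enumerate the boundaries as individual assertions. A sketch around a hypothetical `parseAmount` function; substitute the real API under review:

```javascript
// Boundary checks for a claimed "handles all edge cases" function.
// `parseAmount` is hypothetical; substitute the real API under review.
const assert = require("node:assert");

function parseAmount(input) {
  if (typeof input !== "string" || input.trim() === "") {
    throw new TypeError(`expected a non-empty string, got ${typeof input}`);
  }
  const n = Number(input);
  if (!Number.isFinite(n)) throw new RangeError(`not a finite number: "${input}"`);
  return n;
}

// Each claimed edge case gets its own assertion: empty, whitespace,
// non-numeric, overflow. One happy-path test proves nothing.
assert.strictEqual(parseAmount("42"), 42);
assert.throws(() => parseAmount(""), TypeError);
assert.throws(() => parseAmount("   "), TypeError);
assert.throws(() => parseAmount("abc"), RangeError);
assert.throws(() => parseAmount("1e999"), RangeError); // overflows to Infinity
console.log("all edge-case assertions passed");
```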
Example Investigation
```markdown
## Case: PR #123 "Fix race condition in async handler"

### Claims Examined:
1. "Eliminates race condition"
2. "Adds mutex locking"
3. "100% thread safe"

### Evidence:
- File: src/handlers/async-handler.js
- Changes: Added `async/await`, removed callbacks
- Tests: 2 new tests for async flow
- Coverage: 85% (was 75%)

### Analysis:
**Claim 1: "Eliminates race condition"**
Evidence: Added `await` to sequential operations. No actual mutex.
Deduction: Race avoided by removing concurrency, not synchronization.
Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)

**Claim 2: "Adds mutex locking"**
Evidence: No mutex library, no lock variables, no sync primitives.
Verdict: ✗ FALSE

**Claim 3: "100% thread safe"**
Evidence: JavaScript is single-threaded. No worker threads used.
Verdict: ? NONSENSICAL (meaningless in this context)

### Conclusion:
Fix works but not for reasons claimed. Race condition avoided by
making operations sequential, not by adding synchronization.

### Recommendations:
1. Update PR description to accurately reflect solution
2. Add test for concurrent request handling
3. Remove incorrect technical claims
```
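For illustration, the pattern this investigation describes looks roughly like the following; a hypothetical reconstruction, not the actual PR #123 diff:

```javascript
// Hypothetical reconstruction of the pattern described above, not the
// real diff. The "fix" removes concurrency rather than adding locking.

// Before: two dependent async steps start concurrently; completion
// order is nondeterministic, so the second may see stale state.
function handlerBefore(step1, step2) {
  step1(); // returned promise never awaited
  step2(); // races with step1
}

// After: each step is awaited, so execution is strictly sequential.
// No mutex exists; the race disappears because the overlap does.
async function handlerAfter(step1, step2) {
  await step1();
  await step2();
}
```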
Agent Integration
```javascript
// Evidence-based code review
await Task("Sherlock Review", {
  prNumber: 123,
  claims: [
    "Fixes memory leak",
    "Improves performance 30%"
  ],
  verifyReproduction: true,
  testEdgeCases: true
}, "qe-code-reviewer");

// Bug fix verification
await Task("Verify Fix", {
  bugCommit: 'abc123',
  fixCommit: 'def456',
  reproductionSteps: steps,
  testBoundaryConditions: true
}, "qe-code-reviewer");
```
Agent Coordination Hints
Memory Namespace
```
aqe/sherlock/
├── investigations/*  - Investigation reports
├── evidence/*        - Collected evidence
├── verdicts/*        - Claim verdicts
└── patterns/*        - Common deception patterns
```
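A hedged sketch of writing a verdict into this namespace; the in-memory `Map` below stands in for whatever key-value store the fleet actually exposes, and none of these calls are documented agentic-qe APIs:

```javascript
// Hypothetical: persist a claim verdict under aqe/sherlock/verdicts/*.
// The Map is a stand-in for the fleet's real memory backend.
const memory = new Map();

function store(key, value) {
  memory.set(key, JSON.stringify(value));
}

store("aqe/sherlock/verdicts/pr-123", {
  claim: "Eliminates race condition",
  verdict: "PARTIALLY TRUE",
  evidence: ["src/handlers/async-handler.js: await added, no mutex"],
});
```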
Fleet Coordination
```javascript
const investigationFleet = await FleetManager.coordinate({
  strategy: 'evidence-investigation',
  agents: [
    'qe-code-reviewer',          // Code analysis
    'qe-security-auditor',       // Security claim verification
    'qe-performance-validator'   // Performance claim verification
  ],
  topology: 'parallel'
});
```
Related Skills
- brutal-honesty-review - Direct technical criticism
- context-driven-testing - Adapt to context
- bug-reporting-excellence - Document findings
Remember
"It is a capital mistake to theorize before one has data." Trust only reproducible evidence. Don't trust commit messages, documentation, or "works on my machine."
The Sherlock Standard: Every claim must be verified empirically. What does the evidence actually show?
GitHub Repository
Related Skills
micro-skill-creator
Meta: The micro-skill-creator rapidly generates atomic, single-purpose skills optimized with evidence-based prompting and specialist agents. It produces highly focused components using patterns like self-consistency and plan-and-solve, validated through systematic testing. This makes it ideal for developers building reliable, composable workflow elements in Claude Code.
github-code-review
Other: This skill automates comprehensive GitHub code reviews using AI-powered swarm coordination, enabling multi-agent analysis of pull requests. It performs security and performance analysis while orchestrating specialized review agents to generate intelligent comments. Use it when you need automated PR management with quality gate enforcement beyond traditional static analysis.
code-review-quality
Other: This skill conducts automated code reviews focused on quality, testability, and maintainability, using specialized agents for security, performance, and coverage analysis. It provides prioritized, context-driven feedback for pull requests or when establishing review practices. Developers should use it to get actionable, structured reviews that emphasize bugs and maintainability over subjective style preferences.
