sherlock-review
About
Sherlock Review is an evidence-based code review skill that uses deductive reasoning to systematically verify implementation claims, investigate bugs, and perform root cause analysis. It guides developers through a process of observation, deduction, and elimination to expose the gap between what was claimed and what actually happened. It is ideal for validating fixes, conducting security audits, and verifying performance claims.
Quick Install
Claude Code
Recommended:

```
/plugin add https://github.com/proffesor-for-testing/agentic-qe
```

Or clone the repository directly:

```
git clone https://github.com/proffesor-for-testing/agentic-qe.git ~/.claude/skills/sherlock-review
```

Copy and paste the command into Claude Code to install the skill.
Documentation
Sherlock Review
<default_to_action> When investigating code claims:
- OBSERVE: Gather all evidence (code, tests, history, behavior)
- DEDUCE: What does evidence actually show vs. what was claimed?
- ELIMINATE: Rule out what cannot be true
- CONCLUDE: Does evidence support the claim?
- DOCUMENT: Findings with proof, not assumptions
The 3-Step Investigation:
```bash
# 1. OBSERVE: Gather evidence
git diff <commit>
npm test -- --coverage

# 2. DEDUCE: Compare claim vs reality
# Does code match description?
# Do tests prove the fix/feature?

# 3. CONCLUDE: Verdict with evidence
# TRUE / PARTIALLY TRUE / FALSE / NONSENSICAL (see Verdict Levels below)
```
Holmesian Principles:
- "Data! Data! Data!" - Collect before concluding
- "Eliminate the impossible" - What cannot be true?
- "You see, but do not observe" - Run code, don't just read
- Trust only reproducible evidence </default_to_action>
Quick Reference Card
Evidence Collection Checklist
| Category | What to Check | How |
|---|---|---|
| Claim | PR description, commit messages | Read thoroughly |
| Code | Actual file changes | git diff |
| Tests | Coverage, assertions | Run independently |
| Behavior | Runtime output | Execute locally |
| Timeline | When things happened | git log, git blame |
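Most of this checklist can be gathered in a single pass before any deduction begins. A minimal sketch, assuming a Node.js environment with an `npm test` script; `BASE` and `HEAD` are placeholders for the commit range under investigation:

```javascript
// Evidence-gathering sketch. BASE and HEAD are placeholder refs;
// substitute the commits under investigation.
const { execSync } = require("node:child_process");

const BASE = "main";
const HEAD = "HEAD";
const sh = (cmd) => execSync(cmd, { encoding: "utf8" });

// Code: what actually changed
console.log(sh(`git diff --stat ${BASE}..${HEAD}`));

// Timeline: when things happened, and in what order
console.log(sh(`git log --oneline ${BASE}..${HEAD}`));

// Tests: run them yourself rather than trusting the PR description
try {
  console.log(sh("npm test -- --coverage"));
} catch (err) {
  // A failing suite is evidence too - record it, never swallow it
  console.error("Test run failed:", err.stdout ?? err.message);
}
```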
Verdict Levels
| Verdict | Meaning |
|---|---|
| ✓ TRUE | Evidence fully supports claim |
| ⚠ PARTIALLY TRUE | Claim accurate but incomplete |
| ✗ FALSE | Evidence contradicts claim |
| ? NONSENSICAL | Claim doesn't apply to context |
Investigation Template
```markdown
## Sherlock Investigation: [Claim]

### The Claim
"[What PR/commit claims to do]"

### Evidence Examined
- Code changes: [files, lines]
- Tests added: [count, coverage]
- Behavior observed: [what actually happens]

### Deductive Analysis
**Claim**: [specific assertion]
**Evidence**: [what you found]
**Deduction**: [logical conclusion]
**Verdict**: ✓/⚠/✗

### Findings
- What works: [with evidence]
- What doesn't: [with evidence]
- What's missing: [gaps in implementation/testing]

### Recommendations
1. [Action based on findings]
```
Investigation Scenarios
Scenario 1: "This Fixed the Bug"
Steps (see the sketch after this scenario):
- Reproduce bug on commit before fix
- Verify bug is gone on commit with fix
- Check if fix addresses root cause or symptom
- Test edge cases not in original report
Red Flags:
- Fix that just removes error logging
- Works only for specific test case
- Workarounds instead of root cause fix
- No regression test added
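The reproduce-then-verify steps above can be scripted so the result is binary rather than anecdotal. A sketch, assuming a repro command that exits non-zero while the bug is present; the commit refs and script path are hypothetical:

```javascript
// Verify a claimed fix: reproduce on the commit before the fix, then
// confirm the bug is gone on the fix commit. All refs are hypothetical.
const { execSync } = require("node:child_process");

const PRE_FIX = "abc123~1";             // hypothetical: commit before the fix
const FIX = "abc123";                   // hypothetical: the fix commit
const REPRO = "node scripts/repro.js";  // hypothetical repro script

function reproPasses(ref) {
  execSync(`git checkout --quiet ${ref}`);
  try {
    execSync(REPRO, { stdio: "pipe" });
    return true;   // exit 0: bug not triggered
  } catch {
    return false;  // non-zero exit: bug reproduced
  }
}

const bugBefore = !reproPasses(PRE_FIX);
const bugAfter = !reproPasses(FIX);

if (!bugBefore) console.log("? Bug does not reproduce before the fix - claim untestable");
else if (bugAfter) console.log("✗ Bug still reproduces after the fix");
else console.log("✓ Bug reproduces before the fix and is gone after it");
```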
Scenario 2: "Improved Performance by 50%"
Steps (see the benchmark sketch below):
- Run benchmark on baseline commit
- Run same benchmark on optimized commit
- Compare in identical conditions
- Verify measurement methodology
Red Flags:
- Tested only on toy data
- Different comparison conditions
- Trade-offs not mentioned
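A sketch of the baseline-vs-optimized comparison; the refs, benchmark command, and sample count are placeholders. Keeping machine, data, and harness identical across both runs is the whole point:

```javascript
// Compare the same benchmark across two commits under identical
// conditions. Refs and the benchmark command are placeholders.
const { execSync } = require("node:child_process");

const BASELINE = "main";            // hypothetical baseline ref
const OPTIMIZED = "perf-branch";    // hypothetical optimized ref
const BENCH = "node bench/run.js";  // hypothetical benchmark script

function timeAt(ref) {
  execSync(`git checkout --quiet ${ref}`);
  const start = process.hrtime.bigint();
  execSync(BENCH, { stdio: "pipe" });
  return Number(process.hrtime.bigint() - start) / 1e6; // wall time in ms
}

// One sample proves nothing; take several and compare medians.
const samples = (ref) => Array.from({ length: 5 }, () => timeAt(ref));
const median = (xs) => [...xs].sort((a, b) => a - b)[Math.floor(xs.length / 2)];

const before = median(samples(BASELINE));
const after = median(samples(OPTIMIZED));
console.log(`baseline ${before.toFixed(1)}ms, optimized ${after.toFixed(1)}ms`);
console.log(`measured change: ${((100 * (before - after)) / before).toFixed(1)}%`);
```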
Scenario 3: "Handles All Edge Cases"
Steps (see the assertion sketch below):
- List all edge cases in code path
- Check each has test coverage
- Test boundary conditions
- Verify error handling paths
Red Flags:
- `catch {}` swallowing errors
- Generic error messages
- No logging of critical errors
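One way to make an "all edge cases" claim falsifiable is to enumerate the boundaries as individual assertions. A sketch around a hypothetical `parseAmount` function; substitute the real API under review:

```javascript
// Boundary checks for a claimed "handles all edge cases" function.
// `parseAmount` is hypothetical; substitute the real API under review.
const assert = require("node:assert");

function parseAmount(input) {
  if (typeof input !== "string" || input.trim() === "") {
    throw new TypeError(`expected a non-empty string, got ${typeof input}`);
  }
  const n = Number(input);
  if (!Number.isFinite(n)) throw new RangeError(`not a finite number: "${input}"`);
  return n;
}

// Each claimed edge case gets its own assertion: empty, whitespace,
// non-numeric, overflow. One happy-path test proves nothing.
assert.strictEqual(parseAmount("42"), 42);
assert.throws(() => parseAmount(""), TypeError);
assert.throws(() => parseAmount("   "), TypeError);
assert.throws(() => parseAmount("abc"), RangeError);
assert.throws(() => parseAmount("1e999"), RangeError); // overflows to Infinity
console.log("all edge-case assertions passed");
```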
Example Investigation
```markdown
## Case: PR #123 "Fix race condition in async handler"

### Claims Examined:
1. "Eliminates race condition"
2. "Adds mutex locking"
3. "100% thread safe"

### Evidence:
- File: src/handlers/async-handler.js
- Changes: Added `async/await`, removed callbacks
- Tests: 2 new tests for async flow
- Coverage: 85% (was 75%)

### Analysis:
**Claim 1: "Eliminates race condition"**
Evidence: Added `await` to sequential operations. No actual mutex.
Deduction: Race avoided by removing concurrency, not synchronization.
Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)

**Claim 2: "Adds mutex locking"**
Evidence: No mutex library, no lock variables, no sync primitives.
Verdict: ✗ FALSE

**Claim 3: "100% thread safe"**
Evidence: JavaScript is single-threaded. No worker threads used.
Verdict: ? NONSENSICAL (meaningless in this context)

### Conclusion:
Fix works but not for reasons claimed. Race condition avoided by
making operations sequential, not by adding synchronization.

### Recommendations:
1. Update PR description to accurately reflect solution
2. Add test for concurrent request handling
3. Remove incorrect technical claims
```
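For illustration, the pattern this investigation describes looks roughly like the following; a hypothetical reconstruction, not the actual PR #123 diff:

```javascript
// Hypothetical reconstruction of the pattern described above, not the
// real diff. The "fix" removes concurrency rather than adding locking.

// Before: two dependent async steps start concurrently; completion
// order is nondeterministic, so the second may see stale state.
function handlerBefore(step1, step2) {
  step1(); // returned promise never awaited
  step2(); // races with step1
}

// After: each step is awaited, so execution is strictly sequential.
// No mutex exists; the race disappears because the overlap does.
async function handlerAfter(step1, step2) {
  await step1();
  await step2();
}
```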
Agent Integration
```javascript
// Evidence-based code review
await Task("Sherlock Review", {
  prNumber: 123,
  claims: [
    "Fixes memory leak",
    "Improves performance 30%"
  ],
  verifyReproduction: true,
  testEdgeCases: true
}, "qe-code-reviewer");

// Bug fix verification
await Task("Verify Fix", {
  bugCommit: 'abc123',
  fixCommit: 'def456',
  reproductionSteps: steps,
  testBoundaryConditions: true
}, "qe-code-reviewer");
```
Agent Coordination Hints
Memory Namespace
```
aqe/sherlock/
├── investigations/*  - Investigation reports
├── evidence/*        - Collected evidence
├── verdicts/*        - Claim verdicts
└── patterns/*        - Common deception patterns
```
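A hedged sketch of writing a verdict into this namespace; the in-memory `Map` below stands in for whatever key-value store the fleet actually exposes, and none of these calls are documented agentic-qe APIs:

```javascript
// Hypothetical: persist a claim verdict under aqe/sherlock/verdicts/*.
// The Map is a stand-in for the fleet's real memory backend.
const memory = new Map();

function store(key, value) {
  memory.set(key, JSON.stringify(value));
}

store("aqe/sherlock/verdicts/pr-123", {
  claim: "Eliminates race condition",
  verdict: "PARTIALLY TRUE",
  evidence: ["src/handlers/async-handler.js: await added, no mutex"],
});
```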
Fleet Coordination
```javascript
const investigationFleet = await FleetManager.coordinate({
  strategy: 'evidence-investigation',
  agents: [
    'qe-code-reviewer',          // Code analysis
    'qe-security-auditor',       // Security claim verification
    'qe-performance-validator'   // Performance claim verification
  ],
  topology: 'parallel'
});
```
Related Skills
- brutal-honesty-review - Direct technical criticism
- context-driven-testing - Adapt to context
- bug-reporting-excellence - Document findings
Remember
"It is a capital mistake to theorize before one has data." Trust only reproducible evidence. Don't trust commit messages, documentation, or "works on my machine."
The Sherlock Standard: Every claim must be verified empirically. What does the evidence actually show?
GitHub Repository
Related Skills
micro-skill-creator
Meta: The micro-skill-creator rapidly generates atomic, single-purpose skills optimized with evidence-based prompting and specialist agents. It produces highly focused components using patterns like self-consistency and plan-and-solve, validated through systematic testing. This makes it ideal for developers building reliable, composable workflow elements in Claude Code.
github-code-review
Other: This skill automates comprehensive GitHub code reviews using AI-powered swarm coordination, enabling multi-agent analysis of pull requests. It performs security and performance analysis while orchestrating specialized review agents to generate intelligent comments. Use it when you need automated PR management with quality gate enforcement beyond traditional static analysis.
code-review-quality
Other: This skill conducts automated code reviews focused on quality, testability, and maintainability, using specialized agents for security, performance, and coverage analysis. It provides prioritized, context-driven feedback for pull requests or when establishing review practices. Developers should use it to get actionable, structured reviews that emphasize bugs and maintainability over subjective style preferences.
