
sherlock-review

proffesor-for-testing
Updated: Today
Category: Other · Tags: investigation, evidence-based, code-review, root-cause, deduction

About

Sherlock-review is an evidence-based code review skill that uses deductive reasoning to systematically verify implementation claims, investigate bugs, and perform root cause analysis. It guides developers through observation, deduction, and elimination to determine what actually happened versus what was claimed. It is well suited to validating bug fixes, conducting security audits, and verifying performance claims.

Quick Install

Claude Code

Plugin command (recommended)
/plugin add https://github.com/proffesor-for-testing/agentic-qe
Alternative: Git clone
git clone https://github.com/proffesor-for-testing/agentic-qe.git ~/.claude/skills/sherlock-review

Copy and paste this command into Claude Code to install the skill.

Skill Documentation

Sherlock Review

<default_to_action> When investigating code claims:

  1. OBSERVE: Gather all evidence (code, tests, history, behavior)
  2. DEDUCE: What does evidence actually show vs. what was claimed?
  3. ELIMINATE: Rule out what cannot be true
  4. CONCLUDE: Does evidence support the claim?
  5. DOCUMENT: Record findings with proof, not assumptions

The 3-Step Investigation:

# 1. OBSERVE: Gather evidence
git show <commit>          # commit message (the claim) plus the diff it introduced
npm test -- --coverage

# 2. DEDUCE: Compare claim vs reality
# Does code match description?
# Do tests prove the fix/feature?

# 3. CONCLUDE: Verdict with evidence
# TRUE / PARTIALLY TRUE / FALSE / NONSENSICAL

Holmesian Principles:

  • "Data! Data! Data!" - Collect before concluding
  • "Eliminate the impossible" - What cannot be true?
  • "You see, but you do not observe" - Run code, don't just read
  • Trust only reproducible evidence </default_to_action>
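The "Trust only reproducible evidence" principle can be made mechanical: repeat an observation several times and accept it only if every run agrees. The sketch below assumes the check is a synchronous function returning a boolean; `observeReproducibly` and its `runs` parameter are hypothetical names, not part of agentic-qe.

```javascript
// Hypothetical helper: accept an observation only if repeated runs agree.
// `check` is any zero-argument function returning true/false (e.g. "bug reproduces").
function observeReproducibly(check, runs = 5) {
  const results = [];
  for (let i = 0; i < runs; i++) {
    results.push(Boolean(check()));
  }
  const allSame = results.every((r) => r === results[0]);
  return {
    reproducible: allSame,
    // Only report a value when it is stable; otherwise the evidence is inconclusive.
    value: allSame ? results[0] : undefined,
    results,
  };
}

// Usage: a flaky check is flagged instead of being trusted.
let calls = 0;
const flaky = () => ++calls % 2 === 0; // alternates false, true, false, ...
const stable = () => 1 + 1 === 2;

console.log(observeReproducibly(stable).reproducible); // true
console.log(observeReproducibly(flaky).reproducible);  // false
```

A flaky result is reported as inconclusive (`value` is `undefined`) rather than counted as evidence either way.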

Quick Reference Card

Evidence Collection Checklist

| Category | What to Check | How |
|---|---|---|
| Claim | PR description, commit messages | Read thoroughly |
| Code | Actual file changes | `git diff` |
| Tests | Coverage, assertions | Run independently |
| Behavior | Runtime output | Execute locally |
| Timeline | When things happened | `git log`, `git blame` |
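One way to apply the checklist is to track which categories already have evidence before drawing any conclusion. This is a minimal sketch; the `EVIDENCE_CATEGORIES` object and `missingEvidence` function are hypothetical and simply mirror the table's rows.

```javascript
// Hypothetical representation of the Evidence Collection Checklist.
const EVIDENCE_CATEGORIES = {
  claim: "PR description, commit messages",
  code: "Actual file changes (git diff)",
  tests: "Coverage and assertions, run independently",
  behavior: "Runtime output, executed locally",
  timeline: "When things happened (git log, git blame)",
};

// Returns the checklist categories for which no evidence has been recorded yet.
function missingEvidence(collected) {
  return Object.keys(EVIDENCE_CATEGORIES).filter(
    (category) => !collected[category] || collected[category].length === 0
  );
}

const collected = {
  claim: ["PR #123 description"],
  code: ["src/handlers/async-handler.js diff"],
  tests: [], // tests exist but have not been run independently yet
};
console.log(missingEvidence(collected)); // [ 'tests', 'behavior', 'timeline' ]
```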

Verdict Levels

| Verdict | Meaning |
|---|---|
| ✓ TRUE | Evidence fully supports the claim |
| ⚠ PARTIALLY TRUE | Claim is accurate but incomplete |
| ✗ FALSE | Evidence contradicts the claim |
| ? NONSENSICAL | Claim doesn't apply in this context |
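The verdict levels can be read as a decision rule over the collected evidence. The sketch below is one possible interpretation, not the skill's defined algorithm: it assumes each piece of evidence is tagged as supporting or contradicting the claim, treats mixed evidence or known gaps as PARTIALLY TRUE, and refuses to rule at all when no evidence has been gathered. `decideVerdict` and its input shape are hypothetical.

```javascript
// Hypothetical decision rule mapping collected evidence to a verdict level.
// Each evidence item is { supports: boolean, note: string }.
function decideVerdict({ applicable = true, evidence = [], gaps = [] }) {
  if (!applicable) return "NONSENSICAL"; // the claim has no meaning in this context
  if (evidence.length === 0) {
    throw new Error("No evidence collected: gather data before concluding");
  }
  const supporting = evidence.filter((e) => e.supports).length;
  const contradicting = evidence.length - supporting;
  if (supporting === 0) return "FALSE";                        // everything found contradicts the claim
  if (contradicting === 0 && gaps.length === 0) return "TRUE"; // fully supported, nothing missing
  return "PARTIALLY TRUE";                                     // mixed evidence or known gaps
}

// The three claims from the PR #123 example, expressed as inputs:
console.log(decideVerdict({
  evidence: [{ supports: true, note: "race no longer reproduces" }],
  gaps: ["no mutex; no test for concurrent requests"],
})); // PARTIALLY TRUE
console.log(decideVerdict({
  evidence: [{ supports: false, note: "no lock primitives in the diff" }],
})); // FALSE
console.log(decideVerdict({ applicable: false })); // NONSENSICAL
```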

Investigation Template

## Sherlock Investigation: [Claim]

### The Claim
"[What PR/commit claims to do]"

### Evidence Examined
- Code changes: [files, lines]
- Tests added: [count, coverage]
- Behavior observed: [what actually happens]

### Deductive Analysis

**Claim**: [specific assertion]
**Evidence**: [what you found]
**Deduction**: [logical conclusion]
**Verdict**: ✓/⚠/✗/?

### Findings
- What works: [with evidence]
- What doesn't: [with evidence]
- What's missing: [gaps in implementation/testing]

### Recommendations
1. [Action based on findings]
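If investigations are produced programmatically, the template can be filled from a structured record. `renderInvestigation` and the shape of its input object are assumptions made for this sketch; the sample data restates Claim 2 from the PR #123 example later in this document.

```javascript
// Hypothetical renderer that fills in the Investigation Template from a plain object.
function renderInvestigation(inv) {
  const bullets = (items) => items.map((item) => `- ${item}`);
  return [
    `## Sherlock Investigation: ${inv.title}`,
    "",
    "### The Claim",
    `"${inv.claim}"`,
    "",
    "### Evidence Examined",
    ...bullets(inv.evidence),
    "",
    "### Deductive Analysis",
    "",
    `**Claim**: ${inv.analysis.claim}`,
    `**Evidence**: ${inv.analysis.evidence}`,
    `**Deduction**: ${inv.analysis.deduction}`,
    `**Verdict**: ${inv.analysis.verdict}`,
    "",
    "### Findings",
    ...bullets(inv.findings),
    "",
    "### Recommendations",
    ...inv.recommendations.map((rec, i) => `${i + 1}. ${rec}`), // numbered list
  ].join("\n");
}

const report = renderInvestigation({
  title: "Adds mutex locking",
  claim: "Adds mutex locking",
  evidence: ["Code changes: src/handlers/async-handler.js"],
  analysis: {
    claim: "A mutex guards the handler",
    evidence: "No mutex library, lock variables, or sync primitives in the diff",
    deduction: "The claimed mechanism is absent",
    verdict: "✗",
  },
  findings: ["What's missing: any synchronization primitive"],
  recommendations: ["Remove the incorrect technical claim from the PR description"],
});
console.log(report);
```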

Investigation Scenarios

Scenario 1: "This Fixed the Bug"

Steps:

  1. Reproduce bug on commit before fix
  2. Verify bug is gone on commit with fix
  3. Check if fix addresses root cause or symptom
  4. Test edge cases not in original report

Red Flags:

  • Fix only suppresses or removes error logging
  • Works only for the specific reported test case
  • Workarounds instead of root cause fix
  • No regression test added
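The Scenario 1 steps can be expressed as one check that compares the bug and fix commits and probes extra inputs. This sketch assumes a caller-supplied `reproduces(commit, input)` function (hypothetical) that runs the reproduction against a given commit; here it is simulated in memory rather than checking out real commits. The commit IDs reuse the placeholders from the Agent Integration example.

```javascript
// Hypothetical fix-verification helper following the Scenario 1 steps.
// `reproduces(commit, input)` is supplied by the caller and returns true when the bug appears.
function verifyFix({ reproduces, bugCommit, fixCommit, edgeCases = [], regressionTestAdded = false }) {
  const bugReproducedBefore = reproduces(bugCommit); // step 1
  const bugGoneAfter = !reproduces(fixCommit);       // step 2
  // Step 4: inputs beyond the original report. Failures here suggest the fix
  // treats a symptom rather than the root cause (step 3).
  const failingEdgeCases = edgeCases.filter((input) => reproduces(fixCommit, input));

  const redFlags = [];
  if (!bugReproducedBefore) redFlags.push("Bug not reproduced before the fix; the fix cannot be confirmed");
  if (failingEdgeCases.length > 0) redFlags.push("Works only for the reported case");
  if (!regressionTestAdded) redFlags.push("No regression test added");
  return { bugReproducedBefore, bugGoneAfter, failingEdgeCases, redFlags };
}

// Simulated bug: blank input crashes. The "fix" special-cases only the empty string.
const reproduces = (commit, input = "") => {
  const isBlank = input.trim() === "";
  if (commit === "abc123") return isBlank; // before the fix
  return isBlank && input !== "";          // after the fix
};

console.log(verifyFix({
  reproduces,
  bugCommit: "abc123",
  fixCommit: "def456",
  edgeCases: ["   ", "abc"],
}));
```

In this simulation the bug is gone for the reported input, yet the whitespace-only edge case still fails, which flags a symptom-level fix.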

Scenario 2: "Improved Performance by 50%"

Steps:

  1. Run benchmark on baseline commit
  2. Run same benchmark on optimized commit
  3. Compare in identical conditions
  4. Verify measurement methodology

Red Flags:

  • Tested only on toy data
  • Different comparison conditions
  • Trade-offs not mentioned
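A claim like "improved performance by 50%" is ambiguous: it could mean 50% less runtime or 1.5x throughput, so pin down the definition before measuring. The sketch below assumes the claim means a 50% reduction in median runtime, with both commits benchmarked repeatedly under identical conditions; `checkSpeedupClaim` and the timings are illustrative, not real measurements.

```javascript
// Median of repeated runs; less sensitive to outliers than a single run or the mean.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical check: compare baseline vs optimized timings (ms) against a claimed runtime reduction.
function checkSpeedupClaim(baselineMs, optimizedMs, claimedPercent) {
  const base = median(baselineMs);
  const opt = median(optimizedMs);
  const measuredPercent = ((base - opt) / base) * 100; // reduction in runtime
  return { base, opt, measuredPercent, claimHolds: measuredPercent >= claimedPercent };
}

// Illustrative timings from five runs per commit.
const result = checkSpeedupClaim([200, 210, 190, 205, 195], [130, 125, 140, 128, 135], 50);
console.log(result.measuredPercent.toFixed(1), result.claimHolds); // 35.0 false
```

Here the optimization is real but smaller than claimed, which would make the claim FALSE as stated even though performance improved.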

Scenario 3: "Handles All Edge Cases"

Steps:

  1. List all edge cases in code path
  2. Check each has test coverage
  3. Test boundary conditions
  4. Verify error handling paths

Red Flags:

  • Empty `catch {}` blocks swallowing errors
  • Generic error messages
  • No logging of critical errors
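For numeric inputs, the boundary-condition step and the per-edge-case coverage step can be checked mechanically. This sketch assumes an inclusive integer range; `boundaryValues` and `untestedEdgeCases` are hypothetical helpers, and the 1 to 100 quantity range is an invented example.

```javascript
// Boundary values for an inclusive integer range [min, max]:
// just outside, on, and just inside each edge (duplicates removed for tiny ranges).
function boundaryValues(min, max) {
  const candidates = [min - 1, min, min + 1, max - 1, max, max + 1];
  return [...new Set(candidates)].sort((a, b) => a - b);
}

// Edge cases that no existing test exercises.
function untestedEdgeCases(edgeCases, testedInputs) {
  const tested = new Set(testedInputs);
  return edgeCases.filter((value) => !tested.has(value));
}

// Example: a quantity field accepting 1..100, with tests for 1, 50, and 100.
const edges = boundaryValues(1, 100);
console.log(untestedEdgeCases(edges, [1, 50, 100])); // [ 0, 2, 99, 101 ]
```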

Example Investigation

## Case: PR #123 "Fix race condition in async handler"

### Claims Examined:
1. "Eliminates race condition"
2. "Adds mutex locking"
3. "100% thread safe"

### Evidence:
- File: src/handlers/async-handler.js
- Changes: Added `async/await`, removed callbacks
- Tests: 2 new tests for async flow
- Coverage: 85% (was 75%)

### Analysis:

**Claim 1: "Eliminates race condition"**
Evidence: Added `await` to sequential operations. No actual mutex.
Deduction: Race avoided by removing concurrency, not by adding synchronization.
Verdict: ⚠ PARTIALLY TRUE (solved differently than claimed)

**Claim 2: "Adds mutex locking"**
Evidence: No mutex library, no lock variables, no sync primitives.
Verdict: ✗ FALSE

**Claim 3: "100% thread safe"**
Evidence: JavaScript is single-threaded. No worker threads used.
Verdict: ? NONSENSICAL (meaningless in this context)

### Conclusion:
Fix works but not for reasons claimed. Race condition avoided by
making operations sequential, not by adding synchronization.

### Recommendations:
1. Update PR description to accurately reflect solution
2. Add test for concurrent request handling
3. Remove incorrect technical claims
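The deduction in Claim 1 can be demonstrated directly: single-threaded JavaScript can still lose updates when a read and a write are separated by an `await`, and awaiting the operations one after another removes the interleaving without any lock. The account object and `deposit` function are hypothetical and not taken from PR #123; `setTimeout` stands in for real I/O.

```javascript
// Simulated I/O delay: yields to the event loop so other pending work can run.
const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

// Hypothetical read-modify-write with an await between the read and the write.
async function deposit(account, amount) {
  const current = account.balance;    // read
  await tick();                       // another deposit can run here
  account.balance = current + amount; // write based on a possibly stale read
}

async function demo() {
  const concurrentAccount = { balance: 0 };
  // Both deposits read 0 before either writes: one update is lost.
  await Promise.all([deposit(concurrentAccount, 10), deposit(concurrentAccount, 10)]);

  const sequentialAccount = { balance: 0 };
  // Awaiting each call in turn prevents interleaving; no mutex involved.
  await deposit(sequentialAccount, 10);
  await deposit(sequentialAccount, 10);

  return { concurrent: concurrentAccount.balance, sequential: sequentialAccount.balance };
}

demo().then((r) => console.log(r)); // { concurrent: 10, sequential: 20 }
```

Running both paths side by side is a starting point for Recommendation 2: a concurrent-request test for the real handler should assert the fully applied result (20 here) rather than the lost-update value.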

Agent Integration

// Evidence-based code review
await Task("Sherlock Review", {
  prNumber: 123,
  claims: [
    "Fixes memory leak",
    "Improves performance 30%"
  ],
  verifyReproduction: true,
  testEdgeCases: true
}, "qe-code-reviewer");

// Bug fix verification
await Task("Verify Fix", {
  bugCommit: 'abc123',
  fixCommit: 'def456',
  reproductionSteps: steps, // steps: reproduction steps defined by the caller
  testBoundaryConditions: true
}, "qe-code-reviewer");

Agent Coordination Hints

Memory Namespace

aqe/sherlock/
├── investigations/*   - Investigation reports
├── evidence/*         - Collected evidence
├── verdicts/*         - Claim verdicts
└── patterns/*         - Common deception patterns
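Investigation artifacts can be filed under the namespace above with predictable keys. This sketch only builds key strings; it assumes slash-delimited keys and does not call any agentic-qe memory API, whose interface is not shown here. `sherlockKey` is a hypothetical helper.

```javascript
// Hypothetical helper for building keys under the aqe/sherlock/ namespace.
const SHERLOCK_NAMESPACES = ["investigations", "evidence", "verdicts", "patterns"];

function sherlockKey(kind, id) {
  // Reject unknown sub-namespaces so artifacts don't land in ad-hoc locations.
  if (!SHERLOCK_NAMESPACES.includes(kind)) {
    throw new Error(`Unknown namespace "${kind}"; expected one of ${SHERLOCK_NAMESPACES.join(", ")}`);
  }
  return `aqe/sherlock/${kind}/${id}`;
}

console.log(sherlockKey("verdicts", "pr-123-claim-2")); // aqe/sherlock/verdicts/pr-123-claim-2
```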

Fleet Coordination

const investigationFleet = await FleetManager.coordinate({
  strategy: 'evidence-investigation',
  agents: [
    'qe-code-reviewer',        // Code analysis
    'qe-security-auditor',     // Security claim verification
    'qe-performance-validator' // Performance claim verification
  ],
  topology: 'parallel'
});
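The `parallel` topology implies that the three agents examine the claims concurrently and their verdicts are merged. The sketch below shows that fan-out pattern with `Promise.allSettled`; it does not reproduce how `FleetManager.coordinate` works internally. `runParallelInvestigation` and the stub investigators are hypothetical, and their verdicts are canned for illustration.

```javascript
// Hypothetical parallel fan-out: every investigator examines the same claims at once,
// and one failing investigator does not discard the others' evidence.
// Assumes each `investigate` is async, so errors surface as rejected promises.
async function runParallelInvestigation(claims, investigators) {
  const settled = await Promise.allSettled(
    investigators.map((investigator) => investigator.investigate(claims))
  );
  const verdicts = {};
  const failures = [];
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      Object.assign(verdicts, outcome.value); // merge claim -> verdict maps
    } else {
      // Report failures instead of swallowing them.
      failures.push(`${investigators[i].name}: ${outcome.reason.message}`);
    }
  });
  return { verdicts, failures };
}

// Stub investigators with canned results, for illustration only.
const investigators = [
  { name: "code-reviewer", investigate: async () => ({ "Adds mutex locking": "FALSE" }) },
  { name: "performance-validator", investigate: async () => ({ "Improves performance 30%": "TRUE" }) },
  { name: "security-auditor", investigate: async () => { throw new Error("timed out"); } },
];

runParallelInvestigation(["Adds mutex locking", "Improves performance 30%"], investigators)
  .then((result) => console.log(result));
```

`Promise.allSettled` is used instead of `Promise.all` so that a timed-out agent is reported in `failures` rather than discarding the evidence the other agents collected.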


Remember

"It is a capital mistake to theorize before one has data." Trust only reproducible evidence. Don't take commit messages, documentation, or "works on my machine" at face value.

The Sherlock Standard: Every claim must be verified empirically. What does the evidence actually show?

GitHub Repository

proffesor-for-testing/agentic-qe
Path: .claude/skills/sherlock-review
Topics: agenticqe, agenticsfoundation, agents, quality-engineering
