systematic-debugging

bobmatnyc

Category: Other | Tags: debugging, problem-solving, root-cause, systematic

About

This skill provides a structured four-phase debugging framework that replaces random code changes with systematic problem diagnosis. It helps developers investigate bugs, errors, and unexpected behavior methodically by forming specific hypotheses and testing one change at a time. It is especially useful under time pressure or when a quick fix seems obvious, because that is when guessing is most tempting.

Documentation

Systematic Debugging

Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

This skill enforces a four-phase systematic approach that ensures root cause investigation before any fix attempt. Violating the letter of this process is violating the spirit of debugging.

When to Use This Skill

Activate when:

User reports a bug or error
Test failures occur
Code behaves unexpectedly
Performance problems arise
Builds or integrations fail
User says "it's not working"

Use this ESPECIALLY when:

Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious
You've already tried multiple fixes
A previous fix didn't work

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

Core Principles

Reproduce First: Ensure you can reliably reproduce the issue
One Change at a Time: Change only one thing between tests
Hypothesis-Driven: Form hypotheses before making changes
Verify Fixes: Confirm the fix works and doesn't break anything else
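To make "Reproduce First" concrete, here is a minimal Python sketch (Python is an illustrative choice; the skill itself ships no code) that runs a reproduction attempt repeatedly and reports how often the bug appears. The function `reproduction_rate` and its `reproduce` callback are hypothetical names.

```python
from typing import Callable


def reproduction_rate(reproduce: Callable[[], bool], runs: int = 20) -> float:
    """Run a reproduction attempt `runs` times and return the fraction of runs
    in which the bug appeared.

    `reproduce` is a hypothetical callback that returns True when the bug shows up.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    hits = sum(1 for _ in range(runs) if reproduce())
    return hits / runs


if __name__ == "__main__":
    # Hypothetical flaky reproduction: the bug shows up on every third call.
    calls = {"n": 0}

    def flaky() -> bool:
        calls["n"] += 1
        return calls["n"] % 3 == 0

    print(reproduction_rate(flaky, runs=30))
```

A rate of 1.0 means the reproduction is reliable. Anything between 0 and 1 means the bug is intermittent, so a single passing run after a change proves nothing.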

Quick Start

Read Error Messages: Read completely, including stack traces
Reproduce Consistently: Create reliable reproduction steps
Gather Evidence: Add diagnostic instrumentation in multi-component systems
Form Hypothesis: State clearly "I think X because Y"
Test Minimally: Make smallest possible change
Verify Fix: Confirm resolution and no regressions
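For "Gather Evidence" in a multi-component system, one common form of diagnostic instrumentation is logging what enters and leaves each component boundary, so the first layer that receives good data and emits bad data stands out. The sketch below uses only Python's standard `logging`, `functools`, and `typing` modules; the decorator `log_boundary` and the two pipeline functions are hypothetical.

```python
import functools
import logging
from typing import Any, Callable, TypeVar, cast

F = TypeVar("F", bound=Callable[..., Any])

log = logging.getLogger("boundary")


def log_boundary(component: str) -> Callable[[F], F]:
    """Log inputs, outputs, and exceptions at a component boundary."""

    def decorator(func: F) -> F:
        @functools.wraps(func)  # keep the wrapped function's name for readable logs
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            log.debug("%s.%s <- args=%r kwargs=%r", component, func.__name__, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                log.exception("%s.%s raised", component, func.__name__)
                raise  # instrumentation observes; it never swallows errors
            log.debug("%s.%s -> %r", component, func.__name__, result)
            return result

        return cast(F, wrapper)

    return decorator


# Hypothetical two-layer pipeline: a parser feeds a pricing step.
@log_boundary("parser")
def parse_amount(raw: str) -> float:
    return float(raw.strip())


@log_boundary("pricing")
def add_tax(amount: float, rate: float = 0.2) -> float:
    return round(amount * (1 + rate), 2)


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(name)s %(message)s")
    add_tax(parse_amount(" 10.00 "))
```

Once the logs show which boundary first carries the wrong value, remove or downgrade the instrumentation; it is evidence gathering, not the fix.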

The Four Phases

Phase 1: Root Cause Investigation

BEFORE attempting ANY fix:

Read error messages carefully (they often contain the solution)
Reproduce consistently
Check recent changes
Gather evidence in multi-component systems
Trace data flow back to source
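When "Check recent changes" turns up a long list of candidate commits, a binary search finds the first bad one in roughly log2(n) checks. This is the idea behind `git bisect`, which is the tool to reach for in a Git repository; the Python sketch below only illustrates the search. It assumes the history is ordered oldest to newest, the first entry is known good, the last is known bad, and the bug persists once introduced. `first_bad` and the commit labels are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first change at which `is_bad` becomes True.

    Assumes good...good bad...bad ordering, with changes[0] good and changes[-1] bad.
    """
    if len(changes) < 2 or is_bad(changes[0]) or not is_bad(changes[-1]):
        raise ValueError("need a known-good first change and a known-bad last change")
    lo, hi = 0, len(changes) - 1  # invariant: changes[lo] is good, changes[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid
        else:
            lo = mid
    return changes[hi]


if __name__ == "__main__":
    # Hypothetical history where the bug arrived in "d4".
    commits = ["a1", "b2", "c3", "d4", "e5"]
    print(first_bad(commits, lambda c: commits.index(c) >= 3))
```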

Phase 2: Pattern Analysis

Find working examples, compare against references, identify differences, understand dependencies.
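Phase 2's "identify differences" step is often a direct comparison between a working setup and a broken one. A minimal sketch, assuming both setups can be expressed as flat key/value settings; `diff_settings` and the example keys are invented for illustration.

```python
from __future__ import annotations

from typing import Any, Mapping

MISSING = "<missing>"  # marker for a key present on only one side


def diff_settings(
    working: Mapping[str, Any], broken: Mapping[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Return {key: (working_value, broken_value)} for every key that differs."""
    diffs: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(working) | set(broken)):
        left = working.get(key, MISSING)
        right = broken.get(key, MISSING)
        if left != right:
            diffs[key] = (left, right)
    return diffs


if __name__ == "__main__":
    working = {"timeout": 30, "retries": 3, "region": "us-east-1"}
    broken = {"timeout": 5, "retries": 3, "tls": False}
    for key, (good, bad) in diff_settings(working, broken).items():
        print(f"{key}: working={good!r} broken={bad!r}")
```

Each reported difference is a candidate for a Phase 3 hypothesis; identical keys are filtered out so attention goes only where the setups diverge.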

Phase 3: Hypothesis and Testing

Form a single hypothesis, test it minimally (one variable at a time), and verify the result before continuing.
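Testing "one variable at a time" can be made mechanical: start from the failing baseline and apply each candidate change alone, never in combination, so any change in outcome is attributable to that one setting. This sketch assumes the problem is driven by configuration; `isolate_single_changes`, the settings, and the `passes` check are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping


def isolate_single_changes(
    baseline: Mapping[str, Any],
    candidates: Mapping[str, Any],
    passes: Callable[[Mapping[str, Any]], bool],
) -> list[str]:
    """Apply each candidate change to the failing baseline by itself and return
    the keys whose single change makes `passes` succeed."""
    if passes(baseline):
        raise ValueError("baseline already passes; there is nothing to isolate")
    explaining: list[str] = []
    for key, value in candidates.items():
        trial = {**baseline, key: value}  # differs from baseline in exactly one key
        if passes(trial):
            explaining.append(key)
    return explaining


if __name__ == "__main__":
    baseline = {"cache": True, "workers": 4, "timeout": 5}
    candidates = {"cache": False, "workers": 1, "timeout": 30}
    # Hypothetical check standing in for "run the failing test with this config".
    print(isolate_single_changes(baseline, candidates, lambda cfg: cfg["timeout"] >= 30))
```

If no single change explains the failure, that itself is evidence: the hypothesis was wrong, and the next step is to return to Phase 1 rather than start combining changes.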

Phase 4: Implementation

Create a failing test case, implement a single fix that addresses the root cause, and verify that the fix works.
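A sketch of Phase 4 using Python's built-in `unittest`. The function `average` and its empty-input bug are hypothetical; the point is the order of work: write the test that reproduces the bug, confirm it does not pass, fix the root cause inside the function itself, and keep a second test that guards the unchanged path against regressions.

```python
import unittest


def average(values: list[float]) -> float:
    """Hypothetical function under repair.

    Root cause of the reported crash: empty input led to a division by zero.
    The fix handles that case here, at the source, instead of catching the
    ZeroDivisionError in every caller.
    """
    if not values:
        raise ValueError("average() of an empty list")
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    # Written first: before the fix, average([]) raised ZeroDivisionError,
    # so this test did not pass. That confirmed the reproduction.
    def test_empty_input_raises_clear_error(self) -> None:
        with self.assertRaises(ValueError):
            average([])

    # Guards the non-empty path against regressions introduced by the fix.
    def test_normal_input_unchanged(self) -> None:
        self.assertEqual(average([1.0, 2.0, 3.0]), 2.0)
```

Saved as, for example, `test_average.py`, the tests run with `python -m unittest test_average`.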

If 3+ fixes fail: STOP and question the architecture. Repeated failure at this point indicates an architectural problem, not just a run of bad hypotheses.
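If you want the three-failure rule enforced rather than remembered, a tiny tracker is enough. `FixAttempts` is a hypothetical helper, not part of the skill; it records the hypothesis behind each failed fix and reports when the threshold is reached.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FixAttempts:
    """Track failed fix attempts and flag when the architecture should be questioned."""

    limit: int = 3
    failed: list[str] = field(default_factory=list)

    def record_failure(self, hypothesis: str) -> None:
        # Keep the hypothesis each failed fix was based on; the list is useful
        # evidence when the architecture discussion starts.
        self.failed.append(hypothesis)

    def should_question_architecture(self) -> bool:
        # True once `limit` or more fixes have failed.
        return len(self.failed) >= self.limit


if __name__ == "__main__":
    attempts = FixAttempts()
    for guess in ["stale cache", "race in writer", "wrong timeout"]:
        attempts.record_failure(guess)
    if attempts.should_question_architecture():
        print("STOP: question the architecture. Tried:", ", ".join(attempts.failed))
```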

Navigation

For detailed information:

Workflow: Complete four-phase debugging workflow with decision trees and detailed steps
Examples: Real-world debugging scenarios with step-by-step walkthroughs
Troubleshooting: Common debugging challenges and how to overcome them
Anti-patterns: Common mistakes, rationalizations, and red flags

Key Reminders

NEVER make random changes hoping they'll work
ALWAYS reproduce the issue before attempting fixes
Form hypothesis BEFORE making changes
Change ONE thing at a time
Verify fix actually resolves the issue
Check for regressions after fixing
If 3+ fixes fail, question the architecture

Red Flags - STOP and Follow Process

If you catch yourself thinking:

"Quick fix for now, investigate later"
"Just try changing X and see if it works"
"It's probably X, let me fix that"
"I don't fully understand but this might work"
"One more fix attempt" (when already tried 2+)
Noticing that each fix reveals a new problem in a different place

ALL of these mean: STOP. Return to Phase 1.

Integration with Other Skills

root-cause-tracing: How to trace back through call stack
defense-in-depth: Add validation after finding root cause
condition-based-waiting: Replace timeouts identified in Phase 2
verification-before-completion: Verify fix worked before claiming success
test-driven-development: Create failing test case in Phase 4
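The condition-based-waiting idea, replacing fixed sleeps with polling for the state you actually need, can be sketched as follows. This is not code from that skill; `wait_for` and its parameters are assumptions, built only on the standard `time` and `threading` modules.

```python
import threading
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
    description: str = "condition",
) -> None:
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Raises TimeoutError naming what was awaited, so a failure states what
    never happened instead of failing later on a confusing assertion.
    """
    deadline = time.monotonic() + timeout  # monotonic clock ignores wall-clock jumps
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"timed out after {timeout}s waiting for {description}")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical background job that finishes after about 0.1 s.
    done = threading.Event()
    threading.Timer(0.1, done.set).start()
    wait_for(done.is_set, timeout=2.0, description="background job")
    print("job finished")
```

Compared with `time.sleep(2)`, this returns as soon as the job is done and fails with a specific message when it never is.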

Real-World Impact

From debugging sessions:

Systematic approach: 15-30 minutes to fix
Random fixes approach: 2-3 hours of thrashing
First-time fix rate: 95% (systematic) vs 40% (random fixes)
New bugs introduced: near zero (systematic) vs common (random fixes)

Quick Install

/plugin add https://github.com/bobmatnyc/claude-mpm/tree/main/systematic-debugging

Copy and paste this command in Claude Code to install this skill

GitHub Repository

bobmatnyc/claude-mpm

Path: src/claude_mpm/skills/bundled/debugging/systematic-debugging

Related Skills

smart-bug-fix

Testing

This skill provides an intelligent bug-fixing workflow that systematically identifies root causes using deep analysis and multi-model reasoning. It then generates fixes through Codex auto-fix and validates them with comprehensive testing and regression analysis. Use this skill for methodical debugging that combines automated fixes with thorough validation.


sherlock-review

Other

Sherlock-review is an evidence-based code review skill that uses deductive reasoning to systematically verify implementation claims, investigate bugs, and perform root cause analysis. It guides developers through a process of observation, deduction, and elimination to determine what actually happened versus what was claimed. This skill is ideal for validating fixes, conducting security audits, and performing performance validation.


when-debugging-ml-training-use-ml-training-debugger

Other

This skill helps developers diagnose and fix common machine learning training issues like loss divergence, overfitting, and slow convergence. It provides systematic debugging to identify root causes and delivers fixed code with optimization recommendations. Use it when facing training problems like NaN losses, poor validation performance, or when training fails to converge properly.


Root Cause Tracing

Other

This skill systematically traces bugs backward through the call stack to identify their original triggers rather than just fixing symptoms. It's designed for use when errors occur deep in execution with unclear data origins or long call chains. The approach involves observing symptoms, finding immediate causes, and repeatedly asking "what called this" until reaching the source.
