typed-holes-refactor
About
This skill helps developers systematically refactor codebases using the Design by Typed Holes methodology. It treats architectural unknowns as typed holes and resolves them iteratively through test-driven validation and constraint propagation. Use it when refactoring existing code, optimizing architecture, or paying down technical debt in a structured way.
Quick Install
Claude Code
Recommended:
/plugin add https://github.com/rand/cc-polymath
Or clone manually:
git clone https://github.com/rand/cc-polymath.git ~/.claude/skills/typed-holes-refactor
Copy and paste this command in Claude Code to install this skill.
Documentation
Typed Holes Refactoring
Systematically refactor codebases using the Design by Typed Holes meta-framework: treat architectural unknowns as typed holes, resolve them iteratively with test-driven validation, and propagate constraints through dependency graphs.
Core Workflow
Phase 0: Hole Discovery & Setup
1. Create safe working branch:
git checkout -b refactor/typed-holes-v1
# CRITICAL: Never work in main, never touch .beads/ in main
2. Analyze current state and identify holes:
python scripts/discover_holes.py
# Creates REFACTOR_IR.md with hole catalog
The Refactor IR documents:
- Current State Holes: What's unknown about the current system?
- Refactor Holes: What needs resolution to reach the ideal state?
- Constraints: What must be preserved/improved/maintained?
- Dependencies: Which holes block which others?
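The IR schema is yours to choose; as a purely illustrative sketch, a single hole entry might carry an id, type, validation criteria, constraints, and dependencies (all names below are hypothetical):
# Hypothetical hole entry as it might appear in the catalog (schema illustrative, not prescribed)
hole = {
    "id": "R1_target_architecture",
    "type": "architecture",            # architecture | implementation | quality
    "question": "What should the ideal structure be?",
    "validation": "Architecture tests pass; no layer violations",
    "constraints": ["C1: preserve public API behavior"],
    "depends_on": [],                  # hole ids that must resolve first
    "status": "open",                  # open | in_progress | resolved
}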
3. Write baseline characterization tests:
Create tests/characterization/ to capture exact current behavior:
# tests/characterization/test_current_behavior.py
def test_api_contracts():
    """All public APIs must behave identically post-refactor"""
    for endpoint in discover_public_apis():
        old_result = run_current(endpoint, test_inputs)
        save_baseline(endpoint, old_result)

def test_performance_baselines():
    """Record current performance - don't regress"""
    baselines = measure_all_operations()
    save_json("baselines.json", baselines)
Run tests on main branch - they should all pass. These are your safety net.
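save_baseline, load_baseline, and the discovery helpers above are left to you; a minimal sketch using JSON files (paths and serialization here are assumptions, not part of the skill):
# tests/characterization/helpers.py - minimal baseline persistence (sketch)
import json
from pathlib import Path

BASELINE_DIR = Path("tests/characterization/baselines")

def save_baseline(name, result, reason=None):
    """Persist a behavior snapshot so later runs can diff against it."""
    BASELINE_DIR.mkdir(parents=True, exist_ok=True)
    payload = {"result": result, "reason": reason}
    (BASELINE_DIR / f"{name}.json").write_text(json.dumps(payload, indent=2, default=str))

def load_baseline(name):
    """Load a previously saved snapshot for equivalence assertions."""
    payload = json.loads((BASELINE_DIR / f"{name}.json").read_text())
    return payload["result"]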
Phase 1-N: Iterative Hole Resolution
For each hole (in dependency order):
1. Select next ready hole:
python scripts/next_hole.py
# Shows holes whose dependencies are resolved
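Conceptually, a hole is ready when all of its dependencies are resolved; a minimal sketch of that selection logic (the script's internals may differ):
# Ready-set computation over the hole catalog (illustrative only)
def ready_holes(holes):
    """Return unresolved holes whose dependencies are all resolved."""
    resolved = {h["id"] for h in holes if h["status"] == "resolved"}
    return [h for h in holes
            if h["status"] != "resolved"
            and all(dep in resolved for dep in h["depends_on"])]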
2. Write validation tests FIRST (test-driven):
# tests/refactor/test_h{N}_resolution.py
def test_h{N}_resolved():
    """Define what 'resolved correctly' means"""
    # This should FAIL initially
    assert desired_state_achieved()

def test_h{N}_equivalence():
    """Ensure no behavioral regressions"""
    old_behavior = load_baseline()
    new_behavior = run_refactored()
    assert old_behavior == new_behavior
3. Implement resolution:
- Refactor code to make tests pass
- Keep characterization tests passing
- Commit incrementally with clear messages
4. Validate resolution:
python scripts/validate_resolution.py H{N}
# Checks: tests pass, constraints satisfied, main untouched
5. Propagate constraints:
python scripts/propagate.py H{N}
# Updates dependent holes based on resolution
6. Document and commit:
git add .
git commit -m "Resolve H{N}: {description}
- Tests: tests/refactor/test_h{N}_*.py pass
- Constraints: {constraints satisfied}
- Propagates to: {dependent holes}"
Phase Final: Reporting
Generate comprehensive delta report:
python scripts/generate_report.py > REFACTOR_REPORT.md
Report includes:
- Hole resolution summary with validation evidence
- Metrics delta (LOC, complexity, coverage, performance)
- Behavioral analysis (intentional changes documented)
- Constraint validation (all satisfied)
- Risk assessment and migration guide
Key Principles
1. Test-Driven Everything
- Write validation criteria BEFORE implementing
- Tests define "correct resolution"
- Characterization tests are sacred - never let them fail
2. Hole-Driven Progress
- Resolve holes in dependency order
- Each resolution propagates constraints
- Track everything formally in Refactor IR
3. Continuous Validation
Every commit must validate:
- ✅ Characterization tests pass (behavior preserved)
- ✅ Resolution tests pass (hole resolved correctly)
- ✅ Constraints satisfied
- ✅ Main branch untouched
- ✅ .beads/ intact in main
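If you want these checks enforced mechanically, a hypothetical pre-commit guard could run them (commands and suite paths below are assumptions):
# Hypothetical pre-commit guard: refuse commits on main, require both suites green
import subprocess, sys

def run(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

branch = run("git rev-parse --abbrev-ref HEAD").stdout.strip()
if branch == "main":
    sys.exit("Refusing to commit on main: switch to the refactor branch")

for suite in ("tests/characterization", "tests/refactor"):
    if run(f"python -m pytest {suite} -q").returncode != 0:
        sys.exit(f"Commit blocked: {suite} is failing")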
4. Safe by Construction
- Work only in refactor branch
- Main is read-only reference
- Beads are untouchable historical artifacts
5. Formal Completeness
Design complete when:
- All holes resolved and validated
- All constraints satisfied
- All phase gates passed
- Metrics improved or maintained
Hole Quality Framework
SMART Criteria for Good Holes
Every hole must be:
- Specific: Clear, bounded question with concrete answer
  - ✓ Good: "How should error handling work in the API layer?"
  - ✗ Bad: "How to improve the code?"
- Measurable: Has testable validation criteria
  - ✓ Good: "Reduce duplication from 60% to <15%"
  - ✗ Bad: "Make code better"
- Achievable: Can be resolved with available information
  - ✓ Good: "Extract parsing logic to separate module"
  - ✗ Bad: "Predict all future requirements"
- Relevant: Blocks meaningful progress on refactoring
  - ✓ Good: "Define core interface (blocks 5 other holes)"
  - ✗ Bad: "Decide variable naming convention"
- Typed: Clear type/structure for resolution
  - ✓ Good: interface Architecture = { layers: Layer[], rules: Rule[] }
  - ✗ Bad: "Some kind of structure?"
Hole Estimation Framework
Size holes using these categories:
| Size | Duration | Characteristics | Examples |
|---|---|---|---|
| Nano | 1-2 hours | Simple, mechanical changes | Rename files, update imports |
| Small | 4-8 hours | Single module refactor | Extract class, consolidate functions |
| Medium | 1-3 days | Cross-module changes | Define interfaces, reorganize packages |
| Large | 4-7 days | Architecture changes | Layer extraction, pattern implementation |
| Epic | >7 days | SPLIT THIS HOLE | Too large, break into smaller holes |
Estimation Red Flags:
- More than 3 dependencies → Likely Medium+
- Unclear validation → Add time for discovery
- New patterns/tools → Add learning overhead
Hole Splitting Guidelines
Split a hole when:
- Estimate exceeds 7 days
- More than 5 dependencies
- Validation criteria unclear
- Multiple distinct concerns mixed
Splitting strategy:
Epic hole: "Refactor entire authentication system"
→ Split into:
R10_auth_interface: Define new auth interface (Medium)
R11_token_handling: Implement JWT tokens (Small)
R12_session_management: Refactor sessions (Medium)
R13_auth_middleware: Update middleware (Small)
R14_auth_testing: Comprehensive test suite (Medium)
After splitting:
- Update dependencies in REFACTOR_IR.md
- Run python scripts/propagate.py to update the graph
- Re-sync with beads: python scripts/holes_to_beads.py
Common Hole Types
Architecture Holes
"?R1_target_architecture": "What should the ideal structure be?"
"?R2_module_boundaries": "How should modules be organized?"
"?R3_abstraction_layers": "What layers/interfaces are needed?"
Validation: Architecture tests, dependency analysis, layer violation checks
Implementation Holes
"?R4_consolidation_targets": "What code should merge?"
"?R5_extraction_targets": "What code should split out?"
"?R6_elimination_targets": "What code should be removed?"
Validation: Duplication detection, equivalence tests, dead code analysis
Quality Holes
"?R7_test_strategy": "How to validate equivalence?"
"?R8_migration_path": "How to safely transition?"
"?R9_rollback_mechanism": "How to undo if needed?"
Validation: Test coverage metrics, migration dry-runs, rollback tests
See HOLE_TYPES.md for complete catalog.
Constraint Propagation Rules
Rule 1: Interface Resolution → Type Constraints
When: Interface hole resolved with concrete types
Then: Propagate type requirements to all consumers
Example:
Resolve R6: NodeInterface = BaseNode with async run()
Propagates to:
→ R4: Parallel execution must handle async
→ R5: Error recovery must handle async exceptions
Rule 2: Implementation → Performance Constraints
When: Implementation resolved with resource usage
Then: Propagate limits to dependent holes
Example:
Resolve R4: Parallelization with max_concurrent=3
Propagates to:
→ R8: Rate limit = provider_limit / 3
→ R7: Memory budget = 3 * single_operation_memory
Rule 3: Validation → Test Requirements
When: Validation resolved with test requirements
Then: Propagate data needs upstream
Example:
Resolve R9: Testing needs 50 examples
Propagates to:
→ R7: Metrics must support batch evaluation
→ R8: Test data collection strategy needed
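A propagation rule is essentially a function from one resolution to new constraints on dependent holes; a hedged sketch of Rule 2 in that style (hole ids and field names hypothetical):
# Rule 2 as an explicit constraint delta (illustrative; propagate.py's rule format may differ)
def propagate_parallelism(resolution, holes):
    """When a hole resolves with max_concurrent=N, derive limits for dependents."""
    n = resolution["max_concurrent"]
    holes["R8"]["constraints"].append(f"rate_limit = provider_limit / {n}")
    holes["R7"]["constraints"].append(f"memory_budget = {n} * single_operation_memory")
Encoding rules as small functions keeps propagation auditable: each resolution yields an explicit, reviewable constraint delta.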
See CONSTRAINT_RULES.md for complete propagation rules.
Success Indicators
Weekly Progress
- 2-4 holes resolved
- All tests passing
- Constraints satisfied
- Measurable improvements
Red Flags (Stop & Reassess)
- ❌ Characterization tests fail
- ❌ Hole can't be resolved within constraints
- ❌ Constraints contradict each other
- ❌ No progress for 3+ days
- ❌ Main branch accidentally modified
Validation Gates
| Gate | Criteria | Check |
|---|---|---|
| Gate 1: Discovery Complete | All holes cataloged, dependencies mapped | python scripts/check_discovery.py |
| Gate 2: Foundation Holes | Core interfaces resolved, tests pass | python scripts/check_foundation.py |
| Gate 3: Implementation | All refactor holes resolved, metrics improved | python scripts/check_implementation.py |
| Gate 4: Production Ready | Migration tested, rollback verified | python scripts/check_production.py |
Claude-Assisted Workflow
This skill is designed for effective Claude/LLM collaboration. Here's how to divide work:
Phase 0: Discovery
Claude's Role:
- Run discover_holes.py to analyze codebase
- Suggest holes based on code analysis
- Generate initial REFACTOR_IR.md structure
- Write characterization tests to capture current behavior
- Set up test infrastructure
Your Role:
- Confirm holes are well-scoped
- Prioritize which holes to tackle first
- Review and approve REFACTOR_IR.md
- Define critical constraints
Phase 1-N: Hole Resolution
Claude's Role:
- Write resolution tests (TDD) BEFORE implementation
- Implement hole resolution to make tests pass
- Run validation scripts: validate_resolution.py, check_foundation.py
- Update REFACTOR_IR.md with resolution details
- Propagate constraints: python scripts/propagate.py H{N}
- Generate commit messages documenting changes
Your Role:
- Make architecture decisions (which pattern, which approach)
- Assess risk and determine constraint priorities
- Review code changes for correctness
- Approve merge to main when complete
Phase Final: Reporting
Claude's Role:
- Generate comprehensive REFACTOR_REPORT.md
- Document all metrics deltas
- List all validation evidence
- Create migration guides
- Prepare PR description
Your Role:
- Final review of report accuracy
- Approve for production deployment
- Conduct post-refactor retrospective
Effective Prompting Patterns
Starting a session:
"I need to refactor [description]. Use typed-holes-refactor skill.
Start with discovery phase."
Resolving a hole:
"Resolve H3 (target_architecture). Write tests first, then implement.
Use [specific pattern/approach]."
Checking progress:
"Run check_completeness.py and show me the dashboard.
What's ready to work on next?"
Generating visualizations:
"Generate dependency graph showing bottlenecks and critical path.
Use visualize_graph.py with --analyze."
Claude's Limitations
Claude CANNOT:
- Make subjective architecture decisions (you must decide)
- Determine business-critical constraints (you must specify)
- Run tests that require external services (mock or you run them)
- Merge to main (you must approve and merge)
Claude CAN:
- Analyze code and suggest holes
- Write comprehensive test suites
- Implement resolutions within your constraints
- Generate reports and documentation
- Track progress across sessions (via beads + REFACTOR_IR.md)
Multi-Session Continuity
At session start:
"Continue typed-holes refactoring. Import beads state and
show current status from REFACTOR_IR.md."
Claude will:
- Read REFACTOR_IR.md to understand current state
- Check which holes are resolved
- Identify next ready holes
- Resume where previous session left off
You should:
- Keep REFACTOR_IR.md and .beads/ committed to git
- Export beads state at session end: bd export -o .beads/issues.jsonl
- Use /context before starting to ensure Claude has full context
Beads Integration
Why beads + typed holes?
- Beads tracks issues across sessions (prevents lost work)
- Holes track refactoring-specific state (dependencies, constraints)
- Together: Complete continuity for long-running refactors
Setup
# Install beads (once)
go install github.com/steveyegge/beads/cmd/bd@latest
# After running discover_holes.py
python scripts/holes_to_beads.py
# Check what's ready
bd ready --json
Workflow Integration
During hole resolution:
# Start work on a hole
bd update bd-5 --status in_progress --json
# Implement resolution
# ... write tests, implement code ...
# Validate resolution
python scripts/validate_resolution.py H3
# Close bead
bd close bd-5 --reason "Resolved H3: target_architecture" --json
# Export state
bd export -o .beads/issues.jsonl
git add .beads/issues.jsonl REFACTOR_IR.md
git commit -m "Resolve H3: Define target architecture"
Syncing holes ↔ beads:
# After updating REFACTOR_IR.md manually
python scripts/holes_to_beads.py # Sync changes to beads
# After resolving holes
python scripts/holes_to_beads.py # Update bead statuses
Cross-session continuity:
# Session start
bd import -i .beads/issues.jsonl
bd ready --json # Shows ready holes
python scripts/check_completeness.py # Shows overall progress
# Session end
bd export -o .beads/issues.jsonl
git add .beads/issues.jsonl
git commit -m "Session checkpoint: 3 holes resolved"
Bead advantages:
- Tracks work across days/weeks
- Shows dependency graph: bd deps bd-5
- Prevents context loss
- Integrates with overall project management
Scripts Reference
All scripts are in scripts/:
- discover_holes.py - Analyze codebase and generate REFACTOR_IR.md
- next_hole.py - Show next resolvable holes based on dependencies
- validate_resolution.py - Check if hole resolution satisfies constraints
- propagate.py - Update dependent holes after resolution
- generate_report.py - Create comprehensive delta report
- check_discovery.py - Validate Phase 0 completeness (Gate 1)
- check_foundation.py - Validate Phase 1 completeness (Gate 2)
- check_implementation.py - Validate Phase 2 completeness (Gate 3)
- check_production.py - Validate Phase 3 readiness (Gate 4)
- check_completeness.py - Overall progress dashboard
- visualize_graph.py - Generate hole dependency visualization
- holes_to_beads.py - Sync holes with beads issues
Run any script with --help for detailed usage.
Meta-Consistency
This skill uses its own principles:
| Typed Holes Principle | Application to Refactoring |
|---|---|
| Typed Holes | Architectural unknowns cataloged with types |
| Constraint Propagation | Design constraints flow through dependency graph |
| Iterative Refinement | Hole-by-hole resolution cycles |
| Test-Driven Validation | Tests define correctness |
| Formal Completeness | Gates verify design completeness |
We use the system to refactor the system.
Advanced Topics
For complex scenarios, see:
- HOLE_TYPES.md - Detailed hole taxonomy
- CONSTRAINT_RULES.md - Complete propagation rules
- VALIDATION_PATTERNS.md - Test patterns for different hole types
- EXAMPLES.md - Complete worked examples
Quick Start Example
# 1. Setup
git checkout -b refactor/typed-holes-v1
python scripts/discover_holes.py
# 2. Write baseline tests
# Create tests/characterization/test_*.py
# 3. Resolve first hole
python scripts/next_hole.py # Shows H1 is ready
# Write tests/refactor/test_h1_*.py (fails initially)
# Refactor code until tests pass
python scripts/validate_resolution.py H1
python scripts/propagate.py H1
git commit -m "Resolve H1: ..."
# 4. Repeat for each hole
# ...
# 5. Generate report
python scripts/generate_report.py > REFACTOR_REPORT.md
Troubleshooting
Characterization tests fail
Symptom: Tests that captured baseline behavior now fail
Resolution:
- Review changes: git diff to see what changed
- Investigate: What behavior changed and why?
- Decision tree:
  - Intentional change: Update baselines with documentation
    # Update baseline with reason
    save_baseline("v2_api", new_behavior, reason="Switched to async implementation")
  - Unintentional regression: Fix the code, tests must pass
Prevention: Run characterization tests before AND after each hole resolution.
Hole can't be resolved
Symptom: Stuck on a hole for >3 days, unclear how to proceed
Resolution:
1. Check dependencies: Are they actually resolved?
   python scripts/visualize_graph.py --analyze  # Look for unresolved dependencies
2. Review constraints: Are they contradictory?
   - Example: C1 "preserve all behavior" + C5 "change API contract" → Contradictory
   - Fix: Renegotiate constraints with stakeholders
3. Split the hole: If hole is too large
   # Original: R4_consolidate_all (Epic, 10+ days)
   # Split into:
   R4a_consolidate_parsers (Medium, 2 days)
   R4b_consolidate_validators (Small, 1 day)
   R4c_consolidate_handlers (Medium, 2 days)
4. Check for circular dependencies:
   python scripts/visualize_graph.py  # Look for cycles: R4 → R5 → R6 → R4
   - Fix: Break cycle by introducing intermediate hole or redefining dependencies
Escalation: If still stuck after 5 days, consider alternative refactoring approach.
Contradictory Constraints
Symptom: Cannot satisfy all constraints simultaneously
Example:
- C1: "Preserve exact current behavior" (backward compatibility)
- C5: "Reduce response time by 50%" (performance improvement)
- Current behavior includes slow, synchronous operations
Resolution Framework:
1. Identify the conflict:
   C1 requires: Keep synchronous operations
   C5 requires: Switch to async operations
   → Contradiction: Can't be both sync and async
2. Negotiate priorities:
   | Option | C1 | C5 | Tradeoff |
   |---|---|---|---|
   | A: Keep sync | ✓ | ✗ | No performance gain |
   | B: Switch to async | ✗ | ✓ | Breaking change |
   | C: Add async, deprecate sync | ⚠️ | ✓ | Migration burden |
3. Choose resolution strategy:
   - Relax constraint: Change C1 to "Preserve behavior where possible"
   - Add migration period: Option C implemented over 2 releases
   - Split into phases: Phase 1 (C1), Phase 2 (C5)
4. Document decision:
   ## Constraint Resolution: C1 vs C5
   **Decision**: Relax C1 to allow async migration
   **Rationale**: Performance critical for user experience
   **Migration**: 3-month deprecation period for sync API
   **Approved by**: [Stakeholder], [Date]
Circular Dependencies
Symptom: visualize_graph.py shows cycles
Example:
R4 (consolidate parsers) → depends on R6 (define interface)
R6 (define interface) → depends on R4 (needs parser examples)
Resolution strategies:
1. Introduce intermediate hole:
   H0_parser_analysis: Analyze existing parsers (no dependencies)
   R6_interface: Define interface using H0 analysis
   R4_consolidate: Implement using R6 interface
2. Redefine dependencies:
   - Maybe R4 doesn't actually need R6
   - Or R6 only needs partial R4 (split R4)
3. Accept iterative refinement:
   R6_interface_v1: Initial interface (simple)
   R4_consolidate: Implement with v1 interface
   R6_interface_v2: Refine based on R4 learnings
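To confirm a cycle without the visualization, a standard depth-first search over the dependency map is enough; a minimal sketch:
# Minimal cycle detection over hole dependencies (sketch; visualize_graph.py may report more)
def find_cycle(deps):
    """deps: {hole_id: [hole_ids it depends on]}. Returns one cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color, stack = {}, []

    def visit(h):
        color[h] = GRAY
        stack.append(h)
        for d in deps.get(h, []):
            if color.get(d, WHITE) == GRAY:   # back edge: cycle found
                return stack[stack.index(d):] + [d]
            if color.get(d, WHITE) == WHITE:
                cycle = visit(d)
                if cycle:
                    return cycle
        color[h] = BLACK
        stack.pop()
        return None

    for h in deps:
        if color.get(h, WHITE) == WHITE:
            cycle = visit(h)
            if cycle:
                return cycle
    return None

print(find_cycle({"R4": ["R6"], "R6": ["R4"]}))  # ['R4', 'R6', 'R4']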
Prevention: Define architecture holes before implementation holes.
No Progress for 3+ Days
Symptom: Feeling stuck, no commits, uncertain how to proceed
Resolution checklist:
1. Review REFACTOR_IR.md: Are holes well-defined (SMART criteria)?
   - If not: Rewrite holes to be more specific
2. Check hole size: Is current hole >7 days estimate?
   - If yes: Split into smaller holes
3. Run dashboard: python scripts/check_completeness.py
   - Are you working on a blocked hole?
   - Switch to a ready hole instead
4. Visualize dependencies: python scripts/visualize_graph.py --analyze
   - Identify bottlenecks
   - Look for parallel work opportunities
5. Review constraints: Are they achievable?
   - Renegotiate if necessary
6. Seek external review:
   - Share REFACTOR_IR.md with colleague
   - Get feedback on approach
7. Consider alternative: Maybe this refactor isn't feasible
   - Document why
   - Propose different approach
Reset protocol: If still stuck, revert to last working state and try different approach.
Estimation Failures
Symptom: Hole taking 3x longer than estimated
Analysis:
1. Why did estimate fail?
   - Underestimated complexity
   - Unforeseen dependencies
   - Unclear requirements
   - Technical issues (tool problems, infrastructure)
2. Immediate actions:
   - Update REFACTOR_IR.md with revised estimate
   - If >7 days, split the hole
   - Update beads: bd update bd-5 --note "Revised estimate: 5 days (was 2)"
3. Future improvements:
   - Use actual times to calibrate future estimates
   - Add buffer for discovery (20% overhead)
   - Note uncertainty in IR: "Estimate: 2-4 days (high uncertainty)"
Learning: Track actual vs estimated time in REFACTOR_REPORT.md for future reference.
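One lightweight way to calibrate (purely illustrative) is a correction factor computed from completed holes:
# Estimate calibration from recorded actuals (illustrative)
def calibration_factor(history):
    """history: [(estimated_days, actual_days), ...] for completed holes."""
    est = sum(e for e, _ in history)
    act = sum(a for _, a in history)
    return act / est if est else 1.0

history = [(2, 5), (1, 1.5), (3, 4)]   # hypothetical past holes
print(f"Scale new estimates by ~{calibration_factor(history):.2f}x")  # ~1.75x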
Begin with Phase 0: Discovery. Always work in a branch. Test first, refactor second.
