typed-holes-refactor
About
This skill helps developers systematically refactor codebases using the Design by Typed Holes methodology. It treats architectural unknowns as typed holes and resolves them iteratively through test-driven validation and constraint propagation. Use it when refactoring existing code, optimizing architecture, or paying down technical debt in a structured way.
Quick Install
Claude Code
Recommended:
/plugin add https://github.com/rand/cc-polymath
Or clone manually:
git clone https://github.com/rand/cc-polymath.git ~/.claude/skills/typed-holes-refactor
Copy and paste this command in Claude Code to install this skill.
Documentation
Typed Holes Refactoring
Systematically refactor codebases using the Design by Typed Holes meta-framework: treat architectural unknowns as typed holes, resolve them iteratively with test-driven validation, and propagate constraints through dependency graphs.
Core Workflow
Phase 0: Hole Discovery & Setup
1. Create safe working branch:
git checkout -b refactor/typed-holes-v1
# CRITICAL: Never work in main, never touch .beads/ in main
2. Analyze current state and identify holes:
python scripts/discover_holes.py
# Creates REFACTOR_IR.md with hole catalog
The Refactor IR documents:
- Current State Holes: What's unknown about the current system?
- Refactor Holes: What needs resolution to reach the ideal state?
- Constraints: What must be preserved/improved/maintained?
- Dependencies: Which holes block which others?
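The IR schema is yours to choose; as a purely illustrative sketch, a single hole entry might carry an id, type, validation criteria, constraints, and dependencies (all names below are hypothetical):
# Hypothetical hole entry as it might appear in the catalog (schema illustrative, not prescribed)
hole = {
    "id": "R1_target_architecture",
    "type": "architecture",            # architecture | implementation | quality
    "question": "What should the ideal structure be?",
    "validation": "Architecture tests pass; no layer violations",
    "constraints": ["C1: preserve public API behavior"],
    "depends_on": [],                  # hole ids that must resolve first
    "status": "open",                  # open | in_progress | resolved
}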
3. Write baseline characterization tests:
Create tests/characterization/ to capture exact current behavior:
# tests/characterization/test_current_behavior.py
def test_api_contracts():
    """All public APIs must behave identically post-refactor"""
    for endpoint in discover_public_apis():
        old_result = run_current(endpoint, test_inputs)
        save_baseline(endpoint, old_result)

def test_performance_baselines():
    """Record current performance - don't regress"""
    baselines = measure_all_operations()
    save_json("baselines.json", baselines)
Run tests on main branch - they should all pass. These are your safety net.
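save_baseline, load_baseline, and the discovery helpers above are left to you; a minimal sketch using JSON files (paths and serialization here are assumptions, not part of the skill):
# tests/characterization/helpers.py - minimal baseline persistence (sketch)
import json
from pathlib import Path

BASELINE_DIR = Path("tests/characterization/baselines")

def save_baseline(name, result, reason=None):
    """Persist a behavior snapshot so later runs can diff against it."""
    BASELINE_DIR.mkdir(parents=True, exist_ok=True)
    payload = {"result": result, "reason": reason}
    (BASELINE_DIR / f"{name}.json").write_text(json.dumps(payload, indent=2, default=str))

def load_baseline(name):
    """Load a previously saved snapshot for equivalence assertions."""
    payload = json.loads((BASELINE_DIR / f"{name}.json").read_text())
    return payload["result"]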
Phase 1-N: Iterative Hole Resolution
For each hole (in dependency order):
1. Select next ready hole:
python scripts/next_hole.py
# Shows holes whose dependencies are resolved
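Conceptually, a hole is ready when all of its dependencies are resolved; a minimal sketch of that selection logic (the script's internals may differ):
# Ready-set computation over the hole catalog (illustrative only)
def ready_holes(holes):
    """Return unresolved holes whose dependencies are all resolved."""
    resolved = {h["id"] for h in holes if h["status"] == "resolved"}
    return [h for h in holes
            if h["status"] != "resolved"
            and all(dep in resolved for dep in h["depends_on"])]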
2. Write validation tests FIRST (test-driven):
# tests/refactor/test_h{N}_resolution.py
def test_h{N}_resolved():
    """Define what 'resolved correctly' means"""
    # This should FAIL initially
    assert desired_state_achieved()

def test_h{N}_equivalence():
    """Ensure no behavioral regressions"""
    old_behavior = load_baseline()
    new_behavior = run_refactored()
    assert old_behavior == new_behavior
3. Implement resolution:
- Refactor code to make tests pass
- Keep characterization tests passing
- Commit incrementally with clear messages
4. Validate resolution:
python scripts/validate_resolution.py H{N}
# Checks: tests pass, constraints satisfied, main untouched
5. Propagate constraints:
python scripts/propagate.py H{N}
# Updates dependent holes based on resolution
6. Document and commit:
git add .
git commit -m "Resolve H{N}: {description}
- Tests: tests/refactor/test_h{N}_*.py pass
- Constraints: {constraints satisfied}
- Propagates to: {dependent holes}"
Phase Final: Reporting
Generate comprehensive delta report:
python scripts/generate_report.py > REFACTOR_REPORT.md
Report includes:
- Hole resolution summary with validation evidence
- Metrics delta (LOC, complexity, coverage, performance)
- Behavioral analysis (intentional changes documented)
- Constraint validation (all satisfied)
- Risk assessment and migration guide
Key Principles
1. Test-Driven Everything
- Write validation criteria BEFORE implementing
- Tests define "correct resolution"
- Characterization tests are sacred - never let them fail
2. Hole-Driven Progress
- Resolve holes in dependency order
- Each resolution propagates constraints
- Track everything formally in Refactor IR
3. Continuous Validation
Every commit must validate:
- ✅ Characterization tests pass (behavior preserved)
- ✅ Resolution tests pass (hole resolved correctly)
- ✅ Constraints satisfied
- ✅ Main branch untouched
- ✅ .beads/ intact in main
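If you want these checks enforced mechanically, a hypothetical pre-commit guard could run them (commands and suite paths below are assumptions):
# Hypothetical pre-commit guard: refuse commits on main, require both suites green
import subprocess, sys

def run(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

branch = run("git rev-parse --abbrev-ref HEAD").stdout.strip()
if branch == "main":
    sys.exit("Refusing to commit on main: switch to the refactor branch")

for suite in ("tests/characterization", "tests/refactor"):
    if run(f"python -m pytest {suite} -q").returncode != 0:
        sys.exit(f"Commit blocked: {suite} is failing")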
4. Safe by Construction
- Work only in refactor branch
- Main is read-only reference
- Beads are untouchable historical artifacts
5. Formal Completeness
Design complete when:
- All holes resolved and validated
- All constraints satisfied
- All phase gates passed
- Metrics improved or maintained
Hole Quality Framework
SMART Criteria for Good Holes
Every hole must be:
- Specific: Clear, bounded question with concrete answer
  - ✓ Good: "How should error handling work in the API layer?"
  - ✗ Bad: "How to improve the code?"
- Measurable: Has testable validation criteria
  - ✓ Good: "Reduce duplication from 60% to <15%"
  - ✗ Bad: "Make code better"
- Achievable: Can be resolved with available information
  - ✓ Good: "Extract parsing logic to separate module"
  - ✗ Bad: "Predict all future requirements"
- Relevant: Blocks meaningful progress on refactoring
  - ✓ Good: "Define core interface (blocks 5 other holes)"
  - ✗ Bad: "Decide variable naming convention"
- Typed: Clear type/structure for resolution
  - ✓ Good: interface Architecture = { layers: Layer[], rules: Rule[] }
  - ✗ Bad: "Some kind of structure?"
Hole Estimation Framework
Size holes using these categories:
| Size | Duration | Characteristics | Examples |
|---|---|---|---|
| Nano | 1-2 hours | Simple, mechanical changes | Rename files, update imports |
| Small | 4-8 hours | Single module refactor | Extract class, consolidate functions |
| Medium | 1-3 days | Cross-module changes | Define interfaces, reorganize packages |
| Large | 4-7 days | Architecture changes | Layer extraction, pattern implementation |
| Epic | >7 days | SPLIT THIS HOLE | Too large, break into smaller holes |
Estimation Red Flags:
- More than 3 dependencies → Likely Medium+
- Unclear validation → Add time for discovery
- New patterns/tools → Add learning overhead
Hole Splitting Guidelines
Split a hole when:
- Estimate exceeds 7 days
- More than 5 dependencies
- Validation criteria unclear
- Multiple distinct concerns mixed
Splitting strategy:
Epic hole: "Refactor entire authentication system"
→ Split into:
R10_auth_interface: Define new auth interface (Medium)
R11_token_handling: Implement JWT tokens (Small)
R12_session_management: Refactor sessions (Medium)
R13_auth_middleware: Update middleware (Small)
R14_auth_testing: Comprehensive test suite (Medium)
After splitting:
- Update dependencies in REFACTOR_IR.md
- Run python scripts/propagate.py to update the graph
- Re-sync with beads: python scripts/holes_to_beads.py
Common Hole Types
Architecture Holes
"?R1_target_architecture": "What should the ideal structure be?"
"?R2_module_boundaries": "How should modules be organized?"
"?R3_abstraction_layers": "What layers/interfaces are needed?"
Validation: Architecture tests, dependency analysis, layer violation checks
Implementation Holes
"?R4_consolidation_targets": "What code should merge?"
"?R5_extraction_targets": "What code should split out?"
"?R6_elimination_targets": "What code should be removed?"
Validation: Duplication detection, equivalence tests, dead code analysis
Quality Holes
"?R7_test_strategy": "How to validate equivalence?"
"?R8_migration_path": "How to safely transition?"
"?R9_rollback_mechanism": "How to undo if needed?"
Validation: Test coverage metrics, migration dry-runs, rollback tests
See HOLE_TYPES.md for complete catalog.
Constraint Propagation Rules
Rule 1: Interface Resolution → Type Constraints
When: Interface hole resolved with concrete types
Then: Propagate type requirements to all consumers
Example:
Resolve R6: NodeInterface = BaseNode with async run()
Propagates to:
→ R4: Parallel execution must handle async
→ R5: Error recovery must handle async exceptions
Rule 2: Implementation → Performance Constraints
When: Implementation resolved with resource usage
Then: Propagate limits to dependent holes
Example:
Resolve R4: Parallelization with max_concurrent=3
Propagates to:
→ R8: Rate limit = provider_limit / 3
→ R7: Memory budget = 3 * single_operation_memory
Rule 3: Validation → Test Requirements
When: Validation resolved with test requirements
Then: Propagate data needs upstream
Example:
Resolve R9: Testing needs 50 examples
Propagates to:
→ R7: Metrics must support batch evaluation
→ R8: Test data collection strategy needed
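A propagation rule is essentially a function from one resolution to new constraints on dependent holes; a hedged sketch of Rule 2 in that style (hole ids and field names hypothetical):
# Rule 2 as an explicit constraint delta (illustrative; propagate.py's rule format may differ)
def propagate_parallelism(resolution, holes):
    """When a hole resolves with max_concurrent=N, derive limits for dependents."""
    n = resolution["max_concurrent"]
    holes["R8"]["constraints"].append(f"rate_limit = provider_limit / {n}")
    holes["R7"]["constraints"].append(f"memory_budget = {n} * single_operation_memory")
Encoding rules as small functions keeps propagation auditable: each resolution yields an explicit, reviewable constraint delta.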
See CONSTRAINT_RULES.md for complete propagation rules.
Success Indicators
Weekly Progress
- 2-4 holes resolved
- All tests passing
- Constraints satisfied
- Measurable improvements
Red Flags (Stop & Reassess)
- ❌ Characterization tests fail
- ❌ Hole can't be resolved within constraints
- ❌ Constraints contradict each other
- ❌ No progress for 3+ days
- ❌ Main branch accidentally modified
Validation Gates
| Gate | Criteria | Check |
|---|---|---|
| Gate 1: Discovery Complete | All holes cataloged, dependencies mapped | python scripts/check_discovery.py |
| Gate 2: Foundation Holes | Core interfaces resolved, tests pass | python scripts/check_foundation.py |
| Gate 3: Implementation | All refactor holes resolved, metrics improved | python scripts/check_implementation.py |
| Gate 4: Production Ready | Migration tested, rollback verified | python scripts/check_production.py |
Claude-Assisted Workflow
This skill is designed for effective Claude/LLM collaboration. Here's how to divide work:
Phase 0: Discovery
Claude's Role:
- Run discover_holes.py to analyze codebase
- Suggest holes based on code analysis
- Generate initial REFACTOR_IR.md structure
- Write characterization tests to capture current behavior
- Set up test infrastructure
Your Role:
- Confirm holes are well-scoped
- Prioritize which holes to tackle first
- Review and approve REFACTOR_IR.md
- Define critical constraints
Phase 1-N: Hole Resolution
Claude's Role:
- Write resolution tests (TDD) BEFORE implementation
- Implement hole resolution to make tests pass
- Run validation scripts: validate_resolution.py, check_foundation.py
- Update REFACTOR_IR.md with resolution details
- Propagate constraints: python scripts/propagate.py H{N}
- Generate commit messages documenting changes
Your Role:
- Make architecture decisions (which pattern, which approach)
- Assess risk and determine constraint priorities
- Review code changes for correctness
- Approve merge to main when complete
Phase Final: Reporting
Claude's Role:
- Generate comprehensive REFACTOR_REPORT.md
- Document all metrics deltas
- List all validation evidence
- Create migration guides
- Prepare PR description
Your Role:
- Final review of report accuracy
- Approve for production deployment
- Conduct post-refactor retrospective
Effective Prompting Patterns
Starting a session:
"I need to refactor [description]. Use typed-holes-refactor skill.
Start with discovery phase."
Resolving a hole:
"Resolve H3 (target_architecture). Write tests first, then implement.
Use [specific pattern/approach]."
Checking progress:
"Run check_completeness.py and show me the dashboard.
What's ready to work on next?"
Generating visualizations:
"Generate dependency graph showing bottlenecks and critical path.
Use visualize_graph.py with --analyze."
Claude's Limitations
Claude CANNOT:
- Make subjective architecture decisions (you must decide)
- Determine business-critical constraints (you must specify)
- Run tests that require external services (mock or you run them)
- Merge to main (you must approve and merge)
Claude CAN:
- Analyze code and suggest holes
- Write comprehensive test suites
- Implement resolutions within your constraints
- Generate reports and documentation
- Track progress across sessions (via beads + REFACTOR_IR.md)
Multi-Session Continuity
At session start:
"Continue typed-holes refactoring. Import beads state and
show current status from REFACTOR_IR.md."
Claude will:
- Read REFACTOR_IR.md to understand current state
- Check which holes are resolved
- Identify next ready holes
- Resume where previous session left off
You should:
- Keep REFACTOR_IR.md and .beads/ committed to git
- Export beads state at session end: bd export -o .beads/issues.jsonl
- Use /context before starting to ensure Claude has full context
Beads Integration
Why beads + typed holes?
- Beads tracks issues across sessions (prevents lost work)
- Holes track refactoring-specific state (dependencies, constraints)
- Together: Complete continuity for long-running refactors
Setup
# Install beads (once)
go install github.com/steveyegge/beads/cmd/bd@latest
# After running discover_holes.py
python scripts/holes_to_beads.py
# Check what's ready
bd ready --json
Workflow Integration
During hole resolution:
# Start work on a hole
bd update bd-5 --status in_progress --json
# Implement resolution
# ... write tests, implement code ...
# Validate resolution
python scripts/validate_resolution.py H3
# Close bead
bd close bd-5 --reason "Resolved H3: target_architecture" --json
# Export state
bd export -o .beads/issues.jsonl
git add .beads/issues.jsonl REFACTOR_IR.md
git commit -m "Resolve H3: Define target architecture"
Syncing holes ↔ beads:
# After updating REFACTOR_IR.md manually
python scripts/holes_to_beads.py # Sync changes to beads
# After resolving holes
python scripts/holes_to_beads.py # Update bead statuses
Cross-session continuity:
# Session start
bd import -i .beads/issues.jsonl
bd ready --json # Shows ready holes
python scripts/check_completeness.py # Shows overall progress
# Session end
bd export -o .beads/issues.jsonl
git add .beads/issues.jsonl
git commit -m "Session checkpoint: 3 holes resolved"
Bead advantages:
- Tracks work across days/weeks
- Shows dependency graph: bd deps bd-5
- Prevents context loss
- Integrates with overall project management
Scripts Reference
All scripts are in scripts/:
- discover_holes.py - Analyze codebase and generate REFACTOR_IR.md
- next_hole.py - Show next resolvable holes based on dependencies
- validate_resolution.py - Check if hole resolution satisfies constraints
- propagate.py - Update dependent holes after resolution
- generate_report.py - Create comprehensive delta report
- check_discovery.py - Validate Phase 0 completeness (Gate 1)
- check_foundation.py - Validate Phase 1 completeness (Gate 2)
- check_implementation.py - Validate Phase 2 completeness (Gate 3)
- check_production.py - Validate Phase 3 readiness (Gate 4)
- check_completeness.py - Overall progress dashboard
- visualize_graph.py - Generate hole dependency visualization
- holes_to_beads.py - Sync holes with beads issues
Run any script with --help for detailed usage.
Meta-Consistency
This skill uses its own principles:
| Typed Holes Principle | Application to Refactoring |
|---|---|
| Typed Holes | Architectural unknowns cataloged with types |
| Constraint Propagation | Design constraints flow through dependency graph |
| Iterative Refinement | Hole-by-hole resolution cycles |
| Test-Driven Validation | Tests define correctness |
| Formal Completeness | Gates verify design completeness |
We use the system to refactor the system.
Advanced Topics
For complex scenarios, see:
- HOLE_TYPES.md - Detailed hole taxonomy
- CONSTRAINT_RULES.md - Complete propagation rules
- VALIDATION_PATTERNS.md - Test patterns for different hole types
- EXAMPLES.md - Complete worked examples
Quick Start Example
# 1. Setup
git checkout -b refactor/typed-holes-v1
python scripts/discover_holes.py
# 2. Write baseline tests
# Create tests/characterization/test_*.py
# 3. Resolve first hole
python scripts/next_hole.py # Shows H1 is ready
# Write tests/refactor/test_h1_*.py (fails initially)
# Refactor code until tests pass
python scripts/validate_resolution.py H1
python scripts/propagate.py H1
git commit -m "Resolve H1: ..."
# 4. Repeat for each hole
# ...
# 5. Generate report
python scripts/generate_report.py > REFACTOR_REPORT.md
Troubleshooting
Characterization tests fail
Symptom: Tests that captured baseline behavior now fail
Resolution:
- Review changes: git diff to see what changed
- Investigate: What behavior changed and why?
- Decision tree:
  - Intentional change: Update baselines with documentation
    # Update baseline with reason
    save_baseline("v2_api", new_behavior, reason="Switched to async implementation")
  - Unintentional regression: Fix the code, tests must pass
Prevention: Run characterization tests before AND after each hole resolution.
Hole can't be resolved
Symptom: Stuck on a hole for >3 days, unclear how to proceed
Resolution:
1. Check dependencies: Are they actually resolved?
   python scripts/visualize_graph.py --analyze  # Look for unresolved dependencies
2. Review constraints: Are they contradictory?
   - Example: C1 "preserve all behavior" + C5 "change API contract" → Contradictory
   - Fix: Renegotiate constraints with stakeholders
3. Split the hole: If hole is too large
   # Original: R4_consolidate_all (Epic, 10+ days)
   # Split into:
   R4a_consolidate_parsers (Medium, 2 days)
   R4b_consolidate_validators (Small, 1 day)
   R4c_consolidate_handlers (Medium, 2 days)
4. Check for circular dependencies:
   python scripts/visualize_graph.py  # Look for cycles: R4 → R5 → R6 → R4
   - Fix: Break cycle by introducing intermediate hole or redefining dependencies
Escalation: If still stuck after 5 days, consider alternative refactoring approach.
Contradictory Constraints
Symptom: Cannot satisfy all constraints simultaneously
Example:
- C1: "Preserve exact current behavior" (backward compatibility)
- C5: "Reduce response time by 50%" (performance improvement)
- Current behavior includes slow, synchronous operations
Resolution Framework:
1. Identify the conflict:
   C1 requires: Keep synchronous operations
   C5 requires: Switch to async operations
   → Contradiction: Can't be both sync and async
2. Negotiate priorities:
   | Option | C1 | C5 | Tradeoff |
   |---|---|---|---|
   | A: Keep sync | ✓ | ✗ | No performance gain |
   | B: Switch to async | ✗ | ✓ | Breaking change |
   | C: Add async, deprecate sync | ⚠️ | ✓ | Migration burden |
3. Choose resolution strategy:
   - Relax constraint: Change C1 to "Preserve behavior where possible"
   - Add migration period: Option C implemented over 2 releases
   - Split into phases: Phase 1 (C1), Phase 2 (C5)
4. Document decision:
   ## Constraint Resolution: C1 vs C5
   **Decision**: Relax C1 to allow async migration
   **Rationale**: Performance critical for user experience
   **Migration**: 3-month deprecation period for sync API
   **Approved by**: [Stakeholder], [Date]
Circular Dependencies
Symptom: visualize_graph.py shows cycles
Example:
R4 (consolidate parsers) → depends on R6 (define interface)
R6 (define interface) → depends on R4 (needs parser examples)
Resolution strategies:
1. Introduce intermediate hole:
   H0_parser_analysis: Analyze existing parsers (no dependencies)
   R6_interface: Define interface using H0 analysis
   R4_consolidate: Implement using R6 interface
2. Redefine dependencies:
   - Maybe R4 doesn't actually need R6
   - Or R6 only needs partial R4 (split R4)
3. Accept iterative refinement:
   R6_interface_v1: Initial interface (simple)
   R4_consolidate: Implement with v1 interface
   R6_interface_v2: Refine based on R4 learnings
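To confirm a cycle without the visualization, a standard depth-first search over the dependency map is enough; a minimal sketch:
# Minimal cycle detection over hole dependencies (sketch; visualize_graph.py may report more)
def find_cycle(deps):
    """deps: {hole_id: [hole_ids it depends on]}. Returns one cycle, or None."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color, stack = {}, []

    def visit(h):
        color[h] = GRAY
        stack.append(h)
        for d in deps.get(h, []):
            if color.get(d, WHITE) == GRAY:   # back edge: cycle found
                return stack[stack.index(d):] + [d]
            if color.get(d, WHITE) == WHITE:
                cycle = visit(d)
                if cycle:
                    return cycle
        color[h] = BLACK
        stack.pop()
        return None

    for h in deps:
        if color.get(h, WHITE) == WHITE:
            cycle = visit(h)
            if cycle:
                return cycle
    return None

print(find_cycle({"R4": ["R6"], "R6": ["R4"]}))  # ['R4', 'R6', 'R4']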
Prevention: Define architecture holes before implementation holes.
No Progress for 3+ Days
Symptom: Feeling stuck, no commits, uncertain how to proceed
Resolution checklist:
1. Review REFACTOR_IR.md: Are holes well-defined (SMART criteria)?
   - If not: Rewrite holes to be more specific
2. Check hole size: Is current hole >7 days estimate?
   - If yes: Split into smaller holes
3. Run dashboard: python scripts/check_completeness.py
   - Are you working on a blocked hole?
   - Switch to a ready hole instead
4. Visualize dependencies: python scripts/visualize_graph.py --analyze
   - Identify bottlenecks
   - Look for parallel work opportunities
5. Review constraints: Are they achievable?
   - Renegotiate if necessary
6. Seek external review:
   - Share REFACTOR_IR.md with colleague
   - Get feedback on approach
7. Consider alternative: Maybe this refactor isn't feasible
   - Document why
   - Propose different approach
Reset protocol: If still stuck, revert to last working state and try different approach.
Estimation Failures
Symptom: Hole taking 3x longer than estimated
Analysis:
1. Why did estimate fail?
   - Underestimated complexity
   - Unforeseen dependencies
   - Unclear requirements
   - Technical issues (tool problems, infrastructure)
2. Immediate actions:
   - Update REFACTOR_IR.md with revised estimate
   - If >7 days, split the hole
   - Update beads: bd update bd-5 --note "Revised estimate: 5 days (was 2)"
3. Future improvements:
   - Use actual times to calibrate future estimates
   - Add buffer for discovery (20% overhead)
   - Note uncertainty in IR: "Estimate: 2-4 days (high uncertainty)"
Learning: Track actual vs estimated time in REFACTOR_REPORT.md for future reference.
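One lightweight way to calibrate (purely illustrative) is a correction factor computed from completed holes:
# Estimate calibration from recorded actuals (illustrative)
def calibration_factor(history):
    """history: [(estimated_days, actual_days), ...] for completed holes."""
    est = sum(e for e, _ in history)
    act = sum(a for _, a in history)
    return act / est if est else 1.0

history = [(2, 5), (1, 1.5), (3, 4)]   # hypothetical past holes
print(f"Scale new estimates by ~{calibration_factor(history):.2f}x")  # ~1.75x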
Begin with Phase 0: Discovery. Always work in a branch. Test first, refactor second.
