QuantConnect Validation
About
This skill performs walk-forward validation for QuantConnect strategies to detect overfitting and ensure robustness before deployment. It helps developers evaluate out-of-sample performance by training on 80% of data and testing on the remaining 20%. Use it when making deployment decisions or testing strategy generalization with the `qc_validate.py` tool.
Quick Install
Claude Code
Recommended: copy and paste this command in Claude Code to install this skill:
/plugin add https://github.com/derekcrosslu/CLAUDE_CODE_EXPLORE
Or clone the repository manually:
git clone https://github.com/derekcrosslu/CLAUDE_CODE_EXPLORE.git ~/.claude/skills/QuantConnect\ Validation
Documentation
QuantConnect Validation Skill (Phase 5)
Purpose: Walk-forward validation for Phase 5 robustness testing before deployment.
Progressive Disclosure: This primer contains essentials only. Full details available via qc_validate.py docs command.
When to Use This Skill
Load when:
- Running the /qc-validate command
- Testing out-of-sample performance
- Evaluating strategy robustness
- Making deployment decisions
Tool: Use `python SCRIPTS/qc_validate.py` for walk-forward validation
Walk-Forward Validation Overview
Purpose: Detect overfitting and ensure strategy generalizes to new data.
Approach:
- Training (in-sample): Develop/optimize on 80% of data
- Testing (out-of-sample): Validate on remaining 20%
- Compare: Measure performance degradation
Example (5-year backtest 2019-2023):
- In-sample: 2019-2022 (4 years) - Training period
- Out-of-sample: 2023 (1 year) - Testing period
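qc_validate.py performs this split for you; the sketch below only illustrates the arithmetic of a chronological 80/20 split, with a hypothetical helper not taken from the tool.

```python
from datetime import date, timedelta

def split_date_range(start: date, end: date, split: float = 0.80):
    """Chronological split: first `split` fraction is in-sample, the rest is out-of-sample."""
    in_sample_end = start + timedelta(days=int((end - start).days * split))
    return (start, in_sample_end), (in_sample_end + timedelta(days=1), end)

in_sample, out_of_sample = split_date_range(date(2019, 1, 1), date(2023, 12, 31))
print("In-sample:     ", in_sample)      # (2019-01-01, 2022-12-31)
print("Out-of-sample: ", out_of_sample)  # (2023-01-01, 2023-12-31)
```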
Key Metrics
1. Performance Degradation
Formula: (IS Sharpe - OOS Sharpe) / IS Sharpe
| Degradation | Quality | Decision |
|---|---|---|
| < 15% | Excellent | Deploy with confidence |
| 15-30% | Acceptable | Deploy but monitor |
| 30-40% | Concerning | Escalate to human |
| > 40% | Severe | Abandon (overfit) |
Key Insight: < 15% degradation indicates robust strategy that generalizes well.
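A minimal sketch of the degradation formula and the threshold table above (illustrative only, not the tool's implementation):

```python
def sharpe_degradation(is_sharpe: float, oos_sharpe: float) -> float:
    """(IS Sharpe - OOS Sharpe) / IS Sharpe, e.g. (1.5 - 0.6) / 1.5 = 0.60."""
    return (is_sharpe - oos_sharpe) / is_sharpe

def degradation_quality(deg: float) -> str:
    if deg < 0.15:
        return "Excellent - deploy with confidence"
    if deg < 0.30:
        return "Acceptable - deploy but monitor"
    if deg <= 0.40:
        return "Concerning - escalate to human"
    return "Severe - abandon (overfit)"

print(degradation_quality(sharpe_degradation(1.5, 0.6)))  # 60% degradation -> Severe
```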
2. Robustness Score
Formula: OOS Sharpe / IS Sharpe
| Score | Quality | Interpretation |
|---|---|---|
| > 0.75 | High | Strategy robust across periods |
| 0.60-0.75 | Moderate | Acceptable but monitor |
| < 0.60 | Low | Strategy unstable |
Key Insight: > 0.75 indicates strategy maintains performance out-of-sample.
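And the robustness score in the same illustrative style, using the Sharpe values from the worked example at the end of this document:

```python
def robustness_score(is_sharpe: float, oos_sharpe: float) -> float:
    """OOS Sharpe / IS Sharpe; > 0.75 means performance holds up out-of-sample."""
    return oos_sharpe / is_sharpe

def robustness_quality(score: float) -> str:
    if score > 0.75:
        return "High - robust across periods"
    if score >= 0.60:
        return "Moderate - acceptable but monitor"
    return "Low - strategy unstable"

print(robustness_quality(robustness_score(0.97, 0.89)))  # 0.92 -> High
```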
Quick Usage
Run Walk-Forward Validation
# From hypothesis directory with iteration_state.json
python SCRIPTS/qc_validate.py run --strategy strategy.py
# Custom split ratio (default 80/20)
python SCRIPTS/qc_validate.py run --strategy strategy.py --split 0.70
What it does:
- Reads project_id from iteration_state.json
- Splits date range (80/20 default)
- Runs in-sample backtest
- Runs out-of-sample backtest
- Calculates degradation and robustness
- Saves results to PROJECT_LOGS/validation_result.json
Analyze Results
python SCRIPTS/qc_validate.py analyze --results PROJECT_LOGS/validation_result.json
Output:
- Performance comparison table
- Degradation percentage
- Robustness assessment
- Deployment recommendation
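If you want to post-process the results yourself, something like the sketch below works; the field names are assumptions, since the exact validation_result.json schema is only documented via --help.

```python
import json

# NOTE: "in_sample", "out_of_sample", and "sharpe" are assumed field names;
# inspect your own PROJECT_LOGS/validation_result.json for the real schema.
with open("PROJECT_LOGS/validation_result.json") as f:
    result = json.load(f)

is_sharpe = result["in_sample"]["sharpe"]
oos_sharpe = result["out_of_sample"]["sharpe"]
degradation = (is_sharpe - oos_sharpe) / is_sharpe
robustness = oos_sharpe / is_sharpe

print(f"IS Sharpe {is_sharpe:.2f} -> OOS Sharpe {oos_sharpe:.2f}")
print(f"Degradation {degradation:.1%}, Robustness {robustness:.2f}")
```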
Decision Integration
After validation, the decision framework evaluates:
DEPLOY_STRATEGY (Deploy with confidence):
- Degradation < 15% AND
- Robustness > 0.75 AND
- OOS Sharpe > 0.7
PROCEED_WITH_CAUTION (Deploy but monitor):
- Degradation < 30% AND
- Robustness > 0.60 AND
- OOS Sharpe > 0.5
ABANDON_HYPOTHESIS (Too unstable):
- Degradation > 40% OR
- Robustness < 0.5 OR
- OOS Sharpe < 0
ESCALATE_TO_HUMAN (Borderline):
- Results don't clearly fit above criteria
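These criteria translate directly into code. A minimal sketch of the decision logic follows; the authoritative thresholds live in the decision-framework skill, so treat this as an illustration only.

```python
def validation_decision(degradation: float, robustness: float, oos_sharpe: float) -> str:
    """Map walk-forward metrics to a deployment decision using the criteria above."""
    # Abandon checks come first because they are OR-conditions (any one is disqualifying).
    if degradation > 0.40 or robustness < 0.5 or oos_sharpe < 0:
        return "ABANDON_HYPOTHESIS"
    if degradation < 0.15 and robustness > 0.75 and oos_sharpe > 0.7:
        return "DEPLOY_STRATEGY"
    if degradation < 0.30 and robustness > 0.60 and oos_sharpe > 0.5:
        return "PROCEED_WITH_CAUTION"
    return "ESCALATE_TO_HUMAN"  # borderline: doesn't clearly fit any bucket

print(validation_decision(0.082, 0.92, 0.89))  # DEPLOY_STRATEGY
```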
Best Practices
1. Time Splits
- Standard: 80/20 (4 years training, 1 year testing)
- Conservative: 70/30 (more OOS testing)
- Very Conservative: 60/40 (extensive testing)
Minimum OOS period: 6 months (1 year preferred)
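A quick way to sanity-check that a chosen ratio leaves enough out-of-sample history (illustrative arithmetic only, using the 5-year example range):

```python
from datetime import date

def oos_months(start: date, end: date, split: float) -> float:
    """Approximate out-of-sample window length in months for a chronological split."""
    return (end - start).days * (1 - split) / 30.44  # 30.44 = average days per month

for split in (0.80, 0.70, 0.60):
    months = oos_months(date(2019, 1, 1), date(2023, 12, 31), split)
    print(f"split={split:.2f}: ~{months:.0f} months OOS", "(OK)" if months >= 6 else "(too short)")
```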
2. Never Peek at Out-of-Sample
CRITICAL RULE: Never adjust strategy based on OOS results.
- OOS is for testing only
- Adjusting based on OOS defeats validation purpose
- If you adjust, OOS becomes in-sample
3. Check Trade Count
Both periods need sufficient trades:
- In-sample: Minimum 30 trades (50+ preferred)
- Out-of-sample: Minimum 10 trades (20+ preferred)
Too few trades = unreliable validation.
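A tiny check along these lines keeps the validation honest, with the minimums taken from the list above:

```python
def sufficient_trades(is_trades: int, oos_trades: int) -> bool:
    """Minimums from above: 30 in-sample trades (50+ preferred), 10 out-of-sample (20+ preferred)."""
    return is_trades >= 30 and oos_trades >= 10

print(sufficient_trades(142, 38))  # True  - example from this document
print(sufficient_trades(120, 3))   # False - strategy stopped trading OOS; escalate
```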
4. Compare Multiple Metrics
Don't just look at Sharpe:
- Sharpe Ratio degradation
- Max Drawdown increase
- Win Rate change
- Profit Factor degradation
- Trade Count consistency
All metrics should degrade similarly for a robust strategy.
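A hedged sketch of comparing several metrics at once; the metric names and sample values are illustrative placeholders, not output from qc_validate.py:

```python
def per_metric_degradation(in_sample: dict, out_of_sample: dict) -> dict:
    """Relative degradation per metric; a large spread across metrics is itself a warning sign.
    Note: for max drawdown an *increase* OOS is the degradation, so flip that sign accordingly."""
    return {
        name: (in_sample[name] - out_of_sample[name]) / abs(in_sample[name])
        for name in in_sample
        if name in out_of_sample and in_sample[name] != 0
    }

is_metrics  = {"sharpe": 0.97, "win_rate": 0.56, "profit_factor": 1.8}   # placeholder values
oos_metrics = {"sharpe": 0.89, "win_rate": 0.53, "profit_factor": 1.6}
for name, deg in per_metric_degradation(is_metrics, oos_metrics).items():
    print(f"{name}: {deg:.1%} degradation")
```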
Common Issues
Severe Degradation (> 40%)
Cause: Strategy overfit to in-sample period
Example:
- IS Sharpe: 1.5 → OOS Sharpe: 0.6
- Degradation: 60%
Decision: ABANDON_HYPOTHESIS
Fix for next hypothesis: simplify the strategy (fewer parameters) or use a longer training period
Different Market Regimes
Cause: IS was bull market, OOS was bear market
Example:
- 2019-2022 (bull): Sharpe 1.2
- 2023 (bear): Sharpe -0.3
Decision: Not necessarily overfit, but not robust across regimes
Fix: Test across multiple regimes, add regime detection
Low Trade Count in OOS
Cause: Strategy stops trading in OOS period
Example:
- IS: 120 trades → OOS: 3 trades
Decision: ESCALATE_TO_HUMAN (insufficient OOS data)
Integration with /qc-validate
The /qc-validate command workflow:
- Read iteration_state.json for project_id and parameters
- Load this skill for validation approach
- Modify strategy for time splits (80/20)
- Run in-sample and OOS backtests
- Calculate degradation and robustness
- Evaluate using decision framework
- Update iteration_state.json with results
- Git commit with validation summary
Reference Documentation
Need implementation details? All reference documentation is accessible via --help:
python SCRIPTS/qc_validate.py --help
That's the only way to access complete reference documentation.
Topics covered in --help:
- Walk-forward validation methodology
- Performance degradation thresholds
- Monte Carlo validation techniques
- PSR/DSR statistical metrics
- Common errors and fixes
- Phase 5 decision criteria
The primer above covers 90% of use cases. Use --help for edge cases and detailed analysis.
Related Skills
- quantconnect - Core strategy development
- quantconnect-backtest - Phase 3 backtesting (qc_backtest.py)
- quantconnect-optimization - Phase 4 optimization (qc_optimize.py)
- decision-framework - Decision thresholds
- backtesting-analysis - Metric interpretation
Key Principles
- OOS is sacred - Never adjust strategy based on OOS results
- Degradation < 15% is excellent - Strategy generalizes well
- Robustness > 0.75 is target - Maintains performance OOS
- Trade count matters - Need sufficient trades in both periods
- Multiple metrics - All should degrade similarly for robustness
Example Decision
In-Sample (2019-2022):
Sharpe: 0.97, Drawdown: 18%, Trades: 142
Out-of-Sample (2023):
Sharpe: 0.89, Drawdown: 22%, Trades: 38
Degradation: 8.2% (< 15%)
Robustness: 0.92 (> 0.75)
→ DEPLOY_STRATEGY (minimal degradation, high robustness)
Version: 2.0.0 (Progressive Disclosure) | Last Updated: November 13, 2025 | Lines: ~190 (was 463) | Context Reduction: 59%