QuantConnect Validation


About

This skill performs walk-forward validation for QuantConnect strategies to detect overfitting and ensure robustness before deployment. It helps developers evaluate out-of-sample performance by training on 80% of data and testing on the remaining 20%. Use it when making deployment decisions or testing strategy generalization with the `qc_validate.py` tool.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/derekcrosslu/CLAUDE_CODE_EXPLORE
Git Clone (Alternative)
git clone https://github.com/derekcrosslu/CLAUDE_CODE_EXPLORE.git ~/.claude/skills/quantconnect-validation

Copy and paste one of these commands into Claude Code to install this skill

Documentation

QuantConnect Validation Skill (Phase 5)

Purpose: Walk-forward validation for Phase 5 robustness testing before deployment.

Progressive Disclosure: This primer contains essentials only. Full details are available via qc_validate.py --help (see Reference Documentation below).


When to Use This Skill

Load when:

  • Running /qc-validate command
  • Testing out-of-sample performance
  • Evaluating strategy robustness
  • Making deployment decisions

Tool: Use python SCRIPTS/qc_validate.py for walk-forward validation


Walk-Forward Validation Overview

Purpose: Detect overfitting and ensure strategy generalizes to new data.

Approach:

  1. Training (in-sample): Develop/optimize on 80% of data
  2. Testing (out-of-sample): Validate on remaining 20%
  3. Compare: Measure performance degradation

Example (5-year backtest 2019-2023):

  • In-sample: 2019-2022 (4 years) - Training period
  • Out-of-sample: 2023 (1 year) - Testing period
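To make the split concrete, here is a minimal sketch that divides a date range by calendar days; `split_date_range` is an illustrative helper, not part of qc_validate.py:

```python
from datetime import date, timedelta

def split_date_range(start: date, end: date, split: float = 0.80) -> tuple:
    """Split [start, end] into in-sample and out-of-sample periods.

    `split` is the fraction of calendar days assigned to in-sample.
    Illustrative helper only; not the qc_validate.py implementation.
    """
    total_days = (end - start).days
    cutoff = start + timedelta(days=int(total_days * split))
    in_sample = (start, cutoff)
    out_of_sample = (cutoff + timedelta(days=1), end)
    return in_sample, out_of_sample

# 5-year backtest, 2019-01-01 to 2023-12-31, at the default 80/20 split
is_period, oos_period = split_date_range(date(2019, 1, 1), date(2023, 12, 31))
print(is_period)   # roughly 2019 through 2022 (training)
print(oos_period)  # roughly 2023 (testing)
```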

Key Metrics

1. Performance Degradation

Formula: (IS Sharpe - OOS Sharpe) / IS Sharpe

| Degradation | Quality | Decision |
|-------------|---------|----------|
| < 15% | Excellent | Deploy with confidence |
| 15-30% | Acceptable | Deploy but monitor |
| 30-40% | Concerning | Escalate to human |
| > 40% | Severe | Abandon (overfit) |

Key Insight: < 15% degradation indicates a robust strategy that generalizes well.


2. Robustness Score

Formula: OOS Sharpe / IS Sharpe

| Score | Quality | Interpretation |
|-------|---------|----------------|
| > 0.75 | High | Strategy robust across periods |
| 0.60-0.75 | Moderate | Acceptable but monitor |
| < 0.60 | Low | Strategy unstable |

Key Insight: > 0.75 indicates the strategy maintains performance out-of-sample.
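
Both formulas are one-liners; the sketch below (illustrative function names, not the qc_validate.py API) reproduces the numbers from the Example Decision at the end of this primer:

```python
def degradation(is_sharpe: float, oos_sharpe: float) -> float:
    """Performance degradation: (IS Sharpe - OOS Sharpe) / IS Sharpe."""
    return (is_sharpe - oos_sharpe) / is_sharpe

def robustness(is_sharpe: float, oos_sharpe: float) -> float:
    """Robustness score: OOS Sharpe / IS Sharpe."""
    return oos_sharpe / is_sharpe

# Numbers from the Example Decision section: IS Sharpe 0.97, OOS Sharpe 0.89
print(f"{degradation(0.97, 0.89):.1%}")  # 8.2%  -> Excellent
print(f"{robustness(0.97, 0.89):.2f}")   # 0.92  -> High
```

Note the two metrics are complements (robustness = 1 - degradation), so they bound the same quantity from different directions with independently chosen thresholds.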


Quick Usage

Run Walk-Forward Validation

```bash
# From hypothesis directory with iteration_state.json
python SCRIPTS/qc_validate.py run --strategy strategy.py

# Custom split ratio (default 80/20)
python SCRIPTS/qc_validate.py run --strategy strategy.py --split 0.70
```

What it does:

  1. Reads project_id from iteration_state.json
  2. Splits date range (80/20 default)
  3. Runs in-sample backtest
  4. Runs out-of-sample backtest
  5. Calculates degradation and robustness
  6. Saves results to PROJECT_LOGS/validation_result.json
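
The exact schema of validation_result.json is defined by qc_validate.py; the sketch below shows a plausible shape only, and every field name is an assumption:

```python
# Hypothetical contents of PROJECT_LOGS/validation_result.json.
# The real schema is whatever qc_validate.py writes; all keys below are assumed.
validation_result = {
    "project_id": "12345678",   # read from iteration_state.json
    "split": 0.80,              # in-sample fraction
    "in_sample": {"start": "2019-01-01", "end": "2022-12-31",
                  "sharpe": 0.97, "trades": 142},
    "out_of_sample": {"start": "2023-01-01", "end": "2023-12-31",
                      "sharpe": 0.89, "trades": 38},
    "degradation": 0.082,
    "robustness": 0.92,
    "recommendation": "DEPLOY_STRATEGY",
}
```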

Analyze Results

```bash
python SCRIPTS/qc_validate.py analyze --results PROJECT_LOGS/validation_result.json
```

Output:

  • Performance comparison table
  • Degradation percentage
  • Robustness assessment
  • Deployment recommendation

Decision Integration

After validation, the decision framework evaluates:

DEPLOY_STRATEGY (Deploy with confidence):

  • Degradation < 15% AND
  • Robustness > 0.75 AND
  • OOS Sharpe > 0.7

PROCEED_WITH_CAUTION (Deploy but monitor):

  • Degradation < 30% AND
  • Robustness > 0.60 AND
  • OOS Sharpe > 0.5

ABANDON_HYPOTHESIS (Too unstable):

  • Degradation > 40% OR
  • Robustness < 0.5 OR
  • OOS Sharpe < 0

ESCALATE_TO_HUMAN (Borderline):

  • Results don't clearly fit above criteria
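
These criteria map directly to a small decision function. A minimal sketch, not the actual decision-framework code:

```python
def decide(degradation: float, robustness: float, oos_sharpe: float) -> str:
    """Map validation metrics to a deployment decision.

    Thresholds follow the criteria above; illustrative sketch only.
    """
    # ABANDON first: any one of its OR conditions overrides the deploy paths
    if degradation > 0.40 or robustness < 0.5 or oos_sharpe < 0:
        return "ABANDON_HYPOTHESIS"
    if degradation < 0.15 and robustness > 0.75 and oos_sharpe > 0.7:
        return "DEPLOY_STRATEGY"
    if degradation < 0.30 and robustness > 0.60 and oos_sharpe > 0.5:
        return "PROCEED_WITH_CAUTION"
    # Anything that matches none of the three is borderline
    return "ESCALATE_TO_HUMAN"

print(decide(0.082, 0.92, 0.89))  # DEPLOY_STRATEGY
```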

Best Practices

1. Time Splits

  • Standard: 80/20 (e.g., 4 years training, 1 year testing on a 5-year backtest)
  • Conservative: 70/30 (more OOS testing)
  • Very Conservative: 60/40 (extensive testing)

Minimum OOS period: 6 months (1 year preferred)


2. Never Peek at Out-of-Sample

CRITICAL RULE: Never adjust strategy based on OOS results.

  • OOS is for testing only
  • Adjusting based on OOS defeats the purpose of validation
  • If you adjust, OOS becomes in-sample

3. Check Trade Count

Both periods need sufficient trades:

  • In-sample: Minimum 30 trades (50+ preferred)
  • Out-of-sample: Minimum 10 trades (20+ preferred)

Too few trades = unreliable validation.
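
A guard along these lines (thresholds from above; the function name is illustrative) can be applied before trusting the metrics:

```python
def sufficient_trades(is_trades: int, oos_trades: int) -> bool:
    """True when both periods meet the minimum trade counts."""
    return is_trades >= 30 and oos_trades >= 10

print(sufficient_trades(142, 38))  # True
print(sufficient_trades(120, 3))   # False -> escalate (see Common Issues)
```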


4. Compare Multiple Metrics

Don't just look at Sharpe:

  • Sharpe Ratio degradation
  • Max Drawdown increase
  • Win Rate change
  • Profit Factor degradation
  • Trade Count consistency

All metrics should degrade similarly for a robust strategy.
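
A sketch of a per-metric comparison, using illustrative numbers and assuming higher-is-better metrics (drawdown would need its sign flipped):

```python
def metric_degradation(is_metrics: dict, oos_metrics: dict) -> dict:
    """Per-metric degradation; positive values mean OOS is worse."""
    return {
        name: (is_metrics[name] - oos_metrics[name]) / abs(is_metrics[name])
        for name in is_metrics
        if name in oos_metrics and is_metrics[name] != 0
    }

# Illustrative values only
is_m  = {"sharpe": 0.97, "win_rate": 0.56, "profit_factor": 1.8}
oos_m = {"sharpe": 0.89, "win_rate": 0.52, "profit_factor": 1.6}
for name, deg in metric_degradation(is_m, oos_m).items():
    print(f"{name}: {deg:+.1%}")
# A robust strategy shows similar, modest degradation across all metrics.
```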


Common Issues

Severe Degradation (> 40%)

Cause: Strategy overfit to in-sample period

Example:

  • IS Sharpe: 1.5 → OOS Sharpe: 0.6
  • Degradation: 60%

Decision: ABANDON_HYPOTHESIS

Fix for next hypothesis: Simplify (fewer parameters), longer training period


Different Market Regimes

Cause: IS was bull market, OOS was bear market

Example:

  • 2019-2022 (bull): Sharpe 1.2
  • 2023 (bear): Sharpe -0.3

Decision: Not necessarily overfit, but not robust across regimes

Fix: Test across multiple regimes, add regime detection


Low Trade Count in OOS

Cause: Strategy stops trading in OOS period

Example:

  • IS: 120 trades → OOS: 3 trades

Decision: ESCALATE_TO_HUMAN (insufficient OOS data)


Integration with /qc-validate

The /qc-validate command workflow:

  1. Read iteration_state.json for project_id and parameters
  2. Load this skill for validation approach
  3. Modify strategy for time splits (80/20)
  4. Run in-sample and OOS backtests
  5. Calculate degradation and robustness
  6. Evaluate using decision framework
  7. Update iteration_state.json with results
  8. Git commit with validation summary

Reference Documentation

Need implementation details? All reference documentation is accessible via --help:

```bash
python SCRIPTS/qc_validate.py --help
```

That's the only way to access complete reference documentation.

Topics covered in --help:

  • Walk-forward validation methodology
  • Performance degradation thresholds
  • Monte Carlo validation techniques
  • PSR/DSR statistical metrics
  • Common errors and fixes
  • Phase 5 decision criteria

The primer above covers 90% of use cases. Use --help for edge cases and detailed analysis.


Related Skills

  • quantconnect - Core strategy development
  • quantconnect-backtest - Phase 3 backtesting (qc_backtest.py)
  • quantconnect-optimization - Phase 4 optimization (qc_optimize.py)
  • decision-framework - Decision thresholds
  • backtesting-analysis - Metric interpretation

Key Principles

  1. OOS is sacred - Never adjust strategy based on OOS results
  2. Degradation < 15% is excellent - Strategy generalizes well
  3. Robustness > 0.75 is target - Maintains performance OOS
  4. Trade count matters - Need sufficient trades in both periods
  5. Multiple metrics - All should degrade similarly for robustness

Example Decision

In-Sample (2019-2022):
  Sharpe: 0.97, Drawdown: 18%, Trades: 142

Out-of-Sample (2023):
  Sharpe: 0.89, Drawdown: 22%, Trades: 38

Degradation: 8.2% (< 15%)
Robustness: 0.92 (> 0.75)

→ DEPLOY_STRATEGY (minimal degradation, high robustness)

Version: 2.0.0 (Progressive Disclosure) · Last Updated: November 13, 2025 · Lines: ~190 (was 463) · Context Reduction: 59%

GitHub Repository

derekcrosslu/CLAUDE_CODE_EXPLORE
Path: .claude/skills/quantconnect-validation
