when-testing-code-use-testing-framework
About
This skill provides a standardized procedure for running your testing framework to validate code changes with reproducible commands and evidence capture. It triggers when you need to test changes before a merge, reproduce bugs, or establish baselines. Key features include enforcing confidence ceilings, ensuring result reproducibility, and integrating with tools like Bash and Grep for execution.
Quick Install
Claude Code
Recommended: `/plugin add https://github.com/DNYoussef/context-cascade`
Manual: `git clone https://github.com/DNYoussef/context-cascade.git ~/.claude/skills/when-testing-code-use-testing-framework`
Copy and paste either command in Claude Code to install this skill.
Documentation
STANDARD OPERATING PROCEDURE
Purpose
Guide reviewers and developers to execute the testing framework for code validation, ensuring evidence capture, reproducibility, and confidence ceilings.
Trigger Conditions
- Positive: validating changes before merge, reproducing reported bugs, or establishing baselines for new features.
- Negative: style-only polish (use style-audit) or verification of claims without execution (use verification-quality).
Guardrails
- Confidence ceiling: Include `Confidence: X.XX (ceiling: TYPE Y.YY)` using ceilings {inference/report 0.70, research 0.85, observation/definition 0.95}.
- Reproducibility: Document commands, environment, fixtures, and seeds; attach logs (a minimal sketch follows this list).
- Structure-first: Maintain `readme.md`, `process.md`, and scripts to run/generate tests; keep examples/tests synced with the current framework.
- Adversarial validation: Run boundary/negative cases in addition to happy paths.
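A minimal sketch of the reproducibility guardrail, assuming a pytest-based suite for illustration; the seed variable name and log path are assumptions, not part of the SOP:

```bash
# Minimal sketch, assuming a pytest suite; TEST_SEED and logs/ are hypothetical.
export PYTHONHASHSEED=0        # pin Python hash randomization for reproducibility
export TEST_SEED=42            # hypothetical seed consumed by the test fixtures
mkdir -p logs
python -m pytest tests/ -ra 2>&1 | tee logs/run-1.log   # capture full output as evidence
```

Recording the exact command, environment variables, and seed alongside the log lets a reviewer replay the run and compare results directly.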
Execution Phases
- Setup
  - Review `readme.md` and the scripts (`slash-command-test-run.sh`, `slash-command-test-generate.sh`).
  - Prepare the environment per `subagent-testing-framework.md` and ensure dependencies are installed.
- Test Selection & Generation
  - Identify suites relevant to the change; generate missing cases using the provided scripts if needed.
- Execution
  - Run tests with reproducible commands; capture outputs and failures with file:line references.
  - Re-run flaky tests to confirm stability; note nondeterminism (see the walkthrough after this list).
- Reporting & Confidence
  - Summarize pass/fail counts, failing cases, and reproduction steps.
  - Recommend fixes or reruns; provide confidence with ceiling.
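A hedged walkthrough of the phases above using the SOP's own script names; the scripts' actual interfaces are defined in `readme.md`, so the invocations and paths here are assumptions for illustration:

```bash
# Hypothetical end-to-end run; script names come from the SOP, paths are assumed.
mkdir -p logs
./slash-command-test-generate.sh                          # Test Selection & Generation
./slash-command-test-run.sh 2>&1 | tee logs/run-1.log     # Execution, evidence captured

# Re-run to surface flakiness; identical logs suggest a stable suite.
./slash-command-test-run.sh 2>&1 | tee logs/run-2.log
diff logs/run-1.log logs/run-2.log \
  && echo "stable across reruns" \
  || echo "nondeterminism: inspect the diff and note it in the report"
```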
Output Format
- Environment and commands used.
- Test results (pass/fail, logs, failing file:line).
- Flaky cases and follow-up actions.
- Confidence statement using ceiling syntax (an example skeleton follows this list).
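As an illustration of this format, a report skeleton could be written at the end of a run; all values below are placeholders, not real results:

```bash
# Placeholder report skeleton matching the output format above; values are illustrative.
cat > logs/test-report.md <<'EOF'
## Environment & Commands
- Env: <os>, <runtime version>, commit <sha>
- Commands: ./slash-command-test-run.sh (logs: logs/run-1.log, logs/run-2.log)

## Results
- Passed: <n> / Failed: <m>
- Failing cases: <path/to/test_file:line> with reproduction steps

## Flaky Cases
- <case>: passed on re-run; nondeterminism noted

## Confidence
Confidence: 0.70 (ceiling: inference 0.70)
EOF
```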
Validation Checklist
- Environment and dependencies prepared.
- Relevant suites selected and/or generated.
- Tests executed with logs captured; flakiness noted.
- Confidence ceiling provided; English-only output.
Confidence: 0.70 (ceiling: inference 0.70) – SOP rewritten using Prompt Architect confidence discipline and Skill Forge structure-first testing workflow.
GitHub Repository: https://github.com/DNYoussef/context-cascade
Related Skills
evaluating-llms-harness
Testing: This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends, including HuggingFace and vLLM models.
webapp-testing
Testing: This Claude Skill provides a Playwright-based toolkit for testing local web applications through Python scripts. It enables frontend verification, UI debugging, screenshot capture, and log viewing while managing server lifecycles. Use it for browser automation tasks, but run scripts directly rather than reading their source code to avoid context pollution.
finishing-a-development-branch
Testing: This skill helps developers complete finished work by verifying that tests pass and then presenting structured integration options. It guides the workflow for merging, creating PRs, or cleaning up branches after implementation is done. Use it when your code is ready and tested to systematically finalize the development process.
go-test
Meta: The go-test skill provides expertise in Go's standard testing package and best practices. It helps developers implement table-driven tests, subtests, benchmarks, and coverage strategies while following Go conventions. Use it when writing test files, creating mocks, detecting race conditions, or organizing integration tests in Go projects.
