cro-optimization
About
This Claude Skill provides end-to-end conversion rate optimization guidance for hypothesis-driven A/B testing. It helps developers audit funnels, generate test hypotheses, design experiments, and analyze results with statistical rigor. Use it when you need to optimize conversion flows, interpret ambiguous test data, or structure a testing program from scratch.
Quick install
Claude Code
Recommended: npx skills add rampstackco/claude-skills -a claude-code
/plugin add https://github.com/rampstackco/claude-skills
git clone https://github.com/rampstackco/claude-skills.git ~/.claude/skills/cro-optimization
Copy and paste one of these commands to install the skill in Claude Code.
Skill documentation
CRO Optimization
Run conversion rate optimization as a structured discipline: audit → hypothesize → test → decide. Stack-agnostic. Tool-agnostic.
This skill is for running tests against existing pages and flows. For writing landing page copy from scratch, use landing-page-copy. For setting up the analytics that make CRO possible, use analytics-strategy.
When to use
- Traffic converts at a lower rate than expected
- Specific funnel step has high drop-off
- Pages with high traffic that could move the needle if optimized
- A/B testing infrastructure exists (or can be set up)
- Statistical significance and sample size questions
When NOT to use
- Without sufficient traffic to test (under ~5,000 monthly visitors per variant)
- Pre-launch (no users to test on yet)
- Strategy or messaging-level questions that need qualitative research first
- Brand-defining choices (CRO can't optimize a fundamentally wrong brand)
Required inputs
- The page or flow under optimization
- Current conversion rate and traffic volume
- Access to analytics (event tracking, funnel data)
- An A/B testing tool (or willingness to set one up)
- Time and budget for testing (typically 2 to 8 weeks per test)
The framework: 4 phases
1. Audit
Diagnose before treating.
Quantitative audit:
- Funnel data. Where are users dropping off? The biggest drop is the biggest opportunity.
- Segmentation. Does the funnel perform differently by source, device, geography, audience type?
- Performance data. Are slow pages dragging conversions?
- Search Console / on-site search. What are users looking for that they can't find?
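The funnel question above ("where are users dropping off?") can be answered with simple step-to-step arithmetic. A minimal sketch, assuming you can export a visitor count per funnel step; the step names and counts are hypothetical, not taken from any real funnel.

```python
from typing import List, Tuple

def funnel_dropoff(steps: List[Tuple[str, int]]) -> List[Tuple[str, str, float]]:
    """Return (from_step, to_step, drop_off_rate) for each consecutive pair of steps."""
    results = []
    for (name_a, count_a), (name_b, count_b) in zip(steps, steps[1:]):
        # Share of users who reached step A but never reached step B.
        rate = 1 - count_b / count_a if count_a else 0.0
        results.append((name_a, name_b, rate))
    return results

def biggest_drop(steps: List[Tuple[str, int]]) -> Tuple[str, str, float]:
    """Return the transition that loses the largest share of users."""
    return max(funnel_dropoff(steps), key=lambda r: r[2])

# Hypothetical counts, for illustration only.
example_funnel = [
    ("product_page", 10000),
    ("add_to_cart", 6000),
    ("shipping", 4200),
    ("payment", 1260),
    ("purchase", 1008),
]

worst = biggest_drop(example_funnel)
print(f"Largest drop-off: {worst[0]} -> {worst[1]} ({worst[2]:.0%})")
```

Run the same calculation per segment (source, device, geography) to see whether the largest drop is the same everywhere.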
Qualitative audit:
- Session replay. Watch 20+ sessions of users on the target flow. Note friction, confusion, hesitation.
- Heatmaps. Where do users click? Where do they scroll? Where do they not?
- User interviews / surveys. Why did users not convert? Survey people who started but abandoned.
- Form analytics. Which fields cause abandonment? Which cause errors?
- Customer support tickets. What conversion-related questions come in?
Heuristic audit:
- Apply CRO heuristics to the flow:
- Is the value proposition clear in 5 seconds?
- Is there a single primary CTA per page?
- Is the form length appropriate to the offer?
- Is trust or social proof present?
- Are objections handled?
- Is the page accessible? (Accessibility issues hurt conversion silently.)
The audit produces a list of suspected friction points. Each becomes a hypothesis candidate.
2. Hypothesis
A testable statement.
Hypothesis structure:
Because [observation from audit], we believe that [change] will produce [predicted outcome] for [user segment], because [reason].
Example:
Because session replays show users abandoning at the shipping step (audit), we believe that adding visible shipping cost to the product page (change) will increase add-to-cart conversion by 5 percent (outcome) for desktop users (segment), because users are surprised by shipping cost and abandon (reason).
Hypothesis quality criteria:
- Specific change (not "improve the design")
- Measurable outcome (with a target)
- Grounded in evidence (audit, research, prior tests)
- Tied to a known mechanism (why would this work?)
Hypothesis prioritization (ICE shown here; PIE scores Potential, Importance, and Ease instead):
- Impact: How much could this move the metric?
- Confidence: How likely is the hypothesis to be right?
- Ease: How easy to test? (Time, complexity, risk)
Score each 1 to 10. Test the highest combined scores first.
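ICE scoring as described above can be sketched in a few lines. The text does not say how to combine the three scores, so this sketch assumes a plain average; a sum or product would rank similarly. The `Hypothesis` class and the backlog entries are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    name: str
    impact: int      # 1 to 10: how much could this move the metric?
    confidence: int  # 1 to 10: how likely is the hypothesis to be right?
    ease: int        # 1 to 10: how easy is it to test?

    def ice_score(self) -> float:
        for value in (self.impact, self.confidence, self.ease):
            if not 1 <= value <= 10:
                raise ValueError("ICE components must be between 1 and 10")
        # Assumption: combine by averaging the three components.
        return (self.impact + self.confidence + self.ease) / 3

def prioritize(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Sort hypotheses so the highest ICE score comes first."""
    return sorted(hypotheses, key=lambda h: h.ice_score(), reverse=True)

# Hypothetical backlog, for illustration only.
backlog = [
    Hypothesis("Redesign checkout", impact=9, confidence=4, ease=2),
    Hypothesis("Show shipping cost on product page", impact=8, confidence=7, ease=6),
    Hypothesis("Shorten signup form", impact=6, confidence=5, ease=9),
]

ranked = prioritize(backlog)
for h in ranked:
    print(f"{h.ice_score():.1f}  {h.name}")
```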
3. Test design
A test that produces an unambiguous answer.
Sample size and duration:
Use a sample size calculator (most A/B tools have one) before launching. Inputs:
- Baseline conversion rate
- Minimum detectable effect (the smallest lift you'd care about)
- Statistical power (typically 80%)
- Significance level (typically 5%, i.e. 95% confidence)
This produces the required sample size per variant. Run the test until that sample is reached AND it has covered a full business cycle (typically 2 weeks minimum, to cover weekends and weekly patterns), whichever takes longer.
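A rough way to turn a required sample into a planned duration, taking the conservative reading that the test must satisfy both the sample requirement and the minimum duration. The function name and the rounding up to whole weeks are assumptions of this sketch, not rules from any particular A/B tool.

```python
import math

def required_days(sample_per_variant: int, variants: int,
                  daily_eligible_visitors: int, min_days: int = 14) -> int:
    """Estimate planned test duration in days.

    Takes the longer of (a) the days needed to reach the sample and
    (b) the minimum duration, then rounds up to whole weeks so each
    weekday is represented equally (an assumption of this sketch).
    """
    if daily_eligible_visitors <= 0:
        raise ValueError("daily_eligible_visitors must be positive")
    total_needed = sample_per_variant * variants
    days_for_sample = math.ceil(total_needed / daily_eligible_visitors)
    days = max(days_for_sample, min_days)
    return math.ceil(days / 7) * 7

# Hypothetical traffic figures, for illustration only.
print(required_days(sample_per_variant=30000, variants=2, daily_eligible_visitors=3000))
print(required_days(sample_per_variant=7500, variants=2, daily_eligible_visitors=5000))
```

If the estimate runs far past the 2 to 8 weeks budgeted per test, that is a signal to pick a larger minimum detectable effect or a higher-traffic page.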
Common test setup mistakes:
- Stopping the test the moment significance is hit (peeking)
- Running tests too briefly to capture a full business cycle
- Running multiple overlapping tests on the same flow
- Testing during atypical periods (Black Friday, holidays, major campaigns)
- Excluding mobile when 50%+ of traffic is mobile (or vice versa)
- Testing on too small a slice of traffic (low statistical power)
- Not segmenting analysis (overall lift can hide negative impact on a segment)
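To see why peeking is on this list, simulate A/A tests (both arms identical) and compare how often "significance" appears at any interim look versus only at the planned end. A minimal sketch using a pooled two-proportion z-test; the conversion rate, number of looks, batch size, and seed are arbitrary assumptions.

```python
import math
import random

def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_aa(rate=0.05, looks=10, batch=200, runs=300, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at the final look)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        saw_significance = False
        for _ in range(looks):
            # Both arms draw from the same true rate: any "win" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                saw_significance = True  # a peeker would have stopped here
        peeking_hits += saw_significance
        fixed_hits += p < alpha  # only the planned final look counts
    return peeking_hits / runs, fixed_hits / runs

peeking_rate, fixed_rate = simulate_aa()
print(f"False positives with peeking: {peeking_rate:.1%}")
print(f"False positives at fixed horizon: {fixed_rate:.1%}")
```

The peeking rate can never be lower than the fixed-horizon rate, because the final look is one of the peeks; in practice it is usually well above the nominal 5%.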
Test parameters to define before launch:
- Primary metric (one)
- Guardrail metrics (must not get worse)
- Sample size
- Duration (minimum and maximum)
- Decision criteria (when to ship, when to kill, when to extend)
- Segments to analyze in addition to overall
4. Decide
After the test concludes.
Decision framework:
| Outcome | Decision |
|---|---|
| Variant clearly wins (significant at 95% confidence, exceeds minimum effect) | Ship variant. Document. Continue testing. |
| Variant clearly loses | Kill. Capture the lesson. Iterate hypothesis. |
| Inconclusive (neither significant) | Larger test, different angle, or move on. Don't ship "tied" variants. |
| Small lift, lots of variance | Probably not worth shipping. Even a nominal "winner" may not replicate. |
| Wins overall, loses for important segment | Investigate segment. Consider segment-specific solution. |
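The decision table can be written down as a rule before the test launches, so the outcome is not negotiated afterwards. A simplified sketch: the `decide` function, its thresholds, and its return strings are hypothetical, and it treats any negative segment lift as a loss without testing that segment's significance.

```python
from typing import Dict, Optional

def decide(p_value: float, relative_lift: float, min_effect: float,
           segment_lifts: Optional[Dict[str, float]] = None,
           alpha: float = 0.05) -> str:
    """Map a test result onto the decision table (simplified)."""
    significant = p_value < alpha
    # Simplification: a segment "loses" if its observed lift is negative.
    losing = sorted(s for s, lift in (segment_lifts or {}).items() if lift < 0)

    if significant and relative_lift >= min_effect:
        if losing:
            return "investigate segments: " + ", ".join(losing)
        return "ship"
    if significant and relative_lift < 0:
        return "kill"
    if significant:
        return "do not ship: lift below minimum effect"
    return "inconclusive: larger test, different angle, or move on"

# Hypothetical results, for illustration only.
print(decide(p_value=0.01, relative_lift=0.08, min_effect=0.05))
print(decide(p_value=0.01, relative_lift=0.08, min_effect=0.05,
             segment_lifts={"mobile": -0.04, "desktop": 0.12}))
print(decide(p_value=0.40, relative_lift=0.02, min_effect=0.05))
```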
Anti-patterns:
- "It looks like it's winning, ship it" before reaching significance
- Shipping a variant because the team wants to (HiPPO - highest paid person's opinion)
- Killing tests too early because they look bad
- Re-running tests until they "win" (false positive risk)
- Not capturing the learning when a test loses
Statistical foundations
Significance and confidence
A 95% confidence threshold (a 5% significance level) means: if there were truly no difference between variants, there would be only a 5% chance of seeing a result at least this extreme.
That's not the same as "95% chance the variant wins."
Many CRO tools report Bayesian probabilities instead ("95% chance of being best"), which is a different quantity. Read the methodology your tool uses.
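For tools that use the frequentist approach, the number behind "significant" is a p-value. A minimal sketch of one common way to compute it, a pooled two-proportion z-test using only the standard library; your tool may use a different test, and the visitor and conversion counts are hypothetical.

```python
import math

def two_proportion_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for control A versus variant B."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both arms share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    # Probability of a result at least this extreme if there is truly no difference.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical counts: 5.0% control versus 5.7% variant, 10,000 visitors each.
z, p = two_proportion_test(conv_a=500, n_a=10000, conv_b=570, n_b=10000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below 0.05 here says the observed gap would be unusual if the variants were identical. It does not say the variant has a 95% chance of being better.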
Sample size
Conversion testing needs more sample than people intuit. Quick reference:
| Baseline rate | Minimum detectable effect | Sample per variant |
|---|---|---|
| 2% | 10% relative lift | ~75,000 |
| 2% | 20% relative lift | ~19,000 |
| 5% | 10% relative lift | ~30,000 |
| 5% | 20% relative lift | ~7,500 |
| 10% | 10% relative lift | ~14,000 |
| 10% | 20% relative lift | ~3,500 |
(Approximate. Use a calculator.)
If your monthly visitors per variant don't reach these numbers, A/B testing won't produce reliable results. Iterate via design and qualitative research instead.
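The quick-reference figures can be approximated with the standard normal-approximation formula for comparing two proportions. This sketch assumes a two-sided test at 5% significance and 80% power; it lands in the same ballpark as the table rather than matching it exactly, since calculators differ in how they approximate the variance.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

for baseline in (0.02, 0.05, 0.10):
    for lift in (0.10, 0.20):
        n = sample_size_per_variant(baseline, lift)
        print(f"baseline {baseline:.0%}, lift {lift:.0%}: ~{n:,} per variant")
```

Note how halving the detectable lift roughly quadruples the required sample, which is why small expected effects are expensive to test.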
Multiple testing
The more variants and metrics tested simultaneously, the more false positives. Adjust significance thresholds for multiple comparisons (Bonferroni or similar).
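The Bonferroni adjustment mentioned above divides the significance level by the number of comparisons. A minimal sketch; the variant names and p-values are hypothetical.

```python
from typing import Dict

def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    if comparisons < 1:
        raise ValueError("comparisons must be at least 1")
    return alpha / comparisons

def significant_after_correction(p_values: Dict[str, float],
                                 alpha: float = 0.05) -> Dict[str, bool]:
    """Flag which comparisons remain significant after correction."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}

# Hypothetical p-values for three variants tested against one control.
results = significant_after_correction(
    {"variant_b": 0.030, "variant_c": 0.012, "variant_d": 0.200}
)
print(results)
```

Here `variant_b` would pass an uncorrected 0.05 threshold but fails the corrected one (0.05 / 3). Bonferroni is conservative; it trades some power for protection against false positives.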
Workflow
- Audit. Quantitative + qualitative + heuristic.
- Generate hypotheses. From audit findings. Apply hypothesis structure.
- Prioritize. ICE or PIE. Top 3 to 5 to test next.
- Design the test. Sample size, duration, primary and guardrail metrics, decision criteria.
- Implement. Build variants. QA carefully (broken variants invalidate tests).
- Run. Don't peek. Don't stop early.
- Analyze. Overall and by segment. Note interesting patterns regardless of significance.
- Decide. Ship, kill, or extend.
- Document. Hypothesis, design, results, decision, lesson.
- Compound. Apply lessons to next round of hypotheses.
Failure patterns
- Testing without audit. Random changes, random results.
- Vague hypotheses. "Make it better" is not a hypothesis.
- Peeking and early stopping. Bias toward false positives.
- Underpowered tests. Not enough sample for a real conclusion.
- HiPPO override. Highest paid person's opinion overrides the data.
- Testing during atypical periods. Holidays distort results.
- Single metric obsession. Conversion goes up but average order value craters. Net loss.
- No guardrail metrics. Testing for one outcome, missing damage to others.
- Documentation gap. Wins captured, losses forgotten. Same hypothesis re-tested 3 times.
- Treating each test in isolation. Compounding learning across tests is where CRO programs really win.
Output format
Default output: a markdown test plan at cro-test-[hypothesis-slug].md per test. After the test runs, append the results section.
Structure:
# Test: [Hypothesis short name]
## Hypothesis
Because [observation], we believe that [change] will produce [outcome] for [segment], because [reason].
## Audit evidence
[What evidence supports this hypothesis]
## Test design
- Primary metric:
- Guardrail metrics:
- Sample size required:
- Duration: minimum X, maximum Y
- Variant traffic split:
- Segments to analyze:
## Decision criteria
- Ship if: [conditions]
- Kill if: [conditions]
- Extend if: [conditions]
## Results (filled after test)
- Sample reached:
- Duration actual:
- Primary metric: [variant vs control + significance]
- Guardrail metrics: [results]
- Segment analysis: [findings]
## Decision
[Ship / Kill / Extend / Iterate] - [Why]
## Lesson
[What this teaches us, regardless of outcome]
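The `cro-test-[hypothesis-slug].md` naming convention can be automated. A small sketch, assuming a lowercase, hyphen-separated slug; the exact slug rules are not specified in this skill, so `hypothesis_slug` and `plan_filename` are hypothetical helpers.

```python
import re

def hypothesis_slug(short_name: str) -> str:
    """Lowercase the name and collapse every non-alphanumeric run into one hyphen."""
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower())
    return slug.strip("-")

def plan_filename(short_name: str) -> str:
    """Build the test plan file name for a hypothesis short name."""
    return f"cro-test-{hypothesis_slug(short_name)}.md"

def plan_title(short_name: str) -> str:
    """First line of the test plan, following the structure above."""
    return f"# Test: {short_name.strip()}"

print(plan_filename("Show shipping cost on product page"))
print(plan_title("Show shipping cost on product page"))
```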
Reference files
references/hypothesis-library.md - Common high-impact hypothesis patterns by funnel stage.
