ab-test-stats

guia-matthieu
Updated 2 days ago

About

This skill calculates statistical significance for A/B tests, helping developers determine if results are meaningful. It assists with sample size planning, test duration estimation, and power analysis for conversion experiments. Use it to make data-driven decisions by analyzing test outcomes and ensuring reliable experiment design.

Quick Install

Claude Code

Primary (recommended):
npx skills add guia-matthieu/clawfu-skills -a claude-code

Plugin command (alternative):
/plugin add https://github.com/guia-matthieu/clawfu-skills

Git clone (alternative):
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/ab-test-stats

Documentation

A/B Test Statistics Calculator

Calculate statistical significance for A/B tests - know when your results are real, not random chance.

When to Use This Skill

  • Test analysis - Determine if results are statistically significant
  • Sample planning - Calculate required sample size before testing
  • Duration estimation - Know how long to run experiments
  • Power analysis - Ensure tests can detect meaningful differences

What Claude Does vs What You Decide

Claude Does                     | You Decide
Structures analysis frameworks  | Metric definitions
Identifies patterns in data     | Business interpretation
Creates visualization templates | Dashboard design
Suggests optimization areas     | Action priorities
Calculates statistical measures | Decision thresholds

Dependencies

pip install scipy numpy click

Commands

Check Significance

python scripts/main.py significance --control 1000,50 --variant 1000,65
python scripts/main.py significance --control 5000,250 --variant 5000,300 --confidence 0.99

Calculate Sample Size

python scripts/main.py sample-size --baseline 0.05 --mde 0.02
python scripts/main.py sample-size --baseline 0.10 --mde 0.01 --power 0.90

Estimate Duration

python scripts/main.py duration --traffic 1000 --baseline 0.05 --mde 0.02
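
The arithmetic behind a duration estimate can be sketched as the required sample divided by daily traffic, rounded up. This is a minimal sketch, assuming steady traffic and that every visitor enters the experiment; `estimate_duration_days` is a hypothetical helper and not necessarily how scripts/main.py computes it.

```python
import math


def estimate_duration_days(total_required: int, daily_visitors: int) -> int:
    """Days needed to collect total_required visitors at a steady daily rate.

    Hypothetical helper for illustration; not part of scripts/main.py.
    """
    if daily_visitors <= 0:
        raise ValueError("daily_visitors must be positive")
    # Round up: a partial day of traffic still means running the full day.
    return math.ceil(total_required / daily_visitors)


# 7,684 total visitors at 1,000 visitors per day
print(estimate_duration_days(7684, 1000))  # 8
```

In practice, many teams also round up to whole weeks so that weekday and weekend behavior are both represented.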

Examples

Example 1: Analyze Test Results

# Control: 1000 visitors, 50 conversions (5%)
# Variant: 1000 visitors, 65 conversions (6.5%)
python scripts/main.py significance --control 1000,50 --variant 1000,65

# Output:
# A/B Test Results
# ─────────────────────────
# Control:  5.00% (50/1000)
# Variant:  6.50% (65/1000)
# Lift:     +30.0%
#
# Statistical Analysis
# ─────────────────────────
# p-value:      0.089
# Confidence:   91.1%
# Result:       NOT SIGNIFICANT (need 95%)
#
# Recommendation: Continue test for more data
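
One common way to test a difference in conversion rates is a pooled two-sided two-proportion z-test. The sketch below shows that test using only the Python standard library (`statistics.NormalDist`) rather than scipy. It is an assumption that this resembles what the skill does: scripts/main.py may use a different test (one-sided, chi-square, or another method), so the p-value printed here need not match the sample output above. The function name `two_proportion_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(n_control, conv_control, n_variant, conv_variant):
    """Pooled, two-sided z-test for a difference in conversion rates.

    Hypothetical helper for illustration; not taken from scripts/main.py.
    """
    p_c = conv_control / n_control
    p_v = conv_variant / n_variant
    # Pooled rate under the null hypothesis that both arms convert equally.
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    se = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0% or 100%")
    z = (p_v - p_c) / se
    # Two-sided p-value: chance of a gap at least this large if there is no true difference.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    lift = (p_v - p_c) / p_c  # relative improvement over control
    return {"lift": lift, "z": z, "p_value": p_value}


result = two_proportion_z_test(1000, 50, 1000, 65)
print(f"lift={result['lift']:+.1%}  z={result['z']:.3f}  p={result['p_value']:.3f}")
```

The pooled standard error is the textbook choice when the null hypothesis is "no difference"; an unpooled variant is also common and gives slightly different numbers.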

Example 2: Plan Sample Size

# Baseline 5% conversion, want to detect a 20% relative lift (1 percentage point absolute)
python scripts/main.py sample-size --baseline 0.05 --mde 0.01

# Output:
# Sample Size Calculator
# ──────────────────────────────
# Baseline conversion: 5.0%
# Minimum detectable effect: 1.0% (20% relative)
# Target conversion: 6.0%
#
# Required per variant: 3,842 visitors
# Total required: 7,684 visitors
#
# At 1000 daily visitors: ~8 days
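
For reference, the textbook normal-approximation formula for comparing two proportions is n = (z_{1-α/2} + z_{power})² · (p1(1-p1) + p2(1-p2)) / (p2 - p1)². The sketch below implements that formula with the standard library, assuming a two-sided test, `alpha=0.05`, `power=0.80`, and an absolute MDE. Whether scripts/main.py uses this formula or these defaults is not stated in this document, so its output can differ from the sample output above; `sample_size_per_variant` is a hypothetical name.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect an absolute lift of `mde`.

    Textbook two-sided, two-proportion formula; an assumption, not
    necessarily the formula used by scripts/main.py.
    """
    if not 0 < baseline < 1 or mde <= 0 or baseline + mde >= 1:
        raise ValueError("need 0 < baseline < baseline + mde < 1")
    p1 = baseline
    p2 = baseline + mde  # target conversion rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_power = NormalDist().inv_cdf(power)          # power quantile
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


n = sample_size_per_variant(0.05, 0.01)
print(f"per variant: {n:,}  total: {2 * n:,}")
```

Halving the MDE roughly quadruples the required sample, which is why the choice of MDE tends to dominate test duration.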

Key Concepts

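Term       | Definition
p-value    | Probability of seeing a difference at least this large if control and variant truly convert at the same rate
Confidence | 1 - p-value, as reported by this tool (usually want 95%+)
Power      | Probability of detecting a real effect of at least the MDE (usually 80%)
MDE        | Minimum Detectable Effect: smallest lift worth detecting
Lift       | Relative improvement: (variant - control) / control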

When Results Are Significant

p-value | Confidence | Verdict
< 0.01  | > 99%      | Highly Significant ✓
< 0.05  | > 95%      | Significant ✓
< 0.10  | > 90%      | Marginally Significant
≥ 0.10  | < 90%      | Not Significant ✗
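
The thresholds in this table translate directly into a small lookup. This is a minimal sketch; `verdict` is a hypothetical helper, and the cutoffs are copied from the table rather than from scripts/main.py.

```python
def verdict(p_value: float) -> str:
    """Map a p-value to the verdict labels used in the table above.

    Hypothetical helper for illustration; not part of scripts/main.py.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.01:
        return "Highly Significant"
    if p_value < 0.05:
        return "Significant"
    if p_value < 0.10:
        return "Marginally Significant"
    return "Not Significant"


for p in (0.004, 0.03, 0.07, 0.25):
    print(f"p={p:<6} {verdict(p)}")
```

These cutoffs are conventions; the decision threshold remains yours to set before the test starts.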

Skill Boundaries

What This Skill Does Well

  • Structuring data analysis
  • Identifying patterns and trends
  • Creating visualization frameworks
  • Calculating statistical measures

What This Skill Cannot Do

  • Access your actual data
  • Replace statistical expertise
  • Make business decisions
  • Guarantee prediction accuracy

Skill Metadata

  • Mode: centaur
  • category: analytics
  • subcategory: statistics
  • dependencies: [scipy, numpy]
  • difficulty: intermediate
  • time_saved: 3+ hours/week

GitHub Repository

guia-matthieu/clawfu-skills
Path: skills/analytics/ab-test-stats
