Causal Inference
About
This Claude Skill enables causal inference analysis to determine cause-and-effect relationships beyond simple correlations. It provides methods like propensity scoring, instrumental variables, and causal graphs for estimating treatment effects and policy impacts. Developers can use it for robust causal analysis when evaluating interventions or programs with observational data.
Documentation
Causal inference determines cause-and-effect relationships and estimates treatment effects, going beyond correlation to understand what causes what.
Key Concepts
- Treatment: Intervention or exposure
- Outcome: Result or consequence
- Confounding: Variables affecting both treatment and outcome
- Causal Graph: Visual representation of relationships
- Treatment Effect: Impact of intervention
- Selection Bias: Non-random treatment assignment
Causal Methods
- Randomized Controlled Trials (RCT): Gold standard
- Propensity Score Matching: Balance treatment/control
- Difference-in-Differences: Compares before/after changes between treated and control groups
- Instrumental Variables: Handle endogeneity
- Causal Forests: Heterogeneous treatment effects
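Difference-in-differences is listed above but not covered by the implementation that follows. A minimal sketch on simulated two-period data, assuming parallel trends; the sample size, group gap, time trend, and effect size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # units per group (hypothetical)

# Illustrative assumptions, not values from any real study
true_effect = 5000.0   # effect of the program on the treated group
group_gap = 3000.0     # fixed baseline difference between the groups
time_trend = 2000.0    # change over time shared by both groups

control_pre = 40000 + rng.normal(0, 1000, n)
control_post = 40000 + time_trend + rng.normal(0, 1000, n)
treated_pre = 40000 + group_gap + rng.normal(0, 1000, n)
treated_post = 40000 + group_gap + time_trend + true_effect + rng.normal(0, 1000, n)

# DiD: (change in treated group) - (change in control group)
did_estimate = (treated_post.mean() - treated_pre.mean()) - (
    control_post.mean() - control_pre.mean()
)

# A before/after comparison on the treated group alone absorbs the time trend
before_after = treated_post.mean() - treated_pre.mean()

print(f"DiD estimate: ${did_estimate:.0f}")
print(f"Before/after only: ${before_after:.0f}")
```

Subtracting the control group's change removes the shared time trend, and differencing within each group removes the fixed gap between groups. The estimate is only as credible as the parallel-trends assumption.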
Implementation with Python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from scipy import stats
# Generate observational data with confounding
np.random.seed(42)
n = 1000
# Confounder: Age (affects both treatment and outcome)
age = np.random.uniform(25, 75, n)
# Treatment: Training program (more likely for younger people)
treatment_prob = 0.3 + 0.3 * (75 - age) / 50 # Inverse relationship with age
treatment = (np.random.uniform(0, 1, n) < treatment_prob).astype(int)
# Outcome: Salary (affected by both treatment and age)
# True causal effect of treatment: +$5000
salary = 40000 + 500 * age + 5000 * treatment + np.random.normal(0, 10000, n)
df = pd.DataFrame({
    'age': age,
    'treatment': treatment,
    'salary': salary,
})
print("Observational Data Summary:")
print(df.describe())
print(f"\nTreatment Rate: {df['treatment'].mean():.1%}")
print(f"Average Salary (Control): ${df[df['treatment']==0]['salary'].mean():.0f}")
print(f"Average Salary (Treatment): ${df[df['treatment']==1]['salary'].mean():.0f}")
# 1. Naive Comparison (BIASED - ignores confounding)
naive_effect = df[df['treatment']==1]['salary'].mean() - df[df['treatment']==0]['salary'].mean()
print(f"\n1. Naive Comparison: ${naive_effect:.0f} (BIASED)")
# 2. Regression Adjustment (Covariate Adjustment)
X = df[['treatment', 'age']]
y = df['salary']
model = LinearRegression()
model.fit(X, y)
regression_effect = model.coef_[0]
print(f"\n2. Regression Adjustment: ${regression_effect:.0f}")
# 3. Propensity Score Matching
# Estimate probability of treatment given covariates
ps_model = LogisticRegression()
ps_model.fit(df[['age']], df['treatment'])
df['propensity_score'] = ps_model.predict_proba(df[['age']])[:, 1]
print(f"\n3. Propensity Score Matching:")
print(f"PS range: [{df['propensity_score'].min():.3f}, {df['propensity_score'].max():.3f}]")
# Matching: find the nearest control (with replacement) for each treated unit
matched_pairs = []
treated_units = df[df['treatment'] == 1].index
for treated_idx in treated_units:
    treated_ps = df.loc[treated_idx, 'propensity_score']
    # Candidate controls within a caliper of 0.1 on the propensity score
    control_units = df[(df['treatment'] == 0) &
                       (df['propensity_score'] >= treated_ps - 0.1) &
                       (df['propensity_score'] <= treated_ps + 0.1)].index
    if len(control_units) > 0:
        # Pick the candidate whose propensity score is closest
        closest_control = min(control_units,
                              key=lambda x: abs(df.loc[x, 'propensity_score'] - treated_ps))
        matched_pairs.append({
            'treated_idx': treated_idx,
            'control_idx': closest_control,
            'treated_salary': df.loc[treated_idx, 'salary'],
            'control_salary': df.loc[closest_control, 'salary'],
        })
matched_df = pd.DataFrame(matched_pairs)
psm_effect = (matched_df['treated_salary'] - matched_df['control_salary']).mean()
print(f"PSM Effect: ${psm_effect:.0f}")
print(f"Matched pairs: {len(matched_df)}")
# 4. Stratification by Propensity Score
df['ps_stratum'] = pd.qcut(df['propensity_score'], q=5, labels=False, duplicates='drop')
stratified_effects = []
for stratum in df['ps_stratum'].unique():
    stratum_data = df[df['ps_stratum'] == stratum]
    if (stratum_data['treatment'] == 0).sum() > 0 and (stratum_data['treatment'] == 1).sum() > 0:
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean
        stratified_effects.append(effect)
stratified_effect = np.mean(stratified_effects)
print(f"\n4. Stratification by PS: ${stratified_effect:.0f}")
# 5. Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Treatment distribution by age
ax = axes[0, 0]
treated = df[df['treatment'] == 1]
control = df[df['treatment'] == 0]
ax.hist(control['age'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(treated['age'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Frequency')
ax.set_title('Age Distribution by Treatment')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
# Salary vs Age (colored by treatment)
ax = axes[0, 1]
ax.scatter(control['age'], control['salary'], alpha=0.5, label='Control', s=30)
ax.scatter(treated['age'], treated['salary'], alpha=0.5, label='Treated', s=30, color='red')
ax.set_xlabel('Age')
ax.set_ylabel('Salary')
ax.set_title('Salary vs Age by Treatment')
ax.legend()
ax.grid(True, alpha=0.3)
# Propensity Score Distribution
ax = axes[1, 0]
ax.hist(df[df['treatment'] == 0]['propensity_score'], bins=20, alpha=0.6, label='Control', color='blue')
ax.hist(df[df['treatment'] == 1]['propensity_score'], bins=20, alpha=0.6, label='Treated', color='red')
ax.set_xlabel('Propensity Score')
ax.set_ylabel('Frequency')
ax.set_title('Propensity Score Distribution')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
# Treatment Effect Comparison
ax = axes[1, 1]
methods = ['Naive', 'Regression', 'PSM', 'Stratified']
effects = [naive_effect, regression_effect, psm_effect, stratified_effect]
true_effect = 5000
ax.bar(methods, effects, color=['red', 'orange', 'yellow', 'lightgreen'], alpha=0.7, edgecolor='black')
ax.axhline(y=true_effect, color='green', linestyle='--', linewidth=2, label=f'True Effect (${true_effect:.0f})')
ax.set_ylabel('Treatment Effect ($)')
ax.set_title('Treatment Effect Estimates by Method')
ax.legend()
ax.grid(True, alpha=0.3, axis='y')
for i, effect in enumerate(effects):
    ax.text(i, effect + 200, f'${effect:.0f}', ha='center', va='bottom')
plt.tight_layout()
plt.show()
# 6. Doubly Robust Estimation
from sklearn.ensemble import RandomForestRegressor
# Propensity score model
ps_model_dr = LogisticRegression().fit(df[['age']], df['treatment'])
ps_scores = ps_model_dr.predict_proba(df[['age']])[:, 1]
# Outcome model
outcome_model = RandomForestRegressor(n_estimators=50, random_state=42)
outcome_model.fit(df[['treatment', 'age']], df['salary'])
# Doubly robust (AIPW) estimator: outcome-model predictions corrected by
# inverse-propensity-weighted residuals
t = df['treatment'].to_numpy()
y_obs = df['salary'].to_numpy()
# Clip propensity scores away from 0 and 1 to keep the weights stable
ps_clipped = np.clip(ps_scores, 0.01, 0.99)
# Outcome predictions with treatment set to 1 and to 0 for every unit
# (the forest is fit in-sample here; cross-fitting is the more careful choice)
features = df[['treatment', 'age']]
pred_treated = outcome_model.predict(features.assign(treatment=1))
pred_control = outcome_model.predict(features.assign(treatment=0))
dr_effect = np.mean(
    pred_treated - pred_control
    + t * (y_obs - pred_treated) / ps_clipped
    - (1 - t) * (y_obs - pred_control) / (1 - ps_clipped)
)
print(f"\n6. Doubly Robust Estimation: ${dr_effect:.0f}")
# 7. Heterogeneous Treatment Effects
print(f"\n7. Heterogeneous Treatment Effects (by Age Quartile):")
# Group on the quartile intervals directly so every row lands in exactly one bin
age_quartiles = pd.qcut(df['age'], q=4, duplicates='drop')
for age_q, stratum_data in df.groupby(age_quartiles, observed=True):
    if (stratum_data['treatment'] == 0).sum() > 0 and (stratum_data['treatment'] == 1).sum() > 0:
        treated_mean = stratum_data[stratum_data['treatment'] == 1]['salary'].mean()
        control_mean = stratum_data[stratum_data['treatment'] == 0]['salary'].mean()
        effect = treated_mean - control_mean
        print(f" Age {age_q.left:.0f}-{age_q.right:.0f}: ${effect:.0f}")
# 8. Sensitivity Analysis
print(f"\n8. Sensitivity Analysis (Hidden Confounder Impact):")
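# Vary the hidden confounder's effect on salary.
# Omitted-variable bias = (confounder's effect on the outcome) x
# (difference in confounder prevalence between treated and control).
# The 0.1 prevalence gap is an illustrative assumption, not an estimate.
prevalence_gap = 0.1
for hidden_effect in [1000, 2000, 5000, 10000]:
    adjusted_effect = regression_effect - hidden_effect * prevalence_gap
    print(f" If hidden confounder worth ${hidden_effect}: Effect = ${adjusted_effect:.0f}")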
# 9. Summary Table
print(f"\n" + "="*60)
print("CAUSAL INFERENCE SUMMARY")
print("="*60)
print(f"True Treatment Effect: ${true_effect:,.0f}")
print(f"\nEstimates:")
print(f" Naive (BIASED): ${naive_effect:,.0f}")
print(f" Regression Adjustment: ${regression_effect:,.0f}")
print(f" Propensity Score Matching: ${psm_effect:,.0f}")
print(f" Stratification: ${stratified_effect:,.0f}")
print(f" Doubly Robust: ${dr_effect:,.0f}")
print("="*60)
# 10. Causal Graph (Text representation)
print(f"\n10. Causal Graph (DAG):")
print("""
    Age ───────→ Treatment    (non-random assignment: selection bias)
     │               │
     └──→ Salary ←───┘

Interpretation:
- Age is a confounder
- Treatment causally affects Salary
- Age directly affects Salary
- Age affects probability of Treatment
""")
Causal Assumptions
- Unconfoundedness: No unmeasured confounders
- Overlap: Common support on propensity scores
- SUTVA: No interference between units and a single version of treatment
- Consistency: Observed outcome equals the potential outcome under the treatment actually received
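The overlap assumption can be checked directly from the propensity scores. A minimal sketch, assuming the same treatment-probability formula as the simulation above; `overlap_summary` is a hypothetical helper written for this example, not part of any library:

```python
import numpy as np

def overlap_summary(ps, treatment):
    """Summarise common support between treated and control propensity scores."""
    ps = np.asarray(ps, dtype=float)
    treatment = np.asarray(treatment).astype(bool)
    ps_treated, ps_control = ps[treatment], ps[~treatment]
    # Common support: the range of scores observed in both groups
    low = max(ps_treated.min(), ps_control.min())
    high = min(ps_treated.max(), ps_control.max())
    inside = (ps >= low) & (ps <= high)
    return {"low": low, "high": high, "share_inside": inside.mean()}

rng = np.random.default_rng(42)
age = rng.uniform(25, 75, 1000)
# True propensity from the simulation: younger people are more likely treated
ps = 0.3 + 0.3 * (75 - age) / 50
treatment = rng.uniform(0, 1, 1000) < ps

summary = overlap_summary(ps, treatment)
print(summary)
```

A low `share_inside` would suggest that many units have no comparable counterpart in the other group, in which case trimming or a different estimand may be worth considering.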
Treatment Effect Types
- ATE: Average Treatment Effect (overall)
- ATT: Average Treatment Effect on the Treated
- CATE: Conditional Average Treatment Effect
- HTE: Heterogeneous Treatment Effects
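The difference between these estimands is easiest to see when both potential outcomes are known, which is only possible in simulation. A sketch under the assumption (made up for illustration) that the treatment effect shrinks with age:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
age = rng.uniform(25, 75, n)

# Hypothetical potential outcomes; the age-dependent effect is an assumption
y0 = 40000 + 500 * age + rng.normal(0, 1000, n)   # salary without treatment
tau = 5000 + 100 * (50 - age)                     # individual treatment effect
y1 = y0 + tau                                     # salary with treatment

# Younger people are more likely to be treated, as in the simulation above
treated = rng.uniform(0, 1, n) < 0.3 + 0.3 * (75 - age) / 50

ate = (y1 - y0).mean()                  # average over everyone
att = (y1 - y0)[treated].mean()         # average over the treated only
young = age < 50
cate_young = (y1 - y0)[young].mean()    # conditional on age < 50
cate_old = (y1 - y0)[~young].mean()     # conditional on age >= 50

print(f"ATE: {ate:.0f}  ATT: {att:.0f}")
print(f"CATE (age < 50): {cate_young:.0f}  CATE (age >= 50): {cate_old:.0f}")
```

Because the treated group skews younger and younger people benefit more in this setup, the ATT exceeds the ATE. With a constant effect, as in the main example, the two coincide.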
Method Strengths
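- RCT: Gold standard; randomization balances measured and unmeasured confounders in expectation
- Matching: Balances covariates between groups by restricting comparisons to units with common support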
- Regression: Adjusts for covariates
- Instrumental Variables: Handles endogeneity
- Causal Forests: Learns heterogeneous effects
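Instrumental variables are not part of the implementation above. A minimal two-stage least squares sketch with NumPy, assuming a valid binary instrument `z` (hypothetical, e.g. a random encouragement) that affects salary only through treatment; every coefficient in the simulation is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50000

z = rng.integers(0, 2, n)     # instrument (hypothetical)
u = rng.normal(0, 1, n)       # unobserved confounder
# Treatment depends on both the instrument and the confounder
t = ((0.8 * z + 0.8 * u + rng.normal(0, 1, n)) > 0.4).astype(float)
# True effect of treatment is 5000; the confounder also raises salary
y = 40000 + 5000 * t + 4000 * u + rng.normal(0, 2000, n)

def ols(x, y):
    """Return [intercept, slope] from a least-squares fit of y on x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Naive regression is biased because u is omitted
naive_estimate = ols(t, y)[1]

# Stage 1: predict treatment from the instrument
stage1 = ols(z, t)
t_hat = stage1[0] + stage1[1] * z
# Stage 2: regress the outcome on predicted treatment
iv_estimate = ols(t_hat, y)[1]

print(f"Naive OLS: {naive_estimate:.0f}")
print(f"2SLS: {iv_estimate:.0f}")
```

The second-stage standard errors from a plain OLS fit are not valid for 2SLS, so this sketch reports point estimates only. A dedicated IV routine would be the safer choice for inference.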
Deliverables
- Causal graph visualization
- Treatment effect estimates
- Sensitivity analysis
- Heterogeneous treatment effects
- Covariate balance assessment
- Propensity score diagnostics
- Final causal inference report
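For the covariate balance assessment, one common diagnostic is the standardized mean difference (SMD) before and after weighting. A sketch assuming the simulation's known propensity; `standardized_mean_difference` is a hypothetical helper, and the 0.1 threshold mentioned below is a widely used rule of thumb rather than a hard rule:

```python
import numpy as np

def standardized_mean_difference(x, treatment, weights=None):
    """SMD of covariate x between treated and control, optionally weighted."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(treatment).astype(bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_treated = np.average(x[t], weights=w[t])
    mean_control = np.average(x[~t], weights=w[~t])
    # Pooled SD from the unweighted groups, a common convention
    pooled_sd = np.sqrt((x[t].var(ddof=1) + x[~t].var(ddof=1)) / 2)
    return (mean_treated - mean_control) / pooled_sd

rng = np.random.default_rng(42)
n = 20000
age = rng.uniform(25, 75, n)
ps = 0.3 + 0.3 * (75 - age) / 50      # known propensity in this simulation
treatment = rng.uniform(0, 1, n) < ps

smd_raw = standardized_mean_difference(age, treatment)
# Inverse probability weights: 1/ps for treated, 1/(1-ps) for control
ipw = np.where(treatment, 1 / ps, 1 / (1 - ps))
smd_weighted = standardized_mean_difference(age, treatment, ipw)

print(f"SMD before weighting: {smd_raw:.3f}")
print(f"SMD after weighting: {smd_weighted:.3f}")
```

An absolute SMD below roughly 0.1 after adjustment is often read as adequate balance. With real data the propensity would be estimated rather than known, so balance should be rechecked after fitting the model.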
Quick Install
/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/causal-inference
Copy and paste this command in Claude Code to install this skill.
