Survival Analysis

aj-geddes

About

This skill enables developers to analyze time-to-event data, handling censored observations where the event hasn't occurred. It calculates survival probabilities and compares groups using Kaplan-Meier curves and Cox proportional hazards models. Use it for risk assessment, predicting lifetimes, or comparing the effectiveness of different treatments in clinical or reliability studies.

Documentation

Survival Analysis

Survival analysis studies the time until an event occurs. It handles censored data, where the event has not yet been observed for some subjects, and supports lifetime prediction and risk assessment.

Key Concepts

Survival Time: Time until event
Censoring: Event not observed during follow-up (e.g. the subject dropped out or the study ended first)
Hazard: Instantaneous risk at time t
Survival Curve: Probability of surviving past time t
Hazard Ratio: Ratio of the hazards of two groups (a value below 1 means a lower hazard in the numerator group)
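
The concepts above are linked by a few standard identities. The notation below (T for the event time, S for the survival function, h for the hazard, H for the cumulative hazard, and subscripts 0 and 1 for two groups) is introduced here for illustration and does not appear in the original text.

```latex
S(t) = P(T > t), \qquad
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}, \qquad
H(t) = \int_0^t h(u)\,du, \qquad
S(t) = \exp\{-H(t)\}, \qquad
\mathrm{HR}(t) = \frac{h_1(t)}{h_0(t)}
```

Under the proportional hazards assumption used by the Cox model, HR(t) does not depend on t.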

Common Models

Kaplan-Meier: Non-parametric survival curves
Cox Proportional Hazards: Semi-parametric regression
Weibull/Exponential: Parametric models
Log-rank Test: Comparing survival curves
Competing Risks: Multiple event types
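
To make the Kaplan-Meier entry concrete, here is a minimal pure-Python sketch of the product-limit estimator. The function name `kaplan_meier` and the toy data are hypothetical and only for illustration; in practice a library such as lifelines would normally be used instead.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed follow-up times
    events:    1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e == 1)
    exits = Counter(durations)      # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving this event time
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t
        at_risk -= exits[t]
    return curve

# Hypothetical toy data: 7 subjects, 4 observed events, 3 censored
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s in kaplan_meier(durations, events):
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects never trigger a drop in the curve, but they still count in the risk set up to their censoring time, which is how the estimator uses partial information.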

Implementation with Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
import warnings
warnings.filterwarnings('ignore')  # silence library warnings to keep the demo output short

# Generate sample survival data
np.random.seed(42)
n_patients = 200

# Time to event (in months)
event_times = np.random.exponential(scale=24, size=n_patients)
# Event indicator (1 = event observed, 0 = censored); drawn independently of the times for simplicity
event_observed = np.random.binomial(1, 0.7, n_patients)
# Group assignment (0 = control, 1 = treatment)
group = np.random.binomial(1, 0.5, n_patients)
# Age at baseline
age = np.random.uniform(30, 80, n_patients)
# Risk score
risk_score = np.random.uniform(0, 100, n_patients)

# Simulate a treatment effect: times in the treatment group are 30% longer
event_times = event_times * (1 + group * 0.3)

df = pd.DataFrame({
    'time': event_times,
    'event': event_observed,
    'group': group,
    'age': age,
    'risk_score': risk_score,
})

print("Survival Data Summary:")
print(df.head(10))
print(f"\nTotal subjects: {len(df)}")
print(f"Events: {df['event'].sum()} ({df['event'].sum()/len(df)*100:.1f}%)")
print(f"Censored: {(1-df['event']).sum()} ({(1-df['event']).sum()/len(df)*100:.1f}%)")

# 1. Kaplan-Meier Estimation
kmf = KaplanMeierFitter()
kmf.fit(df['time'], df['event'], label='Overall')

print("\n1. Kaplan-Meier Survival Estimates:")
print(f"Median survival time: {kmf.median_survival_time_:.1f} months")
print(f"6-month survival: {kmf.predict(6):.1%}")
print(f"12-month survival: {kmf.predict(12):.1%}")
print(f"24-month survival: {kmf.predict(24):.1%}")

# 2. Group Comparison
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Overall survival curve
ax = axes[0, 0]
kmf.plot_survival_function(ax=ax, linewidth=2)
ax.set_xlabel('Time (months)')
ax.set_ylabel('Survival Probability')
ax.set_title('Kaplan-Meier Survival Curve (Overall)')
ax.grid(True, alpha=0.3)

# Survival curves by group
ax = axes[0, 1]
for group_val in [0, 1]:
    mask = df['group'] == group_val
    kmf.fit(df[mask]['time'], df[mask]['event'],
           label=f'{"Control" if group_val == 0 else "Treatment"}')
    kmf.plot_survival_function(ax=ax, linewidth=2)

ax.set_xlabel('Time (months)')
ax.set_ylabel('Survival Probability')
ax.set_title('Kaplan-Meier Curves by Group')
ax.grid(True, alpha=0.3)

# 3. Log-Rank Test
mask_control = df['group'] == 0
mask_treatment = df['group'] == 1

results = logrank_test(
    durations_A=df[mask_control]['time'],
    durations_B=df[mask_treatment]['time'],
    event_observed_A=df[mask_control]['event'],
    event_observed_B=df[mask_treatment]['event']
)

print(f"\n3. Log-Rank Test:")
print(f"Test statistic: {results.test_statistic:.4f}")
print(f"P-value: {results.p_value:.4f}")
print(f"Significant: {'Yes' if results.p_value < 0.05 else 'No'}")

# 4. Risk Groups (by quartiles)
df['risk_quartile'] = pd.qcut(df['risk_score'], q=4, labels=['Low', 'Medium-Low', 'Medium-High', 'High'])

ax = axes[1, 0]
for risk_group in ['Low', 'Medium-Low', 'Medium-High', 'High']:
    mask = df['risk_quartile'] == risk_group
    kmf.fit(df[mask]['time'], df[mask]['event'], label=risk_group)
    kmf.plot_survival_function(ax=ax, linewidth=2)

ax.set_xlabel('Time (months)')
ax.set_ylabel('Survival Probability')
ax.set_title('Kaplan-Meier Curves by Risk Quartile')
ax.legend()
ax.grid(True, alpha=0.3)

# 5. Cumulative Event Probability (1 - survival), not the cumulative hazard
ax = axes[1, 1]
kmf.fit(df['time'], df['event'])
kmf.plot_cumulative_density(ax=ax, linewidth=2)
ax.set_xlabel('Time (months)')
ax.set_ylabel('Cumulative Event Probability')
ax.set_title('Cumulative Event Probability')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# 6. Cox Proportional Hazards Model
cph = CoxPHFitter()
cph.fit(df[['time', 'event', 'group', 'age', 'risk_score']], duration_col='time', event_col='event')

print(f"\n6. Cox Proportional Hazards Model:")
print(cph.summary)

# Hazard ratios
print(f"\nHazard Ratios:")
for var in ['group', 'age', 'risk_score']:
    hr = np.exp(cph.params_[var])
    print(f"  {var}: {hr:.3f}")

# 7. Model Diagnostics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Partial effects plot
ax = axes[0, 0]
df_partial = df.copy()
df_partial['partial_hazard'] = cph.predict_partial_hazard(df_partial)

for group_val in [0, 1]:
    mask = df_partial['group'] == group_val
    ax.scatter(df_partial[mask]['risk_score'], df_partial[mask]['partial_hazard'],
              alpha=0.6, label=f'{"Control" if group_val == 0 else "Treatment"}')

ax.set_xlabel('Risk Score')
ax.set_ylabel('Partial Hazard')
ax.set_title('Partial Hazard by Risk Score and Group')
ax.legend()
ax.grid(True, alpha=0.3)

# Concordance index (a single summary value for the fitted model)
ax = axes[0, 1]
concordance_index = cph.concordance_index_
ax.text(0.5, 0.5, f'Concordance Index: {concordance_index:.3f}',
       ha='center', va='center', fontsize=14,
       bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.7))
ax.axis('off')
ax.set_title('Model Performance')

# Survival curves by predicted risk
ax = axes[1, 0]
df['predicted_hazard'] = cph.predict_partial_hazard(df)
df['hazard_quartile'] = pd.qcut(df['predicted_hazard'], q=4, labels=['Low', 'Medium-Low', 'Medium-High', 'High'])

for hazard_group in ['Low', 'Medium-Low', 'Medium-High', 'High']:
    mask = df['hazard_quartile'] == hazard_group
    kmf.fit(df[mask]['time'], df[mask]['event'], label=hazard_group)
    kmf.plot_survival_function(ax=ax, linewidth=2)

ax.set_xlabel('Time (months)')
ax.set_ylabel('Survival Probability')
ax.set_title('Survival by Predicted Risk Quartile')
ax.grid(True, alpha=0.3)

# Coefficient plot (log hazard ratios; covariates are on different scales, so bar length is not importance)
ax = axes[1, 1]
coef_df = cph.summary[['coef', 'exp(coef)']].copy()
coef_df = coef_df.sort_values('coef')

colors = ['red' if x < 0 else 'green' for x in coef_df['coef']]
ax.barh(coef_df.index, coef_df['coef'], color=colors, alpha=0.7, edgecolor='black')
ax.set_xlabel('Coefficient')
ax.set_title('Variable Coefficients')
ax.axvline(x=0, color='black', linestyle='-', linewidth=0.8)
ax.grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

# 8. Survival Prediction
new_patient = pd.DataFrame({
    'group': [1],
    'age': [65],
    'risk_score': [75],
})

survival_prob = cph.predict_survival_function(new_patient, times=[6, 12, 24])
print(f"\n8. Survival Prediction for New Patient (age 65, treatment, risk 75):")
print(f"6-month survival: {survival_prob.iloc[0, 0]:.1%}")
print(f"12-month survival: {survival_prob.iloc[1, 0]:.1%}")
print(f"24-month survival: {survival_prob.iloc[2, 0]:.1%}")

# 9. Proportional Hazards Assumption
print(f"\n9. Proportional Hazards Test:")
from lifelines.statistics import proportional_hazard_test

# Test based on scaled Schoenfeld residuals; small p-values suggest the assumption is violated
ph_test = proportional_hazard_test(cph, df[['time', 'event', 'group', 'age', 'risk_score']],
                                   time_transform='rank')
print(ph_test.summary)

# 10. Summary Statistics
print(f"\n" + "="*50)
print("SURVIVAL ANALYSIS SUMMARY")
print("="*50)
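# Medians come from Kaplan-Meier fits so that censored subjects are handled correctly;
# a plain median of df['time'] would treat censoring times as event times.
for group_val, name in [(0, 'Control'), (1, 'Treatment')]:
    mask = df['group'] == group_val
    group_kmf = KaplanMeierFitter().fit(df.loc[mask, 'time'], df.loc[mask, 'event'])
    print(f"{name} median survival: {group_kmf.median_survival_time_:.1f} months")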
print(f"Log-rank p-value: {results.p_value:.4f}")
print(f"Concordance index: {concordance_index:.3f}")
print("="*50)

Censoring Types

Right censoring: Event has not occurred by the end of observation (most common)
Left censoring: Event occurred before observation began, at an unknown time
Interval censoring: Event occurred within a known interval, but the exact time is unknown
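
Right censoring is often simulated by drawing a latent event time and a latent censoring time for each subject and recording whichever comes first. The sketch below uses only the standard library; the function name `simulate_right_censored` and the mean values (24 and 40 months) are assumptions chosen for illustration, not values from the original.

```python
import random

def simulate_right_censored(n, mean_event=24.0, mean_censor=40.0, seed=0):
    """Draw n (duration, event) pairs under independent right censoring."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        event_time = rng.expovariate(1 / mean_event)     # latent time to the event
        censor_time = rng.expovariate(1 / mean_censor)   # latent time to dropout or study end
        duration = min(event_time, censor_time)          # what is actually recorded
        event = int(event_time <= censor_time)           # 1 = event observed, 0 = censored
        rows.append((duration, event))
    return rows

rows = simulate_right_censored(200)
n_events = sum(e for _, e in rows)
print(f"Events: {n_events}, censored: {len(rows) - n_events}")
```

With these exponential rates the expected share of observed events is (1/24) / (1/24 + 1/40) = 62.5%. Unlike drawing the event indicator independently of the times, this construction ties censoring to how long each subject was followed.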

Model Comparison

Kaplan-Meier: Describes, doesn't explain
Cox Model: Adjusts for covariates; assumes proportional hazards
Parametric: Assumes a specific distribution for survival times (e.g. Weibull or exponential)
Competing Risks: Multiple event types
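
As a small illustration of what "assumes a distribution" means, the simplest parametric model is the exponential, whose maximum-likelihood hazard rate under right censoring is the number of observed events divided by the total follow-up time. The function names and toy data below are hypothetical; this is a sketch of that one model, not a general parametric fitter.

```python
import math

def fit_exponential(durations, events):
    """MLE of the constant hazard rate for right-censored data.

    rate = (number of observed events) / (total time at risk)
    """
    total_time = sum(durations)
    n_events = sum(events)
    if n_events == 0 or total_time <= 0:
        raise ValueError("need at least one observed event and positive follow-up time")
    return n_events / total_time

def exponential_survival(rate, t):
    """Survival probability S(t) = exp(-rate * t) under the exponential model."""
    return math.exp(-rate * t)

# Hypothetical toy data: 4 events over 41 units of follow-up
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

rate = fit_exponential(durations, events)
median = math.log(2) / rate   # time at which S(t) = 0.5
print(f"Estimated hazard rate: {rate:.4f}")
print(f"Model-based median survival: {median:.2f}")
print(f"S(6) under the model: {exponential_survival(rate, 6):.3f}")
```

The whole curve is determined by one parameter, so the model can extrapolate beyond the observed follow-up, but only as well as the constant-hazard assumption holds. Kaplan-Meier makes no such assumption and therefore cannot extrapolate.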

Applications

Clinical trials
Equipment reliability
Customer churn
Employee retention
Product lifetime

Deliverables

Kaplan-Meier survival curves
Survival probability estimates
Log-rank test results
Cox model coefficients
Hazard ratios
Risk stratification groups
Survival predictions
Model diagnostics

Quick Install

/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/survival-analysis

Copy and paste this command in Claude Code to install this skill

GitHub Repository

aj-geddes/useful-ai-prompts

Path: skills/survival-analysis
