bio-workflows-crispr-screen-pipeline
About
This skill provides an end-to-end pipeline for analyzing pooled CRISPR screens, processing data from FASTQ files through to hit gene identification. It orchestrates guide counting, quality control, and statistical analysis using MAGeCK, and supports multiple hit-calling methods. Use it when you need a complete, automated workflow for CRISPR screen analysis from raw sequencing data to results.
Quick Install
Claude Code
Recommended/plugin add https://github.com/majiayu000/claude-skill-registrygit clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/bio-workflows-crispr-screen-pipelineCopy and paste this command in Claude Code to install this skill
Documentation
CRISPR Screen Pipeline
Pipeline Overview
FASTQ Files ──> Guide Counting ──> Count Matrix
│
▼
┌─────────────────────────────────────────────┐
│ crispr-screen-pipeline │
├─────────────────────────────────────────────┤
│ 1. Guide Counting (MAGeCK count) │
│ 2. QC: Library coverage, gini index │
│ 3. Gene-level Analysis (MAGeCK RRA/MLE) │
│ 4. Hit Calling (FDR, effect size) │
│ 5. Visualization & Reporting │
└─────────────────────────────────────────────┘
│
▼
Hit Genes + Volcano/Rank Plots
Complete Workflow
Step 1: Guide Counting
# From FASTQ files
mageck count \
-l library.csv \
-n experiment \
--sample-label Day0,Day14_Rep1,Day14_Rep2,Day14_Rep3 \
--fastq Day0.fastq.gz Day14_Rep1.fastq.gz Day14_Rep2.fastq.gz Day14_Rep3.fastq.gz \
--trim-5 0 \
--pdf-report
Step 2: Quality Control
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
counts = pd.read_csv('experiment.count.txt', sep='\t', index_col=0)
counts_numeric = counts.iloc[:, 1:]
qc_stats = {}
for col in counts_numeric.columns:
total = counts_numeric[col].sum()
zeros = (counts_numeric[col] == 0).sum()
gini = calculate_gini(counts_numeric[col].values)
qc_stats[col] = {'total_reads': total, 'zero_count_guides': zeros, 'gini': gini}
qc_df = pd.DataFrame(qc_stats).T
print('QC Summary:')
print(qc_df)
# Gini index function
def calculate_gini(x):
x = np.sort(x[x > 0])
n = len(x)
cumsum = np.cumsum(x)
return (2 * np.sum((np.arange(1, n+1) * x)) - (n + 1) * cumsum[-1]) / (n * cumsum[-1])
# QC thresholds
assert qc_df['zero_count_guides'].max() < len(counts) * 0.2, 'Too many zero-count guides'
assert qc_df['gini'].max() < 0.4, 'Gini index too high (uneven distribution)'
print('QC passed!')
Step 3: MAGeCK RRA Analysis (Negative Selection)
# For dropout/negative selection screens
mageck test \
-k experiment.count.txt \
-t Day14_Rep1,Day14_Rep2,Day14_Rep3 \
-c Day0 \
-n negative_screen \
--pdf-report \
--gene-lfc-method alphamedian
Step 4: MAGeCK MLE (Complex Designs)
# For screens with multiple conditions
# Design matrix: design.txt
# samplename,baseline,treatment
# Day0,1,0
# Day14_Ctrl,1,0
# Day14_Drug,1,1
mageck mle \
-k experiment.count.txt \
-d design.txt \
-n mle_analysis \
--threads 8
Step 5: Hit Calling
import pandas as pd
# Load MAGeCK results
gene_summary = pd.read_csv('negative_screen.gene_summary.txt', sep='\t')
# Define hits
gene_summary['neg_hit'] = (gene_summary['neg|fdr'] < 0.05) & (gene_summary['neg|lfc'] < -0.5)
gene_summary['pos_hit'] = (gene_summary['pos|fdr'] < 0.05) & (gene_summary['pos|lfc'] > 0.5)
neg_hits = gene_summary[gene_summary['neg_hit']].sort_values('neg|rank')
pos_hits = gene_summary[gene_summary['pos_hit']].sort_values('pos|rank')
print(f'Negative selection hits (dropout): {len(neg_hits)}')
print(f'Positive selection hits (enriched): {len(pos_hits)}')
# Save hit lists
neg_hits.to_csv('negative_hits.csv', index=False)
pos_hits.to_csv('positive_hits.csv', index=False)
Step 6: Visualization
import matplotlib.pyplot as plt
import numpy as np
# Volcano plot
fig, ax = plt.subplots(figsize=(10, 8))
x = gene_summary['neg|lfc']
y = -np.log10(gene_summary['neg|fdr'] + 1e-10)
colors = ['red' if h else 'blue' if p else 'gray'
for h, p in zip(gene_summary['neg_hit'], gene_summary['pos_hit'])]
ax.scatter(x, y, c=colors, alpha=0.5, s=20)
ax.axhline(-np.log10(0.05), linestyle='--', color='black', alpha=0.5)
ax.axvline(-0.5, linestyle='--', color='black', alpha=0.5)
ax.axvline(0.5, linestyle='--', color='black', alpha=0.5)
ax.set_xlabel('Log2 Fold Change')
ax.set_ylabel('-Log10(FDR)')
ax.set_title('CRISPR Screen Volcano Plot')
plt.tight_layout()
plt.savefig('volcano_plot.png', dpi=150)
Complete R Workflow
library(MAGeCKFlute)
library(ggplot2)
# Load MAGeCK results
gene_summary <- read.delim('negative_screen.gene_summary.txt')
sgrna_summary <- read.delim('negative_screen.sgrna_summary.txt')
# QC with MAGeCKFlute
FluteMLE(mle_output = 'mle_analysis.gene_summary.txt',
treatname = 'treatment',
proj = 'crispr_screen',
pathview.top = 10)
# Or for RRA results
FluteRRA(gene_summary = gene_summary,
sgrna_summary = sgrna_summary,
proj = 'rra_analysis')
# Custom rank plot
gene_summary$rank <- rank(gene_summary$`neg.score`)
gene_summary$is_hit <- gene_summary$`neg.fdr` < 0.05
ggplot(gene_summary, aes(x = rank, y = -log10(`neg.fdr` + 1e-10), color = is_hit)) +
geom_point(alpha = 0.5) +
geom_hline(yintercept = -log10(0.05), linetype = 'dashed') +
scale_color_manual(values = c('gray', 'red')) +
theme_bw() +
labs(title = 'Gene Rank Plot', x = 'Rank', y = '-Log10(FDR)')
ggsave('rank_plot.png', width = 10, height = 6)
BAGEL2 Alternative (Essential Genes)
# Calculate Bayes Factor for essentiality
BAGEL.py bf \
-i experiment.count.txt \
-o bagel_output \
-e CEGv2.txt \
-n NEGv1.txt \
-c Day0 \
-s Day14_Rep1,Day14_Rep2,Day14_Rep3
# Precision-recall analysis
BAGEL.py pr \
-i bagel_output.bf \
-o bagel_pr \
-e CEGv2.txt \
-n NEGv1.txt
QC Checkpoints
| Stage | Check | Action if Failed |
|---|---|---|
| Counting | >70% mapping rate | Check library/trimming |
| Zero guides | <20% | Check sequencing depth |
| Gini index | <0.4 | Check for amplification bias |
| Replicates | r > 0.8 | Check experimental consistency |
| Controls | Separate in PCA | Check screen worked |
Workflow Variants
Positive Selection Screen
# For enrichment screens (e.g., drug resistance)
mageck test \
-k counts.txt \
-t Resistant_Rep1,Resistant_Rep2 \
-c Sensitive \
-n positive_screen \
--gene-lfc-method alphamedian
CRISPRi/CRISPRa
# Same workflow, different interpretation
# CRISPRi: negative LFC = gene promotes phenotype
# CRISPRa: positive LFC = gene promotes phenotype
mageck test -k counts.txt -t Treated -c Control -n crispri_screen
Related Skills
- crispr-screens/screen-qc - Detailed QC metrics
- crispr-screens/mageck-analysis - MAGeCK parameters
- crispr-screens/hit-calling - Hit calling methods
- crispr-screens/crispresso-editing - Individual editing analysis
- crispr-screens/library-design - sgRNA selection and library design
- crispr-screens/batch-correction - Multi-batch normalization
- pathway-analysis/enrichment-analysis - Pathway enrichment of hits
GitHub Repository
Related Skills
content-collections
MetaThis skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.
creating-opencode-plugins
MetaThis skill provides the structure and API specifications for creating OpenCode plugins that hook into 25+ event types like commands, files, and LSP operations. It offers implementation patterns for JavaScript/TypeScript modules that intercept and extend the AI assistant's lifecycle. Use it when you need to build event-driven plugins for monitoring, custom handling, or extending OpenCode's capabilities.
sglang
MetaSGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
langchain
MetaLangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
