scvi-tools
About
scvi-tools is a PyTorch-based Python framework for advanced probabilistic modeling of single-cell omics data. Use it for tasks requiring deep generative models, such as probabilistic batch correction (scVI), differential expression with uncertainty, or multi-modal data integration (TOTALVI, MultiVI). It's designed for complex analyses involving batch effects or multimodal datasets, while standard pipelines are better served by tools like scanpy.
Quick Install
Claude Code
Recommendednpx skills add K-Dense-AI/claude-scientific-skills -a claude-code/plugin add https://github.com/K-Dense-AI/claude-scientific-skillsgit clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/scvi-toolsCopy and paste this command in Claude Code to install this skill
Documentation
scvi-tools
Overview
scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.
When to Use This Skill
Use this skill when:
- Analyzing single-cell RNA-seq data (dimensionality reduction, batch correction, integration)
- Working with single-cell ATAC-seq or chromatin accessibility data
- Integrating multimodal data (CITE-seq, multiome, paired/unpaired datasets)
- Analyzing spatial transcriptomics data (deconvolution, spatial mapping)
- Performing differential expression analysis on single-cell data
- Conducting cell type annotation or transfer learning tasks
- Working with specialized single-cell modalities (methylation, cytometry, RNA velocity)
- Building custom probabilistic models for single-cell analysis
Core Capabilities
scvi-tools provides models organized by data modality:
1. Single-Cell RNA-seq Analysis
Core models for expression analysis, batch correction, and integration. See references/models-scrna-seq.md for:
- scVI: Unsupervised dimensionality reduction and batch correction
- scANVI: Semi-supervised cell type annotation and integration
- AUTOZI: Zero-inflation detection and modeling
- VeloVI: RNA velocity analysis
- contrastiveVI: Perturbation effect isolation
2. Chromatin Accessibility (ATAC-seq)
Models for analyzing single-cell chromatin data. See references/models-atac-seq.md for:
- PeakVI: Peak-based ATAC-seq analysis and integration
- PoissonVI: Quantitative fragment count modeling
- scBasset: Deep learning approach with motif analysis
3. Multimodal & Multi-omics Integration
Joint analysis of multiple data types. See references/models-multimodal.md for:
- totalVI: CITE-seq protein and RNA joint modeling
- MultiVI: Paired and unpaired multi-omic integration
- MrVI: Multi-resolution cross-sample analysis
4. Spatial Transcriptomics
Spatially-resolved transcriptomics analysis. See references/models-spatial.md for:
- DestVI: Multi-resolution spatial deconvolution
- Stereoscope: Cell type deconvolution
- Tangram: Spatial mapping and integration
- scVIVA: Cell-environment relationship analysis
5. Specialized Modalities
Additional specialized analysis tools. See references/models-specialized.md for:
- MethylVI/MethylANVI: Single-cell methylation analysis
- CytoVI: Flow/mass cytometry batch correction
- Solo: Doublet detection
- CellAssign: Marker-based cell type annotation
Typical Workflow
All scvi-tools models follow a consistent API pattern:
# 1. Load and preprocess data (AnnData format)
import scvi
import scanpy as sc
adata = scvi.data.heart_cell_atlas_subsampled()
sc.pp.filter_genes(adata, min_counts=3)
sc.pp.highly_variable_genes(adata, n_top_genes=1200)
# 2. Register data with model (specify layers, covariates)
scvi.model.SCVI.setup_anndata(
adata,
layer="counts", # Use raw counts, not log-normalized
batch_key="batch",
categorical_covariate_keys=["donor"],
continuous_covariate_keys=["percent_mito"]
)
# 3. Create and train model
model = scvi.model.SCVI(adata)
model.train()
# 4. Extract latent representations and normalized values
latent = model.get_latent_representation()
normalized = model.get_normalized_expression(library_size=1e4)
# 5. Store in AnnData for downstream analysis
adata.obsm["X_scVI"] = latent
adata.layers["scvi_normalized"] = normalized
# 6. Downstream analysis with scanpy
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)
Key Design Principles:
- Raw counts required: Models expect unnormalized count data for optimal performance
- Unified API: Consistent interface across all models (setup → train → extract)
- AnnData-centric: Seamless integration with the scanpy ecosystem
- GPU acceleration: Automatic utilization of available GPUs
- Batch correction: Handle technical variation through covariate registration
Common Analysis Tasks
Differential Expression
Probabilistic DE analysis using the learned generative models:
de_results = model.differential_expression(
groupby="cell_type",
group1="TypeA",
group2="TypeB",
mode="change", # Use composite hypothesis testing
delta=0.25 # Minimum effect size threshold
)
See references/differential-expression.md for detailed methodology and interpretation.
Model Persistence
Save and load trained models:
# Save model
model.save("./model_directory", overwrite=True)
# Load model
model = scvi.model.SCVI.load("./model_directory", adata=adata)
Batch Correction and Integration
Integrate datasets across batches or studies:
# Register batch information
scvi.model.SCVI.setup_anndata(adata, batch_key="study")
# Model automatically learns batch-corrected representations
model = scvi.model.SCVI(adata)
model.train()
latent = model.get_latent_representation() # Batch-corrected
Theoretical Foundations
scvi-tools is built on:
- Variational inference: Approximate posterior distributions for scalable Bayesian inference
- Deep generative models: VAE architectures that learn complex data distributions
- Amortized inference: Shared neural networks for efficient learning across cells
- Probabilistic modeling: Principled uncertainty quantification and statistical testing
See references/theoretical-foundations.md for detailed background on the mathematical framework.
Additional Resources
- Workflows:
references/workflows.mdcontains common workflows, best practices, hyperparameter tuning, and GPU optimization - Model References: Detailed documentation for each model category in the
references/directory - Official Documentation: https://docs.scvi-tools.org/en/stable/
- Tutorials: https://docs.scvi-tools.org/en/stable/tutorials/index.html
- API Reference: https://docs.scvi-tools.org/en/stable/api/index.html
Installation
uv pip install scvi-tools
# For GPU support
uv pip install scvi-tools[cuda]
Best Practices
- Use raw counts: Always provide unnormalized count data to models
- Filter genes: Remove low-count genes before analysis (e.g.,
min_counts=3) - Register covariates: Include known technical factors (batch, donor, etc.) in
setup_anndata - Feature selection: Use highly variable genes for improved performance
- Model saving: Always save trained models to avoid retraining
- GPU usage: Enable GPU acceleration for large datasets (
accelerator="gpu") - Scanpy integration: Store outputs in AnnData objects for downstream analysis
GitHub Repository
Related Skills
content-collections
MetaThis skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.
polymarket
MetaThis skill enables developers to build applications with the Polymarket prediction markets platform, including API integration for trading and market data. It also provides real-time data streaming via WebSocket to monitor live trades and market activity. Use it for implementing trading strategies or creating tools that process live market updates.
creating-opencode-plugins
MetaThis skill helps developers create OpenCode plugins that hook into 25+ event types like commands, files, and LSP operations. It provides the plugin structure, event API specifications, and implementation patterns for JavaScript/TypeScript modules. Use it when you need to intercept, monitor, or extend the OpenCode AI assistant's lifecycle with custom event-driven logic.
sglang
MetaSGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
