scvi-tools

K-Dense-AI

About

scvi-tools is a PyTorch-based Python framework for probabilistic modeling of single-cell omics data. Use it for tasks that call for deep generative models, such as probabilistic batch correction (scVI), differential expression with uncertainty estimates, or multimodal data integration (totalVI, MultiVI). It is aimed at analyses involving batch effects or multimodal datasets; standard pipelines are better served by tools like scanpy.

Quick Install

Claude Code

Primary method (recommended)
npx skills add K-Dense-AI/claude-scientific-skills -a claude-code
Alternative: plugin command
/plugin add https://github.com/K-Dense-AI/claude-scientific-skills
Alternative: Git clone
git clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/scvi-tools

Copy and paste one of these commands into Claude Code to install the skill

Skill Documentation

scvi-tools

Overview

scvi-tools is a comprehensive Python framework for probabilistic models in single-cell genomics. Built on PyTorch and PyTorch Lightning, it provides deep generative models using variational inference for analyzing diverse single-cell data modalities.

When to Use This Skill

Use this skill when:

  • Analyzing single-cell RNA-seq data (dimensionality reduction, batch correction, integration)
  • Working with single-cell ATAC-seq or chromatin accessibility data
  • Integrating multimodal data (CITE-seq, multiome, paired/unpaired datasets)
  • Analyzing spatial transcriptomics data (deconvolution, spatial mapping)
  • Performing differential expression analysis on single-cell data
  • Conducting cell type annotation or transfer learning tasks
  • Working with specialized single-cell modalities (methylation, cytometry, RNA velocity)
  • Building custom probabilistic models for single-cell analysis

Core Capabilities

scvi-tools provides models organized by data modality:

1. Single-Cell RNA-seq Analysis

Core models for expression analysis, batch correction, and integration. See references/models-scrna-seq.md for:

  • scVI: Unsupervised dimensionality reduction and batch correction
  • scANVI: Semi-supervised cell type annotation and integration
  • AUTOZI: Zero-inflation detection and modeling
  • VeloVI: RNA velocity analysis
  • contrastiveVI: Perturbation effect isolation
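
As a rough illustration of how scANVI builds on scVI, the sketch below refines an already trained scVI model into a scANVI model and predicts labels for every cell. The helper names (`fill_unlabeled`, `annotate_with_scanvi`), the `cell_type` column, the `"Unknown"` placeholder, and the `scanvi_prediction` output column are assumptions made for this example; only `SCANVI.from_scvi_model`, `train`, and `predict` are scvi-tools calls.

```python
def fill_unlabeled(labels, unlabeled_category="Unknown"):
    """Replace missing labels (None, empty string, NaN) with a placeholder."""
    cleaned = []
    for label in labels:
        is_nan = isinstance(label, float) and label != label
        if label is None or is_nan or str(label).strip() == "":
            cleaned.append(unlabeled_category)
        else:
            cleaned.append(str(label))
    return cleaned


def annotate_with_scanvi(scvi_model, labels_key="cell_type",
                         unlabeled_category="Unknown", max_epochs=20):
    """Refine a trained scVI model into scANVI and predict labels for all cells.

    Assumes `scvi_model.adata.obs[labels_key]` holds partial annotations.
    """
    import scvi  # imported lazily so fill_unlabeled works without scvi-tools

    adata = scvi_model.adata
    # Cells without a label are marked with the placeholder category.
    adata.obs[labels_key] = fill_unlabeled(adata.obs[labels_key], unlabeled_category)

    scanvi_model = scvi.model.SCANVI.from_scvi_model(
        scvi_model,
        unlabeled_category=unlabeled_category,
        labels_key=labels_key,
    )
    scanvi_model.train(max_epochs=max_epochs)

    # Hypothetical output column name, chosen for this example.
    adata.obs["scanvi_prediction"] = scanvi_model.predict()
    return scanvi_model
```

Starting from a trained scVI model reuses its learned weights, so the semi-supervised step usually needs far fewer epochs than training from scratch; the value of 20 here is only a placeholder.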

2. Chromatin Accessibility (ATAC-seq)

Models for analyzing single-cell chromatin data. See references/models-atac-seq.md for:

  • PeakVI: Peak-based ATAC-seq analysis and integration
  • PoissonVI: Quantitative fragment count modeling
  • scBasset: Deep learning approach with motif analysis

3. Multimodal & Multi-omics Integration

Joint analysis of multiple data types. See references/models-multimodal.md for:

  • totalVI: CITE-seq protein and RNA joint modeling
  • MultiVI: Paired and unpaired multi-omic integration
  • MrVI: Multi-resolution cross-sample analysis

4. Spatial Transcriptomics

Spatially-resolved transcriptomics analysis. See references/models-spatial.md for:

  • DestVI: Multi-resolution spatial deconvolution
  • Stereoscope: Cell type deconvolution
  • Tangram: Spatial mapping and integration
  • scVIVA: Cell-environment relationship analysis

5. Specialized Modalities

Additional specialized analysis tools. See references/models-specialized.md for:

  • MethylVI/MethylANVI: Single-cell methylation analysis
  • CytoVI: Flow/mass cytometry batch correction
  • Solo: Doublet detection
  • CellAssign: Marker-based cell type annotation

Typical Workflow

All scvi-tools models follow a consistent API pattern:

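# 1. Load and preprocess data (AnnData format)
import scvi
import scanpy as sc

adata = scvi.data.heart_cell_atlas_subsampled()
sc.pp.filter_genes(adata, min_counts=3)
adata.layers["counts"] = adata.X.copy()  # Keep raw counts for the model
# flavor="seurat_v3" selects genes from raw counts (requires scikit-misc)
sc.pp.highly_variable_genes(
    adata, n_top_genes=1200, flavor="seurat_v3", layer="counts", subset=True
)

# 2. Register data with model (specify layers, covariates)
# Column names below match this example dataset; adjust them to your adata.obs
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",  # Use raw counts, not log-normalized
    batch_key="cell_source",
    categorical_covariate_keys=["donor"],
    continuous_covariate_keys=["percent_mito"]
)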

# 3. Create and train model
model = scvi.model.SCVI(adata)
model.train()

# 4. Extract latent representations and normalized values
latent = model.get_latent_representation()
normalized = model.get_normalized_expression(library_size=1e4)

# 5. Store in AnnData for downstream analysis
adata.obsm["X_scVI"] = latent
adata.layers["scvi_normalized"] = normalized

# 6. Downstream analysis with scanpy
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)
sc.tl.leiden(adata)

Key Design Principles:

  • Raw counts required: Models use count likelihoods (e.g., negative binomial), so they expect unnormalized count data
  • Unified API: Consistent interface across all models (setup → train → extract)
  • AnnData-centric: Seamless integration with the scanpy ecosystem
  • GPU acceleration: Automatic utilization of available GPUs
  • Batch correction: Handle technical variation through covariate registration
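
Because the models expect raw counts, it can help to sanity-check the matrix before calling `setup_anndata`. The following is a minimal sketch, not part of scvi-tools: `looks_like_raw_counts` is a hypothetical helper that simply checks whether all stored values are non-negative integers.

```python
import numpy as np


def looks_like_raw_counts(matrix, tol=1e-8):
    """Return True if every stored value is a non-negative integer.

    Accepts a dense array or a scipy sparse matrix (only the stored
    non-zero values of a sparse matrix are inspected).
    """
    if hasattr(matrix, "toarray"):  # scipy sparse matrices
        values = np.asarray(matrix.data, dtype=float)
    else:
        values = np.asarray(matrix, dtype=float)

    if values.size == 0:
        return True

    non_negative = values.min() >= 0
    integral = np.all(np.abs(values - np.round(values)) <= tol)
    return bool(non_negative and integral)


# Example with toy data: raw counts pass, log-transformed values do not.
counts = np.array([[0, 3], [12, 1]])
print(looks_like_raw_counts(counts))
print(looks_like_raw_counts(np.log1p(counts)))
```

With an AnnData object this could be called as `looks_like_raw_counts(adata.layers["counts"])`, assuming the raw counts live in a layer with that name.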

Common Analysis Tasks

Differential Expression

Probabilistic DE analysis using the learned generative models:

de_results = model.differential_expression(
    groupby="cell_type",
    group1="TypeA",
    group2="TypeB",
    mode="change",  # Use composite hypothesis testing
    delta=0.25      # Minimum effect size threshold
)

See references/differential-expression.md for detailed methodology and interpretation.
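
To make the `mode="change"` and `delta` arguments more concrete, here is a conceptual sketch of the quantity being estimated: the posterior probability that the absolute log fold change is at least `delta`, and the corresponding log Bayes factor. This is an illustration written for this document, not the library's implementation; the function name and the sample values are made up, and in practice the samples would come from the trained model's posterior.

```python
import math


def change_mode_summary(lfc_samples, delta=0.25, eps=1e-8):
    """Summarize posterior log-fold-change samples for one gene.

    proba_de     : fraction of samples with |LFC| >= delta
    bayes_factor : log(proba_de) - log(1 - proba_de), natural log
    lfc_mean     : mean of the samples
    """
    n = len(lfc_samples)
    proba_de = sum(abs(x) >= delta for x in lfc_samples) / n
    # eps guards against log(0) when all samples fall on one side.
    bayes_factor = math.log(proba_de + eps) - math.log(1.0 - proba_de + eps)
    lfc_mean = sum(lfc_samples) / n
    return {"proba_de": proba_de, "bayes_factor": bayes_factor, "lfc_mean": lfc_mean}


# Made-up posterior samples for a single gene.
summary = change_mode_summary([0.9, 1.1, 0.1, -0.5], delta=0.25)
print(summary)
```

A larger `delta` demands a bigger effect before a gene counts as differentially expressed, which trades sensitivity for fewer marginal hits.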

Model Persistence

Save and load trained models:

# Save model
model.save("./model_directory", overwrite=True)

# Load model
model = scvi.model.SCVI.load("./model_directory", adata=adata)
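
One way to combine saving and loading is a "load if present, otherwise train" helper, sketched below. It assumes that `adata` has already been registered with `setup_anndata` and that `model.save` writes a file named `model.pt` into the target directory; `has_saved_model` and `load_or_train` are hypothetical names used only for this example.

```python
from pathlib import Path


def has_saved_model(model_dir, filename="model.pt"):
    """Return True if the directory contains a saved model file.

    The default filename is an assumption about how the model was saved.
    """
    return (Path(model_dir) / filename).is_file()


def load_or_train(adata, model_dir="./model_directory"):
    """Load a saved scVI model if one exists, otherwise train and save it.

    Assumes scvi.model.SCVI.setup_anndata(adata, ...) was already called.
    """
    import scvi  # imported lazily so has_saved_model works without scvi-tools

    if has_saved_model(model_dir):
        return scvi.model.SCVI.load(model_dir, adata=adata)

    model = scvi.model.SCVI(adata)
    model.train()
    model.save(model_dir, overwrite=True)
    return model
```

Note that a model loaded this way only makes sense with an `adata` that has the same genes and registration as the one used for training.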

Batch Correction and Integration

Integrate datasets across batches or studies:

# Register batch information
scvi.model.SCVI.setup_anndata(adata, batch_key="study")

# Model automatically learns batch-corrected representations
model = scvi.model.SCVI(adata)
model.train()
latent = model.get_latent_representation()  # Batch-corrected

Theoretical Foundations

scvi-tools is built on:

  • Variational inference: Approximate posterior distributions for scalable Bayesian inference
  • Deep generative models: VAE architectures that learn complex data distributions
  • Amortized inference: Shared neural networks for efficient learning across cells
  • Probabilistic modeling: Principled uncertainty quantification and statistical testing

See references/theoretical-foundations.md for detailed background on the mathematical framework.
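
The ideas above can be illustrated with the two building blocks of a VAE objective: the reparameterization trick and the closed-form KL divergence between a diagonal Gaussian posterior and a standard normal prior. This is a generic textbook sketch in plain Python, not code from scvi-tools; the function names are chosen for this example, and the log-likelihood is passed in as a number rather than computed from a count model.

```python
import math
import random


def sample_latent(mu, sigma, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s) for m, s in zip(mu, sigma))


def elbo(log_likelihood, mu, sigma):
    """Evidence lower bound for one cell: reconstruction term minus KL term."""
    return log_likelihood - kl_to_standard_normal(mu, sigma)


rng = random.Random(0)
z = sample_latent([0.0, 0.0], [1.0, 1.0], rng)  # one draw from the posterior
print(len(z))
print(kl_to_standard_normal([1.0], [1.0]))
```

In an amortized setup, `mu` and `sigma` would be the outputs of a shared encoder network applied to each cell, so no per-cell optimization is needed.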

Additional Resources

Installation

uv pip install scvi-tools
# For GPU support
uv pip install "scvi-tools[cuda]"  # Quotes prevent shell globbing of the brackets

Best Practices

  1. Use raw counts: Always provide unnormalized count data to models
  2. Filter genes: Remove low-count genes before analysis (e.g., min_counts=3)
  3. Register covariates: Include known technical factors (batch, donor, etc.) in setup_anndata
  4. Feature selection: Use highly variable genes for improved performance
  5. Model saving: Always save trained models to avoid retraining
  6. GPU usage: Enable GPU acceleration for large datasets (e.g., model.train(accelerator="gpu"))
  7. Scanpy integration: Store outputs in AnnData objects for downstream analysis
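
For the GPU recommendation, a small helper can choose the accelerator explicitly before training. This is a sketch under the assumption that PyTorch's CUDA check is a good enough signal; `pick_accelerator` is a hypothetical name, and other backends (for example Apple MPS) are not handled here.

```python
def pick_accelerator():
    """Return "gpu" if a CUDA device is visible to PyTorch, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; fall back to CPU.
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


accelerator = pick_accelerator()
print(accelerator)
```

The result could then be passed as `model.train(accelerator=accelerator)`. Leaving the argument at its default is often sufficient; an explicit choice mainly helps when a run should fail loudly or log which device was used.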

GitHub Repository

K-Dense-AI/claude-scientific-skills
Path: skills/scvi-tools
