scvelo
Über
Die scvelo-Skill ermöglicht RNA-Velocity-Analysen, um Zellzustandsübergänge aus Einzelzell-RNA-seq-Daten abzuleiten, indem sie die Dynamik von ungespleißter/gespleißter mRNA modelliert. Sie sagt Trajektorierichtungen voraus, berechnet latente Zeit und identifiziert Treibergene, wodurch sie Werkzeuge wie Scanpy für Trajektorieninferenz ergänzt. Nutzen Sie diese Skill, wenn Sie zelluläre Differenzierungspfade und Schicksalsentscheidungen in Ihren Einzelzell-Datensätzen analysieren müssen.
Schnellinstallation
Claude Code
Empfohlennpx skills add K-Dense-AI/claude-scientific-skills -a claude-code/plugin add https://github.com/K-Dense-AI/claude-scientific-skillsgit clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/scveloKopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um diese Fähigkeit zu installieren
Dokumentation
scVelo — RNA Velocity Analysis
Overview
scVelo is the leading Python package for RNA velocity analysis in single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing — using the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundances to determine whether a gene is being upregulated or downregulated in each cell. This allows reconstruction of developmental trajectories and identification of cell fate decisions without requiring time-course data.
Installation: pip install scvelo
Key resources:
- Documentation: https://scvelo.readthedocs.io/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759
When to Use This Skill
Use scVelo when:
- Trajectory inference from snapshot data: Determine which direction cells are differentiating
- Cell fate prediction: Identify progenitor cells and their downstream fates
- Driver gene identification: Find genes whose dynamics best explain observed trajectories
- Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
- Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
- Complement to Scanpy: Add directional information to UMAP embeddings
Prerequisites
scVelo requires count matrices for both unspliced and spliced RNA. These are generated by:
- STARsolo or kallisto|bustools with
lamannomode - velocyto CLI:
velocyto run10x/velocyto run - alevin-fry / simpleaf with spliced/unspliced output
Data is stored in an AnnData object with layers["spliced"] and layers["unspliced"].
Standard RNA Velocity Workflow
1. Setup and Data Loading
import scvelo as scv
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
# Configure settings
scv.settings.verbosity = 3 # Show computation steps
scv.settings.presenter_view = True
scv.settings.set_figure_params('scvelo')
# Load data (AnnData with spliced/unspliced layers)
# Option A: Load from loom (velocyto output)
adata = scv.read("cellranger_output.loom", cache=True)
# Option B: Merge velocyto loom with Scanpy-processed AnnData
adata_processed = sc.read_h5ad("processed.h5ad") # Has UMAP, clusters
adata_velocity = scv.read("velocyto.loom")
adata = scv.utils.merge(adata_processed, adata_velocity)
# Verify layers
print(adata)
# obs × var: N × G
# layers: 'spliced', 'unspliced' (required)
# obsm['X_umap'] (required for visualization)
2. Preprocessing
# Filter and normalize (follows Scanpy conventions)
scv.pp.filter_and_normalize(
adata,
min_shared_counts=20, # Minimum counts in spliced+unspliced
n_top_genes=2000 # Top highly variable genes
)
# Compute first and second order moments (means and variances)
# knn_connectivities must be computed first
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
scv.pp.moments(
adata,
n_pcs=30,
n_neighbors=30
)
3. Velocity Estimation — Stochastic Model
The stochastic model is fast and suitable for exploratory analysis:
# Stochastic velocity (faster, less accurate)
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)
# Visualize
scv.pl.velocity_embedding_stream(
adata,
basis='umap',
color='leiden',
title="RNA Velocity (Stochastic)"
)
4. Velocity Estimation — Dynamical Model (Recommended)
The dynamical model fits the full splicing kinetics and is more accurate:
# Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
scv.tl.recover_dynamics(adata, n_jobs=4)
# Compute velocity from dynamical model
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
5. Latent Time
The dynamical model enables computation of a shared latent time (pseudotime):
# Compute latent time
scv.tl.latent_time(adata)
# Visualize latent time on UMAP
scv.pl.scatter(
adata,
color='latent_time',
color_map='gnuplot',
size=80,
title='Latent time'
)
# Identify top genes ordered by latent time
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(
adata,
var_names=top_genes,
sortby='latent_time',
col_color='leiden',
n_convolve=100
)
6. Driver Gene Analysis
# Identify genes with highest velocity fit
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
print(df.head(10))
# Speed and coherence
scv.tl.velocity_confidence(adata)
scv.pl.scatter(
adata,
c=['velocity_length', 'velocity_confidence'],
cmap='coolwarm',
perc=[5, 95]
)
# Phase portraits for specific genes
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'],
ncols=3, figsize=(16, 4))
7. Velocity Arrows and Pseudotime
# Arrow plot on UMAP
scv.pl.velocity_embedding(
adata,
arrow_length=3,
arrow_size=2,
color='leiden',
basis='umap'
)
# Stream plot (cleaner visualization)
scv.pl.velocity_embedding_stream(
adata,
basis='umap',
color='leiden',
smooth=0.8,
min_mass=4
)
# Velocity pseudotime (alternative to latent time)
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
8. PAGA Trajectory Graph
# PAGA graph with velocity-informed transitions
scv.tl.paga(adata, groups='leiden')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
df.style.background_gradient(cmap='Blues').format('{:.2g}')
# Plot PAGA with velocity
scv.pl.paga(
adata,
basis='umap',
size=50,
alpha=0.1,
min_edge_width=2,
node_size_scale=1.5
)
Complete Workflow Script
import scvelo as scv
import scanpy as sc
def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4):
"""
Complete RNA velocity workflow.
Args:
adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm
n_top_genes: Number of top HVGs for velocity
mode: 'stochastic' (fast) or 'dynamical' (accurate)
n_jobs: Parallel jobs for dynamical model
Returns:
Processed AnnData with velocity information
"""
scv.settings.verbosity = 2
# 1. Preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)
if 'neighbors' not in adata.uns:
sc.pp.neighbors(adata, n_neighbors=30)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
# 2. Velocity estimation
if mode == 'dynamical':
scv.tl.recover_dynamics(adata, n_jobs=n_jobs)
scv.tl.velocity(adata, mode=mode)
scv.tl.velocity_graph(adata)
# 3. Downstream analyses
if mode == 'dynamical':
scv.tl.latent_time(adata)
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
scv.tl.velocity_confidence(adata)
scv.tl.velocity_pseudotime(adata)
return adata
Key Output Fields in AnnData
After running the workflow, the following fields are added:
| Location | Key | Description |
|---|---|---|
adata.layers | velocity | RNA velocity per gene per cell |
adata.layers | fit_t | Fitted latent time per gene per cell |
adata.obsm | velocity_umap | 2D velocity vectors on UMAP |
adata.obs | velocity_pseudotime | Pseudotime from velocity |
adata.obs | latent_time | Latent time from dynamical model |
adata.obs | velocity_length | Speed of each cell |
adata.obs | velocity_confidence | Confidence score per cell |
adata.var | fit_likelihood | Gene-level model fit quality |
adata.var | fit_alpha | Transcription rate |
adata.var | fit_beta | Splicing rate |
adata.var | fit_gamma | Degradation rate |
adata.uns | velocity_graph | Cell-cell transition probability matrix |
Velocity Models Comparison
| Model | Speed | Accuracy | When to Use |
|---|---|---|---|
stochastic | Fast | Moderate | Exploratory; large datasets |
deterministic | Medium | Moderate | Simple linear kinetics |
dynamical | Slow | High | Publication-quality; identifies driver genes |
Best Practices
- Start with stochastic mode for exploration; switch to dynamical for final analysis
- Need good coverage of unspliced reads: Short reads (< 100 bp) may miss intron coverage
- Minimum 2,000 cells: RNA velocity is noisy with fewer cells
- Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
- k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
- Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
- Dynamical model requires distinct kinetic states: Works best for clear differentiation processes
Troubleshooting
| Problem | Solution |
|---|---|
| Missing unspliced layer | Re-run velocyto or use STARsolo with --soloFeatures Gene Velocyto |
| Very few velocity genes | Lower min_shared_counts; check sequencing depth |
| Random-looking arrows | Try different n_neighbors or velocity model |
| Memory error with dynamical | Set n_jobs=1; reduce n_top_genes |
| Negative velocity everywhere | Check that spliced/unspliced layers are not swapped |
Additional Resources
- scVelo documentation: https://scvelo.readthedocs.io/
- Tutorial notebooks: https://scvelo.readthedocs.io/tutorials/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
- velocyto (preprocessing): http://velocyto.org/
- CellRank (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
- dynamo (metabolic labeling alternative): https://dynamo-release.readthedocs.io/
GitHub Repository
Verwandte Skills
llamaguard
AndereLlamaGuard ist Metas 7-8B-Parameter-Modell zur Moderation von LLM-Eingaben und -Ausgaben in sechs Sicherheitskategorien wie Gewalt und Hassrede. Es bietet eine Genauigkeit von 94-95 % und kann mit vLLM, Hugging Face oder Amazon SageMaker eingesetzt werden. Nutzen Sie diese Skill, um Inhaltsfilterung und Sicherheitsguardrails einfach in Ihre KI-Anwendungen zu integrieren.
cost-optimization
AndereDiese Claude Skill unterstützt Entwickler bei der Optimierung von Cloud-Kosten durch Ressourcen-Dimensionierung, Tagging-Strategien und Ausgabenanalysen. Sie bietet einen Rahmen zur Senkung von Cloud-Ausgaben und zur Implementierung von Kosten-Governance für AWS, Azure und GCP. Nutzen Sie sie, wenn Sie Infrastrukturkosten analysieren, Ressourcen richtig dimensionieren oder Budgetvorgaben einhalten müssen.
quantizing-models-bitsandbytes
AndereDiese Fähigkeit quantisiert LLMs auf 8-Bit- oder 4-Bit-Präzision mittels bitsandbytes und erreicht dabei eine Speicherreduzierung von 50–75 % bei minimalem Genauigkeitsverlust. Sie ist ideal für den Betrieb größerer Modelle mit begrenztem GPU-Speicher oder zur Beschleunigung von Inferenzvorgängen und unterstützt Formate wie INT8, NF4 und FP4. Die Fähigkeit integriert sich in HuggingFace Transformers und ermöglicht QLoRA-Training sowie 8-Bit-Optimierer.
dispatching-parallel-agents
AndereDiese Claude-Fähigkeit verteilt mehrere Agenten, um drei oder mehr unabhängige Probleme gleichzeitig zu untersuchen und zu beheben. Sie ist für Szenarien konzipiert, die unabhängige Fehler umfassen, die ohne gemeinsamen Zustand oder Abhängigkeiten gelöst werden können. Die Kernfähigkeit ist die parallele Problemlösung, bei der pro unabhängigem Problembereich ein Agent zugewiesen wird, um die Effizienz zu maximieren.
