scvelo
정보
scvelo 스킬은 미접합/접합 mRNA 역학을 모델링하여 단일세포 RNA-seq 데이터에서 세포 상태 전이를 추론하는 RNA 속도 분석을 가능하게 합니다. 이 스킬은 세포 분화 경로와 운명 결정을 분석해야 할 때 사용하며, Scanpy와 같은 트래젝토리 추론 도구를 보완하여 트래젝토리 방향을 예측하고, 잠재 시간을 계산하며, 주도 유전자를 식별합니다.
빠른 설치
Claude Code
추천npx skills add K-Dense-AI/claude-scientific-skills -a claude-code/plugin add https://github.com/K-Dense-AI/claude-scientific-skillsgit clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/scveloClaude Code에서 이 명령을 복사하여 붙여넣어 스킬을 설치하세요
문서
scVelo — RNA Velocity Analysis
Overview
scVelo is the leading Python package for RNA velocity analysis in single-cell RNA-seq data. It infers cell state transitions by modeling the kinetics of mRNA splicing — using the ratio of unspliced (pre-mRNA) to spliced (mature mRNA) abundances to determine whether a gene is being upregulated or downregulated in each cell. This allows reconstruction of developmental trajectories and identification of cell fate decisions without requiring time-course data.
Installation: pip install scvelo
Key resources:
- Documentation: https://scvelo.readthedocs.io/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen et al. (2020) Nature Biotechnology. PMID: 32747759
When to Use This Skill
Use scVelo when:
- Trajectory inference from snapshot data: Determine which direction cells are differentiating
- Cell fate prediction: Identify progenitor cells and their downstream fates
- Driver gene identification: Find genes whose dynamics best explain observed trajectories
- Developmental biology: Model hematopoiesis, neurogenesis, epithelial-to-mesenchymal transitions
- Latent time estimation: Order cells along a pseudotime derived from splicing dynamics
- Complement to Scanpy: Add directional information to UMAP embeddings
Prerequisites
scVelo requires count matrices for both unspliced and spliced RNA. These are generated by:
- STARsolo or kallisto|bustools with
lamannomode - velocyto CLI:
velocyto run10x/velocyto run - alevin-fry / simpleaf with spliced/unspliced output
Data is stored in an AnnData object with layers["spliced"] and layers["unspliced"].
Standard RNA Velocity Workflow
1. Setup and Data Loading
import scvelo as scv
import scanpy as sc
import numpy as np
import matplotlib.pyplot as plt
# Configure settings
scv.settings.verbosity = 3 # Show computation steps
scv.settings.presenter_view = True
scv.settings.set_figure_params('scvelo')
# Load data (AnnData with spliced/unspliced layers)
# Option A: Load from loom (velocyto output)
adata = scv.read("cellranger_output.loom", cache=True)
# Option B: Merge velocyto loom with Scanpy-processed AnnData
adata_processed = sc.read_h5ad("processed.h5ad") # Has UMAP, clusters
adata_velocity = scv.read("velocyto.loom")
adata = scv.utils.merge(adata_processed, adata_velocity)
# Verify layers
print(adata)
# obs × var: N × G
# layers: 'spliced', 'unspliced' (required)
# obsm['X_umap'] (required for visualization)
2. Preprocessing
# Filter and normalize (follows Scanpy conventions)
scv.pp.filter_and_normalize(
adata,
min_shared_counts=20, # Minimum counts in spliced+unspliced
n_top_genes=2000 # Top highly variable genes
)
# Compute first and second order moments (means and variances)
# knn_connectivities must be computed first
sc.pp.neighbors(adata, n_neighbors=30, n_pcs=30)
scv.pp.moments(
adata,
n_pcs=30,
n_neighbors=30
)
3. Velocity Estimation — Stochastic Model
The stochastic model is fast and suitable for exploratory analysis:
# Stochastic velocity (faster, less accurate)
scv.tl.velocity(adata, mode='stochastic')
scv.tl.velocity_graph(adata)
# Visualize
scv.pl.velocity_embedding_stream(
adata,
basis='umap',
color='leiden',
title="RNA Velocity (Stochastic)"
)
4. Velocity Estimation — Dynamical Model (Recommended)
The dynamical model fits the full splicing kinetics and is more accurate:
# Recover dynamics (computationally intensive; ~10-30 min for 10K cells)
scv.tl.recover_dynamics(adata, n_jobs=4)
# Compute velocity from dynamical model
scv.tl.velocity(adata, mode='dynamical')
scv.tl.velocity_graph(adata)
5. Latent Time
The dynamical model enables computation of a shared latent time (pseudotime):
# Compute latent time
scv.tl.latent_time(adata)
# Visualize latent time on UMAP
scv.pl.scatter(
adata,
color='latent_time',
color_map='gnuplot',
size=80,
title='Latent time'
)
# Identify top genes ordered by latent time
top_genes = adata.var['fit_likelihood'].sort_values(ascending=False).index[:300]
scv.pl.heatmap(
adata,
var_names=top_genes,
sortby='latent_time',
col_color='leiden',
n_convolve=100
)
6. Driver Gene Analysis
# Identify genes with highest velocity fit
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
df = scv.DataFrame(adata.uns['rank_velocity_genes']['names'])
print(df.head(10))
# Speed and coherence
scv.tl.velocity_confidence(adata)
scv.pl.scatter(
adata,
c=['velocity_length', 'velocity_confidence'],
cmap='coolwarm',
perc=[5, 95]
)
# Phase portraits for specific genes
scv.pl.velocity(adata, ['Cpe', 'Gnao1', 'Ins2'],
ncols=3, figsize=(16, 4))
7. Velocity Arrows and Pseudotime
# Arrow plot on UMAP
scv.pl.velocity_embedding(
adata,
arrow_length=3,
arrow_size=2,
color='leiden',
basis='umap'
)
# Stream plot (cleaner visualization)
scv.pl.velocity_embedding_stream(
adata,
basis='umap',
color='leiden',
smooth=0.8,
min_mass=4
)
# Velocity pseudotime (alternative to latent time)
scv.tl.velocity_pseudotime(adata)
scv.pl.scatter(adata, color='velocity_pseudotime', cmap='gnuplot')
8. PAGA Trajectory Graph
# PAGA graph with velocity-informed transitions
scv.tl.paga(adata, groups='leiden')
df = scv.get_df(adata, 'paga/transitions_confidence', precision=2).T
df.style.background_gradient(cmap='Blues').format('{:.2g}')
# Plot PAGA with velocity
scv.pl.paga(
adata,
basis='umap',
size=50,
alpha=0.1,
min_edge_width=2,
node_size_scale=1.5
)
Complete Workflow Script
import scvelo as scv
import scanpy as sc
def run_rna_velocity(adata, n_top_genes=2000, mode='dynamical', n_jobs=4):
"""
Complete RNA velocity workflow.
Args:
adata: AnnData with 'spliced' and 'unspliced' layers, UMAP in obsm
n_top_genes: Number of top HVGs for velocity
mode: 'stochastic' (fast) or 'dynamical' (accurate)
n_jobs: Parallel jobs for dynamical model
Returns:
Processed AnnData with velocity information
"""
scv.settings.verbosity = 2
# 1. Preprocessing
scv.pp.filter_and_normalize(adata, min_shared_counts=20, n_top_genes=n_top_genes)
if 'neighbors' not in adata.uns:
sc.pp.neighbors(adata, n_neighbors=30)
scv.pp.moments(adata, n_pcs=30, n_neighbors=30)
# 2. Velocity estimation
if mode == 'dynamical':
scv.tl.recover_dynamics(adata, n_jobs=n_jobs)
scv.tl.velocity(adata, mode=mode)
scv.tl.velocity_graph(adata)
# 3. Downstream analyses
if mode == 'dynamical':
scv.tl.latent_time(adata)
scv.tl.rank_velocity_genes(adata, groupby='leiden', min_corr=0.3)
scv.tl.velocity_confidence(adata)
scv.tl.velocity_pseudotime(adata)
return adata
Key Output Fields in AnnData
After running the workflow, the following fields are added:
| Location | Key | Description |
|---|---|---|
adata.layers | velocity | RNA velocity per gene per cell |
adata.layers | fit_t | Fitted latent time per gene per cell |
adata.obsm | velocity_umap | 2D velocity vectors on UMAP |
adata.obs | velocity_pseudotime | Pseudotime from velocity |
adata.obs | latent_time | Latent time from dynamical model |
adata.obs | velocity_length | Speed of each cell |
adata.obs | velocity_confidence | Confidence score per cell |
adata.var | fit_likelihood | Gene-level model fit quality |
adata.var | fit_alpha | Transcription rate |
adata.var | fit_beta | Splicing rate |
adata.var | fit_gamma | Degradation rate |
adata.uns | velocity_graph | Cell-cell transition probability matrix |
Velocity Models Comparison
| Model | Speed | Accuracy | When to Use |
|---|---|---|---|
stochastic | Fast | Moderate | Exploratory; large datasets |
deterministic | Medium | Moderate | Simple linear kinetics |
dynamical | Slow | High | Publication-quality; identifies driver genes |
Best Practices
- Start with stochastic mode for exploration; switch to dynamical for final analysis
- Need good coverage of unspliced reads: Short reads (< 100 bp) may miss intron coverage
- Minimum 2,000 cells: RNA velocity is noisy with fewer cells
- Velocity should be coherent: Arrows should follow known biology; randomness indicates issues
- k-NN bandwidth matters: Too few neighbors → noisy velocity; too many → oversmoothed
- Sanity check: Root cells (progenitors) should have high unspliced/spliced ratios for marker genes
- Dynamical model requires distinct kinetic states: Works best for clear differentiation processes
Troubleshooting
| Problem | Solution |
|---|---|
| Missing unspliced layer | Re-run velocyto or use STARsolo with --soloFeatures Gene Velocyto |
| Very few velocity genes | Lower min_shared_counts; check sequencing depth |
| Random-looking arrows | Try different n_neighbors or velocity model |
| Memory error with dynamical | Set n_jobs=1; reduce n_top_genes |
| Negative velocity everywhere | Check that spliced/unspliced layers are not swapped |
Additional Resources
- scVelo documentation: https://scvelo.readthedocs.io/
- Tutorial notebooks: https://scvelo.readthedocs.io/tutorials/
- GitHub: https://github.com/theislab/scvelo
- Paper: Bergen V et al. (2020) Nature Biotechnology. PMID: 32747759
- velocyto (preprocessing): http://velocyto.org/
- CellRank (fate prediction, extends scVelo): https://cellrank.readthedocs.io/
- dynamo (metabolic labeling alternative): https://dynamo-release.readthedocs.io/
GitHub 저장소
연관 스킬
llamaguard
기타LlamaGuard는 폭력 및 혐오 발언 등 6가지 안전 범주에서 LLM 입력과 출력을 조정하기 위한 Meta의 70-80억 파라미터 모델입니다. 94-95% 정확도를 제공하며 vLLM, Hugging Face 또는 Amazon SageMaker를 사용해 배포할 수 있습니다. 이 기술을 사용하여 AI 애플리케이션에 콘텐츠 필터링 및 안전 가드레일을 손쉽게 통합하세요.
cost-optimization
기타이 Claude Skill은 리소스 적정화, 태깅 전략, 지출 분석을 통해 개발자들이 클라우드 비용을 최적화할 수 있도록 지원합니다. AWS, Azure, GCP에서 클라우드 비용을 절감하고 비용 거버넌스를 구현하기 위한 프레임워크를 제공합니다. 인프라 비용을 분석하거나, 리소스를 적정화하거나, 예산 제약을 충족해야 할 때 사용하세요.
quantizing-models-bitsandbytes
기타이 스킬은 bitsandbytes를 사용하여 LLM을 8비트 또는 4비트 정밀도로 양자화하며, 최소한의 정확도 손실로 50-75%의 메모리 감소를 달성합니다. 제한된 GPU 메모리에서 더 큰 모델을 실행하거나 추론을 가속화하는 데 이상적이며, INT8, NF4, FP4와 같은 형식을 지원합니다. 이 스킬은 HuggingFace Transformers와 통합되어 QLoRA 학습 및 8비트 옵티마이저를 가능하게 합니다.
dispatching-parallel-agents
기타이 Claude Skill은 3개 이상의 독립적인 문제를 동시에 조사하고 해결하기 위해 다중 에이전트를 배치합니다. 공유 상태나 의존성 없이 해결 가능한 무관련 장애 시나리오에 맞게 설계되었습니다. 핵심 기능은 병렬 문제 해결로, 각 독립 문제 영역마다 하나의 에이전트를 할당하여 효율성을 극대화합니다.
