torchdrug
О программе
TorchDrug — это нативный для PyTorch инструментарий для создания пользовательских архитектур графовых нейронных сетей в области разработки лекарств, ориентированный на молекулы, белки и графы знаний. Он лучше всего подходит для разработки собственных моделей, решения таких задач, как предсказание свойств белков и планирование ретросинтеза. Для использования предобученных моделей или эталонных наборов данных следует интегрировать дополнительные библиотеки, такие как DeepChem или PyTDC.
Быстрая установка
Claude Code
Рекомендуетсяnpx skills add K-Dense-AI/claude-scientific-skills -a claude-code/plugin add https://github.com/K-Dense-AI/claude-scientific-skillsgit clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/torchdrugСкопируйте и вставьте эту команду в Claude Code для установки этого навыка
Документация
TorchDrug
Overview
TorchDrug is a comprehensive PyTorch-based machine learning toolbox for drug discovery and molecular science. Apply graph neural networks, pre-trained models, and task definitions to molecules, proteins, and biological knowledge graphs, including molecular property prediction, protein modeling, knowledge graph reasoning, molecular generation, retrosynthesis planning, with 40+ curated datasets and 20+ model architectures.
When to Use This Skill
This skill should be used when working with:
Data Types:
- SMILES strings or molecular structures
- Protein sequences or 3D structures (PDB files)
- Chemical reactions and retrosynthesis
- Biomedical knowledge graphs
- Drug discovery datasets
Tasks:
- Predicting molecular properties (solubility, toxicity, activity)
- Protein function or structure prediction
- Drug-target binding prediction
- Generating new molecular structures
- Planning chemical synthesis routes
- Link prediction in biomedical knowledge bases
- Training graph neural networks on scientific data
Libraries and Integration:
- TorchDrug is the primary library
- Often used with RDKit for cheminformatics
- Compatible with PyTorch and PyTorch Lightning
- Integrates with AlphaFold and ESM for proteins
Getting Started
Installation
TorchDrug 0.2.1 (latest on PyPI, July 2023) requires Python 3.7–3.10 and PyTorch 1.8–2.0. Install PyTorch and torch-scatter / torch-cluster first (wheel URL depends on your PyTorch and CUDA versions — see installation docs).
uv pip install torch
# Match torch/CUDA in the URL, e.g. torch-2.0.0+cu118 or cpu
uv pip install torch-scatter torch-cluster -f https://pytorch-geometric.com/whl/torch-2.0.0+cu118.html
uv pip install torchdrug==0.2.1
On Apple Silicon, compile scatter/cluster from source; TorchDrug runs on CPU only (no MPS). Conda: conda install torchdrug -c milagraph -c conda-forge -c pytorch -c pyg.
Quick Example
import torch
from torchdrug import datasets, models, tasks
from torch.utils.data import DataLoader
# Load molecular dataset
dataset = datasets.BBBP("~/molecule-datasets/")
train_set, valid_set, test_set = dataset.split()
# Define GNN model
model = models.GIN(
input_dim=dataset.node_feature_dim,
hidden_dims=[256, 256, 256],
edge_input_dim=dataset.edge_feature_dim,
batch_norm=True,
readout="mean"
)
# Create property prediction task
task = tasks.PropertyPrediction(
model,
task=dataset.tasks,
criterion="bce",
metric=["auroc", "auprc"]
)
# Train with PyTorch
optimizer = torch.optim.Adam(task.parameters(), lr=1e-3)
train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
for epoch in range(100):
for batch in train_loader:
loss = task(batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
Core Capabilities
1. Molecular Property Prediction
Predict chemical, physical, and biological properties of molecules from structure.
Use Cases:
- Drug-likeness and ADMET properties
- Toxicity screening
- Quantum chemistry properties
- Binding affinity prediction
Key Components:
- 20+ molecular datasets (BBBP, HIV, Tox21, QM9, etc.)
- GNN models (GIN, GAT, SchNet)
- PropertyPrediction and MultipleBinaryClassification tasks
Reference: See references/molecular_property_prediction.md for:
- Complete dataset catalog
- Model selection guide
- Training workflows and best practices
- Feature engineering details
2. Protein Modeling
Work with protein sequences, structures, and properties.
Use Cases:
- Enzyme function prediction
- Protein stability and solubility
- Subcellular localization
- Protein-protein interactions
- Structure prediction
Key Components:
- 15+ protein datasets (EnzymeCommission, GeneOntology, PDBBind, etc.)
- Sequence models (ESM, ProteinBERT, ProteinLSTM)
- Structure models (GearNet, SchNet)
- Multiple task types for different prediction levels
Reference: See references/protein_modeling.md for:
- Protein-specific datasets
- Sequence vs structure models
- Pre-training strategies
- Integration with AlphaFold and ESM
3. Knowledge Graph Reasoning
Predict missing links and relationships in biological knowledge graphs.
Use Cases:
- Drug repurposing
- Disease mechanism discovery
- Gene-disease associations
- Multi-hop biomedical reasoning
Key Components:
- General KGs (FB15k, WN18) and biomedical (Hetionet)
- Embedding models (TransE, RotatE, ComplEx)
- KnowledgeGraphCompletion task
Reference: See references/knowledge_graphs.md for:
- Knowledge graph datasets (including Hetionet with 45k biomedical entities)
- Embedding model comparison
- Evaluation metrics and protocols
- Biomedical applications
4. Molecular Generation
Generate novel molecular structures with desired properties.
Use Cases:
- De novo drug design
- Lead optimization
- Chemical space exploration
- Property-guided generation
Key Components:
- Autoregressive generation
- GCPN (policy-based generation)
- GraphAutoregressiveFlow
- Property optimization workflows
Reference: See references/molecular_generation.md for:
- Generation strategies (unconditional, conditional, scaffold-based)
- Multi-objective optimization
- Validation and filtering
- Integration with property prediction
5. Retrosynthesis
Predict synthetic routes from target molecules to starting materials.
Use Cases:
- Synthesis planning
- Route optimization
- Synthetic accessibility assessment
- Multi-step planning
Key Components:
- USPTO-50k reaction dataset
- CenterIdentification (reaction center prediction)
- SynthonCompletion (reactant prediction)
- End-to-end Retrosynthesis pipeline
Reference: See references/retrosynthesis.md for:
- Task decomposition (center ID → synthon completion)
- Multi-step synthesis planning
- Commercial availability checking
- Integration with other retrosynthesis tools
6. Graph Neural Network Models
Comprehensive catalog of GNN architectures for different data types and tasks.
Available Models:
- General GNNs: GCN, GAT, GIN, RGCN, MPNN
- 3D-aware: SchNet, GearNet
- Protein-specific: ESM, ProteinBERT, GearNet
- Knowledge graph: TransE, RotatE, ComplEx, SimplE
- Generative: GraphAutoregressiveFlow
Reference: See references/models_architectures.md for:
- Detailed model descriptions
- Model selection guide by task and dataset
- Architecture comparisons
- Implementation tips
7. Datasets
40+ curated datasets spanning chemistry, biology, and knowledge graphs.
Categories:
- Molecular properties (drug discovery, quantum chemistry)
- Protein properties (function, structure, interactions)
- Knowledge graphs (general and biomedical)
- Retrosynthesis reactions
Reference: See references/datasets.md for:
- Complete dataset catalog with sizes and tasks
- Dataset selection guide
- Loading and preprocessing
- Splitting strategies (random, scaffold)
Common Workflows
Workflow 1: Molecular Property Prediction
Scenario: Predict blood-brain barrier penetration for drug candidates.
Steps:
- Load dataset:
datasets.BBBP() - Choose model: GIN for molecular graphs
- Define task:
PropertyPredictionwith binary classification - Train with scaffold split for realistic evaluation
- Evaluate using AUROC and AUPRC
Navigation: references/molecular_property_prediction.md → Dataset selection → Model selection → Training
Workflow 2: Protein Function Prediction
Scenario: Predict enzyme function from sequence.
Steps:
- Load dataset:
datasets.EnzymeCommission() - Choose model: ESM (pre-trained) or GearNet (with structure)
- Define task:
PropertyPredictionwith multi-class classification - Fine-tune pre-trained model or train from scratch
- Evaluate using accuracy and per-class metrics
Navigation: references/protein_modeling.md → Model selection (sequence vs structure) → Pre-training strategies
Workflow 3: Drug Repurposing via Knowledge Graphs
Scenario: Find new disease treatments in Hetionet.
Steps:
- Load dataset:
datasets.Hetionet() - Choose model: RotatE or ComplEx
- Define task:
KnowledgeGraphCompletion - Train with negative sampling
- Query for "Compound-treats-Disease" predictions
- Filter by plausibility and mechanism
Navigation: references/knowledge_graphs.md → Hetionet dataset → Model selection → Biomedical applications
Workflow 4: De Novo Molecule Generation
Scenario: Generate drug-like molecules optimized for target binding.
Steps:
- Train property predictor on activity data
- Choose generation approach: GCPN for RL-based optimization
- Define reward function combining affinity, drug-likeness, synthesizability
- Generate candidates with property constraints
- Validate chemistry and filter by drug-likeness
- Rank by multi-objective scoring
Navigation: references/molecular_generation.md → Conditional generation → Multi-objective optimization
Workflow 5: Retrosynthesis Planning
Scenario: Plan synthesis route for target molecule.
Steps:
- Load dataset:
datasets.USPTO50k() - Train center identification model (RGCN)
- Train synthon completion model (GIN)
- Combine into end-to-end retrosynthesis pipeline
- Apply recursively for multi-step planning
- Check commercial availability of building blocks
Navigation: references/retrosynthesis.md → Task types → Multi-step planning
Integration Patterns
With RDKit
Convert between TorchDrug molecules and RDKit:
from torchdrug import data
from rdkit import Chem
# SMILES → TorchDrug molecule
smiles = "CCO"
mol = data.Molecule.from_smiles(smiles)
# TorchDrug → RDKit
rdkit_mol = mol.to_molecule()
# RDKit → TorchDrug
rdkit_mol = Chem.MolFromSmiles(smiles)
mol = data.Molecule.from_molecule(rdkit_mol)
With AlphaFold/ESM
Use predicted structures:
from torchdrug import data
# Load AlphaFold predicted structure
protein = data.Protein.from_pdb("AF-P12345-F1-model_v4.pdb")
# Build graph with spatial edges
graph = protein.residue_graph(
node_position="ca",
edge_types=["sequential", "radius"],
radius_cutoff=10.0
)
With PyTorch Lightning
Wrap tasks for Lightning training:
import pytorch_lightning as pl
class LightningTask(pl.LightningModule):
def __init__(self, torchdrug_task):
super().__init__()
self.task = torchdrug_task
def training_step(self, batch, batch_idx):
return self.task(batch)
def validation_step(self, batch, batch_idx):
pred = self.task.predict(batch)
target = self.task.target(batch)
return {"pred": pred, "target": target}
def configure_optimizers(self):
return torch.optim.Adam(self.parameters(), lr=1e-3)
Technical Details
For deep dives into TorchDrug's architecture:
Core Concepts: See references/core_concepts.md for:
- Architecture philosophy (modular, configurable)
- Data structures (Graph, Molecule, Protein, PackedGraph)
- Model interface and forward function signature
- Task interface (predict, target, forward, evaluate)
- Training workflows and best practices
- Loss functions and metrics
- Common pitfalls and debugging
Quick Reference Cheat Sheet
Choose Dataset:
- Molecular property →
references/datasets.md→ Molecular section - Protein task →
references/datasets.md→ Protein section - Knowledge graph →
references/datasets.md→ Knowledge graph section
Choose Model:
- Molecules →
references/models_architectures.md→ GNN section → GIN/GAT/SchNet - Proteins (sequence) →
references/models_architectures.md→ Protein section → ESM - Proteins (structure) →
references/models_architectures.md→ Protein section → GearNet - Knowledge graph →
references/models_architectures.md→ KG section → RotatE/ComplEx
Common Tasks:
- Property prediction →
references/molecular_property_prediction.mdorreferences/protein_modeling.md - Generation →
references/molecular_generation.md - Retrosynthesis →
references/retrosynthesis.md - KG reasoning →
references/knowledge_graphs.md
Understand Architecture:
- Data structures →
references/core_concepts.md→ Data Structures - Model design →
references/core_concepts.md→ Model Interface - Task design →
references/core_concepts.md→ Task Interface
Troubleshooting Common Issues
Issue: Dimension mismatch errors
→ Check model.input_dim matches dataset.node_feature_dim
→ See references/core_concepts.md → Essential Attributes
Issue: Poor performance on molecular tasks
→ Use scaffold splitting, not random
→ Try GIN instead of GCN
→ See references/molecular_property_prediction.md → Best Practices
Issue: Protein model not learning
→ Use pre-trained ESM for sequence tasks
→ Check edge construction for structure models
→ See references/protein_modeling.md → Training Workflows
Issue: Memory errors with large graphs
→ Reduce batch size
→ Use gradient accumulation
→ See references/core_concepts.md → Memory Efficiency
Issue: Generated molecules are invalid
→ Add validity constraints
→ Post-process with RDKit validation
→ See references/molecular_generation.md → Validation and Filtering
Version Notes (0.2.1)
PropertyPrediction.predict()returns original-scale values (not standardized); code written for older TorchDrug may need metric/threshold updates (release notes).- Dataset constructors prefer
atom_feature/bond_feature/mol_feature;node_feature/edge_feature/graph_featureare deprecated aliases. EvolutionaryScaleModelingsupports ESM-2 checkpoints in addition to ESM-1b.
Resources
Official Documentation: https://torchdrug.ai/docs/ (0.2.1) GitHub: https://github.com/DeepGraphLearning/torchdrug Paper: TorchDrug: A Powerful and Flexible Machine Learning Platform for Drug Discovery
Summary
Navigate to the appropriate reference file based on your task:
- Molecular property prediction →
molecular_property_prediction.md - Protein modeling →
protein_modeling.md - Knowledge graphs →
knowledge_graphs.md - Molecular generation →
molecular_generation.md - Retrosynthesis →
retrosynthesis.md - Model selection →
models_architectures.md - Dataset selection →
datasets.md - Technical details →
core_concepts.md
Each reference provides comprehensive coverage of its domain with examples, best practices, and common use cases.
GitHub репозиторий
Похожие навыки
content-collections
МетаЭтот навык предоставляет проверенную в продакшене настройку для Content Collections — TypeScript-ориентированного инструмента, который преобразует файлы Markdown/MDX в типобезопасные коллекции данных с валидацией Zod. Используйте его при создании блогов, сайтов документации или контентных приложений на Vite + React для обеспечения типобезопасности и автоматической проверки содержимого. Он охватывает всё: от настройки плагина Vite и компиляции MDX до оптимизации развертывания и валидации схем.
polymarket
МетаЭтот навык позволяет разработчикам создавать приложения на платформе прогнозных рынков Polymarket, включая интеграцию с API для торговли и получения рыночных данных. Он также обеспечивает потоковую передачу данных в реальном времени через WebSocket для отслеживания текущих сделок и рыночной активности. Используйте его для реализации торговых стратегий или создания инструментов, обрабатывающих обновления рынка в реальном времени.
creating-opencode-plugins
МетаЭтот навык помогает разработчикам создавать плагины OpenCode, которые подключаются к более чем 25 типам событий, таким как команды, файлы и операции LSP. Он предоставляет структуру плагина, спецификации API событий и шаблоны реализации для модулей на JavaScript/TypeScript. Используйте его, когда вам нужно перехватывать, отслеживать или расширять жизненный цикл ассистента OpenCode AI с помощью пользовательской событийно-ориентированной логики.
sglang
МетаSGLang — это высокопроизводительный фреймворк для обслуживания больших языковых моделей (LLM), специализирующийся на быстрой структурированной генерации JSON, regex и рабочих процессов агентов с использованием кэширования префиксов RadixAttention. Он обеспечивает значительно более высокую скорость вывода, особенно для задач с повторяющимися префиксами, что делает его идеальным для сложных структурированных результатов и многократных диалогов. Выбирайте SGLang вместо альтернатив, таких как vLLM, когда вам требуется ограниченное декодирование или вы создаете приложения с интенсивным совместным использованием префиксов.
