rdkit
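About
The rdkit skill provides fine-grained cheminformatics control for molecular analysis, including SMILES/SDF parsing, descriptor calculation, fingerprinting, and substructure searching. Use datamol's simpler interface for standard tasks, and use this skill for advanced workflows that require custom sanitization or specialized algorithms. The skill supports 2D/3D coordinate generation, similarity calculation, and chemical reaction handling for drug discovery and computational chemistry.
Quick install
Claude Code
Recommended: npx skills add K-Dense-AI/claude-scientific-skills -a claude-code
/plugin add https://github.com/K-Dense-AI/claude-scientific-skills
git clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/rdkit
Copy and paste one of these commands into Claude Code to install the skill.
Documentation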
RDKit Cheminformatics Toolkit
Overview
RDKit is a comprehensive cheminformatics library providing Python APIs for molecular analysis and manipulation. This skill provides guidance for reading/writing molecular structures, calculating descriptors, fingerprinting, substructure searching, chemical reactions, 2D/3D coordinate generation, and molecular visualization. Use this skill for drug discovery, computational chemistry, and cheminformatics research tasks.
Core Capabilities
1. Molecular I/O and Creation
Reading Molecules:
Read molecular structures from various formats:
from rdkit import Chem
# From SMILES strings
mol = Chem.MolFromSmiles('Cc1ccccc1') # Returns Mol object or None
# From MOL files
mol = Chem.MolFromMolFile('path/to/file.mol')
# From MOL blocks (string data)
mol = Chem.MolFromMolBlock(mol_block_string)
# From InChI
mol = Chem.MolFromInchi('InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H')
Writing Molecules:
Convert molecules to text representations:
# To canonical SMILES
smiles = Chem.MolToSmiles(mol)
# To MOL block
mol_block = Chem.MolToMolBlock(mol)
# To InChI
inchi = Chem.MolToInchi(mol)
Batch Processing:
For processing multiple molecules, use Supplier/Writer objects:
# Read SDF files
suppl = Chem.SDMolSupplier('molecules.sdf')
for mol in suppl:
    if mol is not None:  # Check for parsing errors
        # Process molecule
        pass
# Read SMILES files
suppl = Chem.SmilesMolSupplier('molecules.smi', titleLine=False)
# For large files or compressed data
import gzip
with gzip.open('molecules.sdf.gz') as f:
    suppl = Chem.ForwardSDMolSupplier(f)
    for mol in suppl:
        # Process molecule
        pass
# Multithreaded processing for large datasets
suppl = Chem.MultithreadedSDMolSupplier('molecules.sdf')
# Write molecules to SDF
writer = Chem.SDWriter('output.sdf')
for mol in molecules:
    writer.write(mol)
writer.close()
Important Notes:
- All MolFrom* functions return None on failure and log an error message
- Always check for None before processing molecules
- Molecules are automatically sanitized on import (validates valence, perceives aromaticity)
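A minimal sketch of the None-check pattern applied to a list of SMILES strings. The helper name parse_smiles_list is hypothetical; RDLogger.DisableLog only silences RDKit's parse-error messages so that failures can be collected quietly instead.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's error log for this demo; failures are collected below instead
RDLogger.DisableLog("rdApp.*")


def parse_smiles_list(smiles_list):
    """Hypothetical helper: split SMILES into parsed molecules and failures."""
    mols, failures = [], []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)  # None if parsing or sanitization fails
        if mol is None:
            failures.append(smi)
        else:
            mols.append(mol)
    return mols, failures


# "C1CC" has an unclosed ring, so it cannot be parsed
mols, failures = parse_smiles_list(["CCO", "C1CC", "c1ccccc1"])
print(len(mols), failures)
```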
2. Molecular Sanitization and Validation
RDKit automatically sanitizes molecules during parsing, executing 13 steps including valence checking, aromaticity perception, and chirality assignment.
Sanitization Control:
# Disable automatic sanitization
mol = Chem.MolFromSmiles('C1=CC=CC=C1', sanitize=False)
# Manual sanitization
Chem.SanitizeMol(mol)
# Detect problems before sanitization
problems = Chem.DetectChemistryProblems(mol)
for problem in problems:
    print(problem.GetType(), problem.Message())
# Partial sanitization (skip specific steps)
Chem.SanitizeMol(mol, sanitizeOps=Chem.SanitizeFlags.SANITIZE_ALL ^ Chem.SanitizeFlags.SANITIZE_PROPERTIES)
Common Sanitization Issues:
- Atoms with explicit valence exceeding maximum allowed will raise exceptions
- Invalid aromatic rings will cause kekulization errors
- Radical electrons may not be properly assigned without explicit specification
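To see which of these issues affects a given input, one option is to parse with sanitize=False and ask DetectChemistryProblems for a report instead of letting sanitization raise. A minimal sketch; the helper name diagnose and its ("ParseError", ...) fallback entry are hypothetical.

```python
from rdkit import Chem


def diagnose(smiles):
    """Hypothetical helper: list (problem type, message) pairs without raising."""
    # Parse without sanitizing so that chemically invalid input still yields a Mol
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return [("ParseError", "SMILES could not be parsed")]
    return [(p.GetType(), p.Message()) for p in Chem.DetectChemistryProblems(mol)]


print(diagnose("CCO"))             # a valid molecule: no problems
print(diagnose("C(C)(C)(C)(C)C"))  # pentavalent carbon
```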
3. Molecular Analysis and Properties
Accessing Molecular Structure:
# Iterate atoms and bonds
for atom in mol.GetAtoms():
    print(atom.GetSymbol(), atom.GetIdx(), atom.GetDegree())
for bond in mol.GetBonds():
    print(bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondType())
# Ring information
ring_info = mol.GetRingInfo()
ring_info.NumRings()
ring_info.AtomRings() # Returns tuples of atom indices
# Check if atom is in ring
atom = mol.GetAtomWithIdx(0)
atom.IsInRing()
atom.IsInRingSize(6) # Check for 6-membered rings
# Find the symmetrized smallest set of smallest rings (SSSR)
from rdkit.Chem import GetSymmSSSR
rings = GetSymmSSSR(mol)
Stereochemistry:
# Find chiral centers
from rdkit.Chem import FindMolChiralCenters
chiral_centers = FindMolChiralCenters(mol, includeUnassigned=True)
# Returns list of (atom_idx, chirality) tuples
# Assign stereochemistry from 3D coordinates
from rdkit.Chem import AssignStereochemistryFrom3D
AssignStereochemistryFrom3D(mol)
# Check bond stereochemistry
bond = mol.GetBondWithIdx(0)
stereo = bond.GetStereo() # STEREONONE, STEREOZ, STEREOE, etc.
Fragment Analysis:
# Get disconnected fragments
frags = Chem.GetMolFrags(mol, asMols=True)
# Fragment on specific bonds
from rdkit.Chem import FragmentOnBonds
frag_mol = FragmentOnBonds(mol, [bond_idx1, bond_idx2])
# Extract the Bemis-Murcko scaffold (ring systems plus the linkers between them)
from rdkit.Chem.Scaffolds import MurckoScaffold
scaffold = MurckoScaffold.GetScaffoldForMol(mol)
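A common use of the scaffold is to group a compound set by chemotype. A minimal sketch that keys each molecule by the canonical SMILES of its Bemis-Murcko scaffold; the helper name group_by_scaffold is hypothetical.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold


def group_by_scaffold(smiles_list):
    """Hypothetical helper: map Bemis-Murcko scaffold SMILES -> input SMILES."""
    groups = defaultdict(list)
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable input
        scaffold = MurckoScaffold.GetScaffoldForMol(mol)
        # Canonical SMILES of the scaffold works as a grouping key
        groups[Chem.MolToSmiles(scaffold)].append(smi)
    return dict(groups)


# Toluene and phenol share the benzene scaffold; N-ethylpiperidine does not
groups = group_by_scaffold(["Cc1ccccc1", "Oc1ccccc1", "CCN1CCCCC1"])
print(groups)
```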
4. Molecular Descriptors and Properties
Basic Descriptors:
from rdkit.Chem import Descriptors
# Molecular weight
mw = Descriptors.MolWt(mol)
exact_mw = Descriptors.ExactMolWt(mol)
# LogP (lipophilicity)
logp = Descriptors.MolLogP(mol)
# Topological polar surface area
tpsa = Descriptors.TPSA(mol)
# Number of hydrogen bond donors/acceptors
hbd = Descriptors.NumHDonors(mol)
hba = Descriptors.NumHAcceptors(mol)
# Number of rotatable bonds
rot_bonds = Descriptors.NumRotatableBonds(mol)
# Number of aromatic rings
aromatic_rings = Descriptors.NumAromaticRings(mol)
Batch Descriptor Calculation:
# Calculate all descriptors at once
all_descriptors = Descriptors.CalcMolDescriptors(mol)
# Returns a dict mapping descriptor name to value, e.g. {'MolWt': ..., 'MolLogP': ..., ...}
# Get list of available descriptor names
descriptor_names = [desc[0] for desc in Descriptors._descList]
Lipinski's Rule of Five:
# Check drug-likeness (each flag is True when the criterion is met)
mw_ok = Descriptors.MolWt(mol) <= 500
logp_ok = Descriptors.MolLogP(mol) <= 5
hbd_ok = Descriptors.NumHDonors(mol) <= 5
hba_ok = Descriptors.NumHAcceptors(mol) <= 10
is_drug_like = mw_ok and logp_ok and hbd_ok and hba_ok
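The check above requires all four criteria. A common convention, following Lipinski's original paper, flags a compound only when it breaks more than one rule. A pure-Python sketch of that variant, working on precomputed descriptor values; the function names and the dict keys 'MW', 'LogP', 'HBD', 'HBA' (the same keys analyze_druglikeness uses in Common Workflows) are assumptions of this example.

```python
def lipinski_violations(props):
    """Count Rule-of-Five violations from a dict with keys MW, LogP, HBD, HBA."""
    limits = {"MW": 500, "LogP": 5, "HBD": 5, "HBA": 10}
    return sum(1 for key, limit in limits.items() if props[key] > limit)


def passes_rule_of_five(props, max_violations=1):
    # Tolerate up to max_violations broken criteria (one by default)
    return lipinski_violations(props) <= max_violations


print(lipinski_violations({"MW": 600, "LogP": 6, "HBD": 2, "HBA": 4}))
```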
5. Fingerprints and Molecular Similarity
Fingerprint Types:
from rdkit.Chem import rdFingerprintGenerator
from rdkit.Chem import MACCSkeys
# RDKit topological fingerprint
rdk_gen = rdFingerprintGenerator.GetRDKitFPGenerator(minPath=1, maxPath=7, fpSize=2048)
fp = rdk_gen.GetFingerprint(mol)
# Morgan fingerprints (circular fingerprints, similar to ECFP)
# Modern API using rdFingerprintGenerator
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp = morgan_gen.GetFingerprint(mol)
# Count-based fingerprint
fp_count = morgan_gen.GetCountFingerprint(mol)
# MACCS keys (166-bit structural key)
fp = MACCSkeys.GenMACCSKeys(mol)
# Atom pair fingerprints
ap_gen = rdFingerprintGenerator.GetAtomPairGenerator()
fp = ap_gen.GetFingerprint(mol)
# Topological torsion fingerprints
tt_gen = rdFingerprintGenerator.GetTopologicalTorsionGenerator()
fp = tt_gen.GetFingerprint(mol)
# Avalon fingerprints (if available)
from rdkit.Avalon import pyAvalonTools
fp = pyAvalonTools.GetAvalonFP(mol)
Similarity Calculation:
from rdkit import DataStructs
from rdkit.Chem import rdFingerprintGenerator
# Generate fingerprints using generator
mfpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fp1 = mfpgen.GetFingerprint(mol1)
fp2 = mfpgen.GetFingerprint(mol2)
# Calculate Tanimoto similarity
similarity = DataStructs.TanimotoSimilarity(fp1, fp2)
# Calculate similarity for multiple molecules
fps = [mfpgen.GetFingerprint(m) for m in [mol2, mol3, mol4]]
similarities = DataStructs.BulkTanimotoSimilarity(fp1, fps)
# Other similarity metrics
dice = DataStructs.DiceSimilarity(fp1, fp2)
cosine = DataStructs.CosineSimilarity(fp1, fp2)
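For binary fingerprints these metrics reduce to counts: with a and b bits set in each fingerprint and c bits set in both, Tanimoto = c / (a + b - c), Dice = 2c / (a + b), and cosine = c / sqrt(a * b). A pure-Python sketch on sets of on-bit indices (for example set(fp.GetOnBits())); returning 0.0 for empty inputs is this sketch's own convention.

```python
import math


def bit_similarities(on_bits_a, on_bits_b):
    """Tanimoto, Dice, and cosine similarity from two sets of on-bit indices."""
    a, b = len(on_bits_a), len(on_bits_b)
    c = len(on_bits_a & on_bits_b)  # bits set in both fingerprints
    tanimoto = c / (a + b - c) if (a + b - c) else 0.0
    dice = 2 * c / (a + b) if (a + b) else 0.0
    cosine = c / math.sqrt(a * b) if a and b else 0.0
    return tanimoto, dice, cosine


print(bit_similarities({1, 2, 3, 4}, {3, 4, 5}))
```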
Clustering and Diversity:
# Butina clustering based on fingerprint similarity
from rdkit import DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.ML.Cluster import Butina
mfpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fps = [mfpgen.GetFingerprint(mol) for mol in mols]
# Build the condensed lower-triangle distance list (1 - Tanimoto) that Butina expects
dists = []
for i in range(1, len(fps)):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    dists.extend([1 - sim for sim in sims])
# Cluster with distance cutoff; the first index in each cluster is its centroid
clusters = Butina.ClusterData(dists, len(fps), distThresh=0.3, isDistData=True)
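In the condensed layout built above, the distance between molecules i and j (j < i) sits at position i*(i-1)/2 + j. The pure-Python sketch below walks that layout and applies the Taylor-Butina idea (molecules with the most neighbors become centroids first). It illustrates the algorithm rather than reproducing Butina.ClusterData, whose tie-breaking may differ; the function name butina_cluster is hypothetical.

```python
def butina_cluster(dists, n_points, dist_thresh):
    """Taylor-Butina clustering on a condensed lower-triangle distance list."""
    # neighbors[i] holds every j whose distance to i is within the threshold
    neighbors = [[] for _ in range(n_points)]
    pos = 0
    for i in range(1, n_points):
        for j in range(i):
            if dists[pos] <= dist_thresh:
                neighbors[i].append(j)
                neighbors[j].append(i)
            pos += 1
    # Visit candidate centroids from most to fewest neighbors
    order = sorted(range(n_points), key=lambda k: len(neighbors[k]), reverse=True)
    assigned = set()
    clusters = []
    for idx in order:
        if idx in assigned:
            continue
        # The centroid claims all of its neighbors that are still unassigned
        members = [idx] + [j for j in neighbors[idx] if j not in assigned]
        assigned.update(members)
        clusters.append(tuple(members))
    return tuple(clusters)


# Four points: 0-1 and 2-3 are close, every other pair is far apart
print(butina_cluster([0.1, 0.9, 0.9, 0.9, 0.9, 0.2], 4, 0.3))
```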
6. Substructure Searching and SMARTS
Basic Substructure Matching:
# Define query using SMARTS
query = Chem.MolFromSmarts('[#6]1:[#6]:[#6]:[#6]:[#6]:[#6]:1') # Benzene ring
# Check if molecule contains substructure
has_match = mol.HasSubstructMatch(query)
# Get all matches (returns tuple of tuples with atom indices)
matches = mol.GetSubstructMatches(query)
# Get only first match
match = mol.GetSubstructMatch(query)
Common SMARTS Patterns:
# Primary alcohols
primary_alcohol = Chem.MolFromSmarts('[CH2][OH1]')
# Carboxylic acids
carboxylic_acid = Chem.MolFromSmarts('C(=O)[OH]')
# Amides
amide = Chem.MolFromSmarts('C(=O)N')
# Aromatic heterocycles
aromatic_n = Chem.MolFromSmarts('[nR]') # Aromatic nitrogen in ring
# Macrocycles (atoms in rings of 12 or more atoms)
macrocycle = Chem.MolFromSmarts('[r{12-}]')
Matching Rules:
- Unspecified properties in query match any value in target
- Hydrogens are ignored unless explicitly specified
- Charged query atom won't match uncharged target atom
- Aromatic query atom won't match aliphatic target atom (unless the query uses an element-only primitive such as [#6])
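The rules above can be checked directly on acetic acid and acetate. A minimal sketch; the helper name matches is hypothetical. An unspecified charge matches both forms, [OH] demands exactly one attached hydrogen, a charged query skips the neutral acid, and the aliphatic C query skips benzene while [#6] matches it.

```python
from rdkit import Chem


def matches(smiles, smarts):
    """True if the molecule built from smiles contains the SMARTS pattern."""
    return Chem.MolFromSmiles(smiles).HasSubstructMatch(Chem.MolFromSmarts(smarts))


acid, acetate = "CC(=O)O", "CC(=O)[O-]"
print(matches(acid, "C(=O)O"), matches(acetate, "C(=O)O"))        # charge unspecified
print(matches(acid, "C(=O)[OH]"), matches(acetate, "C(=O)[OH]"))  # H count specified
print(matches(acid, "C(=O)[O-]"))                                 # charged query, neutral target
print(matches("c1ccccc1", "C"), matches("c1ccccc1", "[#6]"))      # aliphatic vs element-only
```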
7. Chemical Reactions
Reaction SMARTS:
from rdkit.Chem import AllChem
# Define reaction using SMARTS: reactants >> products
rxn = AllChem.ReactionFromSmarts('[C:1]=[O:2]>>[C:1][O:2]') # Ketone reduction
# Apply reaction to molecules
reactants = (mol1,)
products = rxn.RunReactants(reactants)
# Products is tuple of tuples (one tuple per product set)
for product_set in products:
    for product in product_set:
        # Sanitize product
        Chem.SanitizeMol(product)
Reaction Features:
- Atom mapping preserves specific atoms between reactants and products
- Dummy atoms in products are replaced by corresponding reactant atoms
- "Any" bonds inherit bond order from reactants
- Chirality preserved unless explicitly changed
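RunReactants returns one product set per way the reactant templates match, so symmetric molecules can yield the same product more than once. A minimal sketch that sanitizes each product and deduplicates by canonical SMILES; the helper name unique_products is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def unique_products(rxn_smarts, reactant_smiles):
    """Hypothetical helper: run a one-reactant reaction, return unique product SMILES."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    reactant = Chem.MolFromSmiles(reactant_smiles)
    seen = set()
    for product_set in rxn.RunReactants((reactant,)):
        for product in product_set:
            try:
                Chem.SanitizeMol(product)  # products come back unsanitized
            except Exception:
                continue  # skip chemically invalid outcomes
            seen.add(Chem.MolToSmiles(product))
    return sorted(seen)


# Both carbonyls of malondialdehyde match, but the two outcomes are the same molecule
print(unique_products("[C:1]=[O:2]>>[C:1][O:2]", "O=CCC=O"))
```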
Reaction Similarity:
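# Generate reaction fingerprints (rxn1 and rxn2 are ChemicalReaction objects)
from rdkit import DataStructs
fp1 = AllChem.CreateDifferenceFingerprintForReaction(rxn1)
fp2 = AllChem.CreateDifferenceFingerprintForReaction(rxn2)
# Compare reactions
similarity = DataStructs.TanimotoSimilarity(fp1, fp2)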
8. 2D and 3D Coordinate Generation
2D Coordinate Generation:
from rdkit.Chem import AllChem
# Generate 2D coordinates for depiction
AllChem.Compute2DCoords(mol)
# Align molecule to template structure
template = Chem.MolFromSmiles('c1ccccc1')
AllChem.Compute2DCoords(template)
AllChem.GenerateDepictionMatching2DStructure(mol, template)
3D Coordinate Generation and Conformers:
# Add explicit hydrogens first; embedding without them gives less realistic geometries
mol = Chem.AddHs(mol)
# Generate single 3D conformer using ETKDG
AllChem.EmbedMolecule(mol, randomSeed=42)
# Generate multiple conformers
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=42)
# Optimize geometry with force field
AllChem.UFFOptimizeMolecule(mol)  # UFF force field
AllChem.MMFFOptimizeMolecule(mol)  # MMFF94 force field
# Optimize all conformers
for conf_id in conf_ids:
    AllChem.MMFFOptimizeMolecule(mol, confId=conf_id)
# Calculate RMSD between two conformers (aligns them first unless prealigned=True)
rms = AllChem.GetConformerRMS(mol, conf_ids[0], conf_ids[1])
# Align molecules
AllChem.AlignMol(probe_mol, ref_mol)
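A frequent follow-up to conformer generation is picking the lowest-energy conformer. A minimal sketch, assuming MMFF94 is an acceptable force field for the molecule; MMFFOptimizeMoleculeConfs returns one (not_converged, energy) pair per conformer. The helper name lowest_energy_conformer and its defaults are hypothetical.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def lowest_energy_conformer(smiles, n_confs=10, seed=42):
    """Hypothetical helper: embed conformers, MMFF-optimize them, return the best one."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))  # explicit Hs for 3D geometry
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=n_confs, randomSeed=seed))
    # One (not_converged, energy) tuple per conformer, in conformer order
    results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)
    energies = [energy for _, energy in results]
    best = min(range(len(conf_ids)), key=energies.__getitem__)
    return mol, conf_ids[best], energies[best]


mol, best_id, energy = lowest_energy_conformer("CCCCO", n_confs=5)
print(best_id, round(energy, 2))
```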
Constrained Embedding:
# Embed with part of molecule constrained to specific coordinates
AllChem.ConstrainedEmbed(mol, core_mol)
9. Molecular Visualization
Basic Drawing:
from rdkit.Chem import Draw
# Draw single molecule to PIL image
img = Draw.MolToImage(mol, size=(300, 300))
img.save('molecule.png')
# Draw to file directly
Draw.MolToFile(mol, 'molecule.png')
# Draw multiple molecules in grid
mols = [mol1, mol2, mol3, mol4]
img = Draw.MolsToGridImage(mols, molsPerRow=2, subImgSize=(200, 200))
Highlighting Substructures:
# Highlight substructure match
query = Chem.MolFromSmarts('c1ccccc1')
match = mol.GetSubstructMatch(query)
img = Draw.MolToImage(mol, highlightAtoms=match)
# Custom highlight colors
highlight_colors = {atom_idx: (1, 0, 0) for atom_idx in match} # Red
img = Draw.MolToImage(mol, highlightAtoms=match,
highlightAtomColors=highlight_colors)
Customizing Visualization:
from rdkit.Chem.Draw import rdMolDraw2D
# Create drawer with custom options
drawer = rdMolDraw2D.MolDraw2DCairo(300, 300)
opts = drawer.drawOptions()
# Customize options
opts.addAtomIndices = True
opts.addStereoAnnotation = True
opts.bondLineWidth = 2
# Draw molecule
drawer.DrawMolecule(mol)
drawer.FinishDrawing()
# Save to file
with open('molecule.png', 'wb') as f:
    f.write(drawer.GetDrawingText())
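The same drawer pattern works with the SVG backend, which returns text instead of PNG bytes and is convenient for web pages and notebooks. A minimal sketch that also highlights a SMARTS match; the helper name mol_to_svg is hypothetical.

```python
from rdkit import Chem
from rdkit.Chem.Draw import rdMolDraw2D


def mol_to_svg(smiles, highlight_smarts=None, size=(300, 300)):
    """Hypothetical helper: render a molecule as SVG text, optionally highlighting a match."""
    mol = Chem.MolFromSmiles(smiles)
    highlight = []
    if highlight_smarts:
        highlight = list(mol.GetSubstructMatch(Chem.MolFromSmarts(highlight_smarts)))
    drawer = rdMolDraw2D.MolDraw2DSVG(*size)
    # DrawMolecule prepares the molecule (including 2D coordinates) by default
    drawer.DrawMolecule(mol, highlightAtoms=highlight)
    drawer.FinishDrawing()
    return drawer.GetDrawingText()


svg = mol_to_svg("Oc1ccccc1", highlight_smarts="c1ccccc1")
```

The returned string can be written to a .svg file or embedded directly in HTML.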
Jupyter Notebook Integration:
# Enable inline display in Jupyter
from rdkit.Chem.Draw import IPythonConsole
# Customize default display
IPythonConsole.ipython_useSVG = True # Use SVG instead of PNG
IPythonConsole.molSize = (300, 300) # Default size
# Molecules now display automatically
mol # Shows molecule image
Visualizing Fingerprint Bits:
# Show what molecular features a fingerprint bit represents
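from rdkit.Chem import AllChem, Draw
# For Morgan fingerprints (older API; bitInfo records the atom environment behind each bit)
bit_info = {}
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, bitInfo=bit_info)
# Draw environment for a specific bit, e.g. the first bit that is set
bit_id = next(iter(bit_info))
img = Draw.DrawMorganBit(mol, bit_id, bit_info)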
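With the generator API from section 5, the same bit information can be requested through an AdditionalOutput object. A minimal sketch, assuming an RDKit release recent enough to provide rdFingerprintGenerator.AdditionalOutput; the resulting mapping has the same bit -> ((center_atom, radius), ...) shape as the bitInfo dict above.

```python
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

mol = Chem.MolFromSmiles("CCOc1ccccc1")
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)

# Ask the generator to record which atom environments set each bit
ao = rdFingerprintGenerator.AdditionalOutput()
ao.AllocateBitInfoMap()
fp = morgan_gen.GetFingerprint(mol, additionalOutput=ao)
bit_info = ao.GetBitInfoMap()  # {bit_id: ((center_atom_idx, radius), ...)}

first_bit = sorted(bit_info)[0]
print(first_bit, bit_info[first_bit])
```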
10. Molecular Modification
Adding/Removing Hydrogens:
# Add explicit hydrogens
mol_h = Chem.AddHs(mol)
# Remove explicit hydrogens
mol = Chem.RemoveHs(mol_h)
Kekulization and Aromaticity:
# Convert aromatic bonds to alternating single/double
Chem.Kekulize(mol)
# Set aromaticity
Chem.SetAromaticity(mol)
Replacing Substructures:
# Replace substructure with another structure
query = Chem.MolFromSmarts('c1ccccc1') # Benzene
replacement = Chem.MolFromSmiles('C1CCCCC1') # Cyclohexane
new_mol = Chem.ReplaceSubstructs(mol, query, replacement)[0]
Neutralizing Charges:
# Remove formal charges by adding/removing hydrogens
from rdkit.Chem.MolStandardize import rdMolStandardize
# Using Uncharger
uncharger = rdMolStandardize.Uncharger()
mol_neutral = uncharger.uncharge(mol)
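Neutralization is usually one step of a larger standardization pipeline. A minimal sketch chaining four rdMolStandardize steps (cleanup, parent-fragment selection, neutralization, canonical tautomer); the step order and the helper name standardize are choices of this example, not a fixed RDKit recipe.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize(smiles):
    """Hypothetical helper: clean up, keep the parent fragment, neutralize, canonical tautomer."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    mol = rdMolStandardize.Cleanup(mol)  # default cleanup (normalization, reionization, ...)
    mol = rdMolStandardize.LargestFragmentChooser().choose(mol)  # drop counter-ions
    mol = rdMolStandardize.Uncharger().uncharge(mol)
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    return Chem.MolToSmiles(mol)


print(standardize("CC(=O)[O-].[Na+]"))  # sodium acetate -> acetic acid
```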
11. Working with Molecular Hashes and Standardization
Molecular Hashing:
from rdkit.Chem import rdMolHash
# Generate Murcko scaffold hash
scaffold_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.MurckoScaffold)
# Canonical SMILES hash
canonical_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.CanonicalSmiles)
# Regioisomer hash (ring systems and substituents as separate pieces, so substituent positions are ignored)
regio_hash = rdMolHash.MolHash(mol, rdMolHash.HashFunction.Regioisomer)
Randomized SMILES:
# Generate random SMILES representations (for data augmentation)
from rdkit.Chem import MolToRandomSmilesVect
random_smiles = MolToRandomSmilesVect(mol, numSmiles=10, randomSeed=42)
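Randomized SMILES are only useful for augmentation if they still describe the same molecule. A minimal sketch that generates a few variants and checks that each one canonicalizes back to the original canonical SMILES.

```python
from rdkit import Chem

mol = Chem.MolFromSmiles("Oc1ccc(Cl)cc1")
canonical = Chem.MolToSmiles(mol)

# Several randomized (non-canonical) SMILES strings for the same molecule
random_smiles = Chem.MolToRandomSmilesVect(mol, numSmiles=5, randomSeed=42)

# Every variant should parse back to the same canonical SMILES
round_trip = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in random_smiles}
print(random_smiles)
print(round_trip == {canonical})
```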
12. Pharmacophore and 3D Features
Pharmacophore Features:
from rdkit.Chem import ChemicalFeatures
from rdkit import RDConfig
import os
# Load feature factory
fdef_path = os.path.join(RDConfig.RDDataDir, 'BaseFeatures.fdef')
factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)
# Get pharmacophore features
features = factory.GetFeaturesForMol(mol)
for feat in features:
    print(feat.GetFamily(), feat.GetType(), feat.GetAtomIds())
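A compact summary of the same information is a count of features per family (Donor, Acceptor, Aromatic, and so on, as defined in BaseFeatures.fdef). A minimal sketch; the helper name feature_counts is hypothetical, and the factory is built once and reused.

```python
import os
from collections import Counter

from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures

fdef_path = os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef")
factory = ChemicalFeatures.BuildFeatureFactory(fdef_path)


def feature_counts(smiles):
    """Hypothetical helper: count pharmacophore features per family."""
    mol = Chem.MolFromSmiles(smiles)
    return Counter(feat.GetFamily() for feat in factory.GetFeaturesForMol(mol))


print(feature_counts("Oc1ccccc1"))  # phenol
```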
Common Workflows
Drug-likeness Analysis
from rdkit import Chem
from rdkit.Chem import Descriptors
def analyze_druglikeness(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    # Calculate Lipinski descriptors
    results = {
        'MW': Descriptors.MolWt(mol),
        'LogP': Descriptors.MolLogP(mol),
        'HBD': Descriptors.NumHDonors(mol),
        'HBA': Descriptors.NumHAcceptors(mol),
        'TPSA': Descriptors.TPSA(mol),
        'RotBonds': Descriptors.NumRotatableBonds(mol)
    }
    # Check Lipinski's Rule of Five
    results['Lipinski'] = (
        results['MW'] <= 500 and
        results['LogP'] <= 5 and
        results['HBD'] <= 5 and
        results['HBA'] <= 10
    )
    return results
Similarity Screening
from rdkit import Chem
from rdkit import DataStructs
from rdkit.Chem import rdFingerprintGenerator
def similarity_screen(query_smiles, database_smiles, threshold=0.7):
    mfpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
    query_mol = Chem.MolFromSmiles(query_smiles)
    if query_mol is None:
        raise ValueError(f"Invalid query SMILES: {query_smiles}")
    query_fp = mfpgen.GetFingerprint(query_mol)
    hits = []
    for idx, smiles in enumerate(database_smiles):
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None:
            fp = mfpgen.GetFingerprint(mol)
            sim = DataStructs.TanimotoSimilarity(query_fp, fp)
            if sim >= threshold:
                hits.append((idx, smiles, sim))
    return sorted(hits, key=lambda x: x[2], reverse=True)
Substructure Filtering
from rdkit import Chem
def filter_by_substructure(smiles_list, pattern_smarts):
    query = Chem.MolFromSmarts(pattern_smarts)
    if query is None:
        raise ValueError(f"Invalid SMARTS: {pattern_smarts}")
    hits = []
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is not None and mol.HasSubstructMatch(query):
            hits.append(smiles)
    return hits
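When several patterns are screened at once, it can help to compile the SMARTS once and report which patterns each molecule contains. A minimal sketch; the pattern names, the PATTERNS dict, and the helper annotate_matches are hypothetical, with SMARTS taken from the Common SMARTS Patterns list above.

```python
from rdkit import Chem

# Hypothetical pattern set, compiled once up front
PATTERNS = {
    "carboxylic_acid": "C(=O)[OH]",
    "amide": "C(=O)N",
    "aromatic_nitrogen": "[nR]",
}
QUERIES = {name: Chem.MolFromSmarts(sma) for name, sma in PATTERNS.items()}


def annotate_matches(smiles_list):
    """Map each parsable SMILES to the sorted names of the patterns it contains."""
    annotations = {}
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue
        annotations[smi] = sorted(
            name for name, query in QUERIES.items() if mol.HasSubstructMatch(query)
        )
    return annotations


print(annotate_matches(["CC(=O)O", "CC(=O)Nc1ccncc1", "CCO"]))
```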
Best Practices
Error Handling
Always check for None when parsing molecules:
for smiles in smiles_list:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        print(f"Failed to parse: {smiles}")
        continue
Performance Optimization
Use binary formats for storage:
import pickle
# Pickle molecules for fast loading
with open('molecules.pkl', 'wb') as f:
pickle.dump(mols, f)
# Load pickled molecules (much faster than reparsing)
with open('molecules.pkl', 'rb') as f:
mols = pickle.load(f)
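Whether custom molecule properties (for example SD tags) survive pickling depends on RDKit's pickle options. A minimal sketch that opts into keeping all properties; note that SetDefaultPickleProperties changes a process-wide default, and the property name "source" is only an illustration.

```python
import pickle

from rdkit import Chem

# Ask RDKit to include all molecule/atom/bond properties in pickles (global setting)
Chem.SetDefaultPickleProperties(Chem.PropertyPickleOptions.AllProps)

mol = Chem.MolFromSmiles("CCO")
mol.SetProp("source", "example")  # hypothetical property

restored = pickle.loads(pickle.dumps(mol))
print(Chem.MolToSmiles(restored), restored.GetProp("source"))
```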
Use bulk operations:
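from rdkit import DataStructs
from rdkit.Chem import rdFingerprintGenerator
# Calculate fingerprints for all molecules with one reusable generator
mfpgen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)
fps = [mfpgen.GetFingerprint(mol) for mol in mols]
# Use bulk similarity calculations
similarities = DataStructs.BulkTanimotoSimilarity(fps[0], fps[1:])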
Thread Safety
RDKit operations are generally thread-safe for:
- Molecule I/O (SMILES, mol blocks)
- Coordinate generation
- Fingerprinting and descriptors
- Substructure searching
- Reactions
- Drawing
Not thread-safe: MolSuppliers when accessed concurrently.
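A minimal sketch of the safe pattern implied by the list above: each worker receives a SMILES string and builds its own molecule, so no supplier or Mol object is shared between threads. Whether threads also give a speedup depends on whether the underlying RDKit calls release Python's GIL, which this sketch does not assume. The helper name mol_weight is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

from rdkit import Chem
from rdkit.Chem import Descriptors


def mol_weight(smiles):
    """Each call builds its own Mol; no RDKit object is shared between threads."""
    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Descriptors.MolWt(mol)


smiles_list = ["CCO", "c1ccccc1", "CC(=O)O", "C1CC"]
with ThreadPoolExecutor(max_workers=4) as pool:
    weights = list(pool.map(mol_weight, smiles_list))  # results keep input order
print(weights)
```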
Memory Management
For large datasets:
# Use ForwardSDMolSupplier to avoid loading the entire file (open it in binary mode)
with open('large.sdf', 'rb') as f:
    suppl = Chem.ForwardSDMolSupplier(f)
    for mol in suppl:
        # Process one molecule at a time
        pass
# Use MultithreadedSDMolSupplier for parallel processing
suppl = Chem.MultithreadedSDMolSupplier('large.sdf', numWriterThreads=4)
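Streaming one molecule at a time and batching combine well: a small generator can group a supplier's output into fixed-size lists while skipping failed records. A pure-Python sketch; batched_mols is a hypothetical name, and it works with any iterable, including a ForwardSDMolSupplier, because it iterates only once.

```python
from itertools import islice


def batched_mols(supplier, batch_size):
    """Yield lists of up to batch_size molecules, skipping None entries."""
    it = (mol for mol in supplier if mol is not None)  # drop parse failures lazily
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Usage with RDKit (assumes a local large.sdf):
# with open('large.sdf', 'rb') as f:
#     for batch in batched_mols(Chem.ForwardSDMolSupplier(f), 1000):
#         ...
print(list(batched_mols([1, None, 2, 3, None, 4, 5], 2)))
```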
Common Pitfalls
- Forgetting to check for None: Always validate molecules after parsing
- Sanitization failures: Use DetectChemistryProblems() to debug
- Missing hydrogens: Use AddHs() when calculating properties that depend on hydrogens
- 2D vs 3D: Generate appropriate coordinates before visualization or 3D analysis
- SMARTS matching rules: Remember that unspecified properties match anything
- Thread safety with MolSuppliers: Don't share supplier objects across threads
Resources
references/
This skill includes detailed API reference documentation:
- api_reference.md - Comprehensive listing of RDKit modules, functions, and classes organized by functionality
- descriptors_reference.md - Complete list of available molecular descriptors with descriptions
- smarts_patterns.md - Common SMARTS patterns for functional groups and structural features
Load these references when needing specific API details, parameter information, or pattern examples.
scripts/
Example scripts for common RDKit workflows:
- molecular_properties.py - Calculate comprehensive molecular properties and descriptors
- similarity_search.py - Perform fingerprint-based similarity screening
- substructure_filter.py - Filter molecules by substructure patterns
These scripts can be executed directly or used as templates for custom workflows.
