medchem
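About
The medchem skill provides medicinal chemistry filters for compound triage in drug discovery. It lets developers prioritize large molecular libraries by applying drug-likeness rules, structural alert catalogs, and complexity metrics. Use it to filter compounds with established guidelines such as Lipinski's rules, PAINS alerts, and the custom medchem query language.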
Quick install
Claude Code
Recommended: npx skills add K-Dense-AI/claude-scientific-skills -a claude-code
/plugin add https://github.com/K-Dense-AI/claude-scientific-skills
git clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/medchem
Copy one of these commands and paste it into Claude Code to install the skill.
Documentation
Medchem
Overview
Medchem is a Python library from datamol-io for molecular filtering and prioritization in drug discovery. Apply literature-derived drug-likeness rules, named alert catalogs, complexity thresholds, chemical-group detection, and a custom query language to triage compound libraries at scale. Filters are context-specific guidelines — combine with domain expertise and target knowledge.
Version note: Examples target medchem 2.0.5 (PyPI stable, Nov 2024). Requires Python ≥3.9. Depends on datamol and RDKit (installed automatically). RuleFilters and structural filter classes return pandas DataFrames. Lilly demerits require optional native binaries (mamba install lilly-medchem-rules).
When to Use This Skill
This skill should be used when:
- Applying drug-likeness rules (Lipinski, Veber, CNS, lead-like) to compound libraries
- Filtering molecules by structural alerts, PAINS, or NIBR screening-deck rules
- Prioritizing compounds for hit-to-lead or lead optimization
- Calculating complexity metrics against ZINC-derived thresholds
- Detecting functional groups or named substructure catalogs
- Building multi-criteria filters with the medchem query language
Installation
uv pip install medchem datamol
Optional — Eli Lilly demerit filter (requires conda-forge native binaries):
mamba install -c conda-forge lilly-medchem-rules
Core Capabilities
1. Medicinal Chemistry Rules
Apply established drug-likeness rules via medchem.rules.
List available rules:
import medchem as mc
mc.rules.RuleFilters.list_available_rules_names()
# ['rule_of_five', 'rule_of_five_beyond', 'rule_of_four', 'rule_of_three', ...]
Single rule on one molecule:
import datamol as dm
import medchem as mc
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O" # aspirin
mc.rules.basic_rules.rule_of_five(smiles) # True
mc.rules.basic_rules.rule_of_cns(smiles) # True
mc.rules.basic_rules.rule_of_veber(smiles) # True
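For intuition about what a single rule checks, the sketch below applies the four classic Rule-of-Five thresholds (MW ≤ 500, cLogP ≤ 5, H-bond donors ≤ 5, H-bond acceptors ≤ 10) to precomputed descriptors in plain Python. It is an illustration, not medchem's implementation: the `Descriptors` class and `passes_rule_of_five` helper are hypothetical, the aspirin values are approximate and hard-coded instead of computed with RDKit, and this version requires all four criteria to hold, whereas Lipinski's original formulation tolerates one violation.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one molecule (hypothetical container)."""

    mw: float  # molecular weight, g/mol
    clogp: float  # calculated logP
    hbd: int  # Lipinski H-bond donors
    hba: int  # Lipinski H-bond acceptors


def passes_rule_of_five(d: Descriptors) -> bool:
    """Strict variant: every Ro5 threshold must be satisfied."""
    return d.mw <= 500 and d.clogp <= 5 and d.hbd <= 5 and d.hba <= 10


# Approximate aspirin descriptors, hard-coded for illustration only
aspirin = Descriptors(mw=180.16, clogp=1.31, hbd=1, hba=4)
print(passes_rule_of_five(aspirin))  # True

# A heavier, more lipophilic hypothetical compound fails on MW and cLogP
print(passes_rule_of_five(Descriptors(mw=620.0, clogp=6.2, hbd=2, hba=8)))  # False
```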
Multiple rules with RuleFilters (returns a DataFrame):
import datamol as dm
import medchem as mc
mols = [dm.to_mol(s) for s in smiles_list]
rfilter = mc.rules.RuleFilters(
rule_list=["rule_of_five", "rule_of_oprea", "rule_of_cns", "rule_of_leadlike_soft"]
)
df = rfilter(mols=mols, n_jobs=-1, progress=True, keep_props=False)
# Columns: mol, pass_all, pass_any, rule_of_five, rule_of_oprea, ...
passing = df[df["pass_all"]]
Use keep_props=True to include computed descriptors (mw, clogp, tpsa, etc.) in the result.
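The `pass_all` and `pass_any` columns summarize the per-rule columns row by row; judging by their names, they are a logical AND and a logical OR across the selected rules. A stdlib sketch of that aggregation, assuming each rule yields one boolean per molecule in input order (the `summarize` helper and the sample outcomes are hypothetical):

```python
def summarize(rule_results: dict[str, list[bool]]) -> tuple[list[bool], list[bool]]:
    """Row-wise AND (pass_all) and OR (pass_any) across per-rule boolean columns."""
    if not rule_results:
        return [], []
    columns = list(rule_results.values())
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("every rule column must have one entry per molecule")
    rows = list(zip(*columns))  # one tuple of rule outcomes per molecule
    pass_all = [all(row) for row in rows]
    pass_any = [any(row) for row in rows]
    return pass_all, pass_any


# Made-up outcomes for three molecules
results = {
    "rule_of_five": [True, True, False],
    "rule_of_cns": [True, False, False],
}
pass_all, pass_any = summarize(results)
print(pass_all)  # [True, False, False]
print(pass_any)  # [True, True, False]
```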
2. Structural Alert Filters
Detect problematic patterns with medchem.structural. Both classes return DataFrames with pass_filter, status, and reasons columns.
Common alerts (ChEMBL-derived rule sets):
import medchem as mc
alert_filter = mc.structural.CommonAlertsFilters()
df = alert_filter(mols=mol_list, n_jobs=-1, progress=True)
# df columns: mol, pass_filter, status, reasons
clean = df[df["pass_filter"]]
NIBR filters (Novartis screening-deck curation):
nibr_filter = mc.structural.NIBRFilters()
df = nibr_filter(mols=mol_list, n_jobs=-1, progress=True)
# df columns: mol, pass_filter, status, severity, reasons, n_covalent_motif, special_mol
Compounds with severity >= 10 are excluded by default (see NIBR paper).
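Read literally, that default is a strict less-than test on the `severity` column, the same cutoff passed as `max_severity=10` in the functional example further down. A plain-Python sketch of the comparison on made-up scores; the `severity_mask` helper is hypothetical, and the meaning of the severity scale itself should be checked against the NIBR paper:

```python
def severity_mask(severities: list[int], max_severity: int = 10) -> list[bool]:
    """True where a molecule is kept: its severity is strictly below the cutoff."""
    return [s < max_severity for s in severities]


severities = [0, 2, 10, 15]  # made-up NIBR severity scores
print(severity_mask(severities))  # [True, True, False, False]
print(severity_mask(severities, max_severity=1))  # [True, False, False, False]
```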
3. Named Catalog Filters (PAINS, Brenk, etc.)
Use medchem.catalogs.NamedCatalogs for RDKit FilterCatalog instances, or the functional API:
import medchem as mc
# List available named catalogs
mc.catalogs.list_named_catalogs()
# ['tox', 'pains', 'pains_a', 'brenk', 'nibr', 'zinc', ...]
# Functional API — True means molecule passes (no alert match)
passes = mc.functional.alert_filter(mols=mol_list, alerts=["pains"], n_jobs=-1)
# Or via catalog objects
passes = mc.functional.catalog_filter(
mols=mol_list,
catalogs=[mc.catalogs.NamedCatalogs.pains()],
n_jobs=-1,
)
4. Functional API
medchem.functional provides one-call wrappers that return boolean masks (True = passes):
import medchem as mc
mc.functional.rules_filter(mols=mol_list, rules=["rule_of_five", "rule_of_cns"], n_jobs=-1)
mc.functional.nibr_filter(mols=mol_list, max_severity=10, n_jobs=-1)
mc.functional.alert_filter(mols=mol_list, alerts=["pains", "brenk"], n_jobs=-1)
mc.functional.complexity_filter(mols=mol_list, complexity_metric="bertz", limit="99", n_jobs=-1)
Other helpers: catalog_filter, chemical_group_filter, lilly_demerit_filter (requires optional binaries), macrocycle_filter, bredt_filter, protecting_groups_filter, and more.
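Because each helper returns a mask aligned with the input order, results from several filters can be intersected element-wise before selecting molecules. A stdlib sketch with hand-written masks standing in for real filter output; the compound names, mask values, and the `combine_masks` and `select` helpers are hypothetical. With NumPy arrays or pandas Series, the `&` operator performs the same intersection.

```python
def combine_masks(*masks: list[bool]) -> list[bool]:
    """Element-wise AND of equal-length boolean masks (True = passes every filter)."""
    if not masks:
        raise ValueError("need at least one mask")
    n = len(masks[0])
    if any(len(m) != n for m in masks):
        raise ValueError("masks must have equal length; are they aligned to the same input?")
    return [all(flags) for flags in zip(*masks)]


def select(items: list, mask: list[bool]) -> list:
    """Keep the items whose mask entry is True."""
    return [item for item, keep in zip(items, mask) if keep]


names = ["cpd-1", "cpd-2", "cpd-3", "cpd-4"]
rules_mask = [True, True, False, True]  # e.g. from a rules filter
alerts_mask = [True, False, True, True]  # e.g. from an alert filter
complexity_mask = [True, True, True, False]  # e.g. from a complexity filter

combined = combine_masks(rules_mask, alerts_mask, complexity_mask)
print(combined)  # [True, False, False, False]
print(select(names, combined))  # ['cpd-1']
```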
5. Chemical Groups
Detect functional groups and curated pattern collections via medchem.groups:
import medchem as mc
# Browse available group collections
mc.groups.list_default_chemical_groups()
# ['privileged_scaffolds', 'common_warhead_covalent_inhibitors', 'rings_in_drugs', ...]
group = mc.groups.ChemicalGroup(groups=["privileged_scaffolds"])
group.has_match(mol) # bool
group.get_matches(mol) # dict of group → atom indices
group.filter(mols) # molecules matching the group
# Returns molecules that do NOT match the group
mc.functional.chemical_group_filter(mols=mol_list, chemical_group=group, n_jobs=-1)
Custom groups can be loaded from a file via groups_db (CSV with smiles/smarts, name, group columns).
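A sketch of preparing such a file with the standard `csv` module. The header (`group`, `name`, `smiles`, `smarts`) is inferred from the sentence above and may not match what medchem's loader expects, so check the medchem documentation for the exact columns before use. The group name, rows, and file name are hypothetical, and the commented `ChemicalGroup(..., groups_db=...)` call is an assumption.

```python
import csv
from pathlib import Path

# Hypothetical custom groups; the column set follows the groups_db description
# above and is an assumption, not a confirmed medchem schema.
FIELDS = ["group", "name", "smiles", "smarts"]
ROWS = [
    {"group": "my_warheads", "name": "acrylamide", "smiles": "C=CC(N)=O", "smarts": "C=CC(=O)N"},
    {"group": "my_warheads", "name": "chloroacetamide", "smiles": "NC(=O)CCl", "smarts": "ClCC(=O)N"},
]


def write_groups_file(path: Path) -> Path:
    """Write the custom group definitions as a CSV file and return its path."""
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(ROWS)
    return path


groups_path = write_groups_file(Path("custom_groups.csv"))
# Assumed next step (verify the exact signature in the medchem docs):
# group = mc.groups.ChemicalGroup(groups=["my_warheads"], groups_db=groups_path)
```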
6. Molecular Complexity
Compare complexity metrics to precomputed ZINC-15 percentile thresholds:
import medchem as mc
# Single molecule
cf = mc.complexity.ComplexityFilter(limit="99", complexity_metric="bertz")
cf(mol) # True if below 99th-percentile threshold
# Batch via functional API
mc.functional.complexity_filter(
mols=mol_list,
complexity_metric="bertz", # also: sas, qed, whitlock, barone, smcm, twc
limit="99",
n_jobs=-1,
)
# Direct metric functions
mc.complexity.WhitlockCT(mol)
mc.complexity.BaroneCT(mol)
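medchem compares each metric with percentile values precomputed on ZINC-15, so `limit="99"` keeps molecules below the 99th-percentile cut point. The idea can be illustrated by deriving such a cut point from a reference sample with `statistics.quantiles`. The sample here is synthetic, and `percentile_threshold` and `below_threshold` are hypothetical helpers, not the library's thresholds or code.

```python
import statistics


def percentile_threshold(reference: list[float], pct: int = 99) -> float:
    """Value below which about `pct` percent of the reference sample falls."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    cuts = statistics.quantiles(reference, n=100)  # 99 cut points: 1st..99th percentile
    return cuts[pct - 1]


def below_threshold(values: list[float], threshold: float) -> list[bool]:
    """True where a value lies below the threshold, i.e. passes the filter."""
    return [v < threshold for v in values]


# Synthetic stand-in for a reference distribution such as ZINC complexity scores
reference = [float(x) for x in range(1, 1001)]
threshold = percentile_threshold(reference, pct=99)
print(below_threshold([10.0, 500.0, 995.0], threshold))  # [True, True, False]
```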
7. Scaffold Constraints
medchem.constraints.Constraints matches a core scaffold and applies per-atom constraint functions — not simple MW/LogP ranges. For property bounds, use RuleFilters, descriptors via mc.rules.list_descriptors(), or the query language.
import datamol as dm
import medchem as mc
core = dm.to_mol("c1ccccc1")
constraints = mc.constraints.Constraints(
core=core,
constraint_fns={"query": lambda mol, atom_idx, query: ...},
)
constraints(mol)
8. Medchem Query Language
Build multi-criteria filters with medchem.query.QueryFilter:
import medchem as mc
# Rule + alert combination
qf = mc.query.QueryFilter('MATCHRULE("rule_of_five") AND NOT HASALERT("pains")')
mask = qf(mols=mol_list, n_jobs=-1) # list[bool]
# CNS-like with property bounds
qf = mc.query.QueryFilter('MATCHRULE("rule_of_cns") AND HASPROP("tpsa", <=, 90)')
mask = qf(mols=mol_list, n_jobs=-1)
Query syntax:
- MATCHRULE("rule_of_five"): apply a named rule
- HASALERT("pains"): match a named catalog (pains, brenk, nibr, tox, …)
- HASPROP("mw", <, 500): compare a descriptor (the comparator is unquoted)
- HASGROUP("privileged_scaffolds"): match a chemical group
- HASSUBSTRUCTURE("c1ccccc1"): substructure match
- Operators: AND, OR, NOT
List available descriptors: mc.rules.list_descriptors()
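Each query term evaluates to one boolean per molecule, and AND, OR, and NOT combine those booleans in the usual way. The hypothetical sketch below spells out what a CNS-style query with a PAINS exclusion and a TPSA bound computes for three made-up molecules. The dictionary layout, the `hasprop` helper, and its comparator table are illustrative and do not reflect how medchem parses or evaluates queries.

```python
import operator

# Comparator table for this sketch only (not medchem's parser)
COMPARATORS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def hasprop(props: dict, name: str, op: str, value: float) -> bool:
    """Compare one precomputed descriptor against a bound."""
    return COMPARATORS[op](props[name], value)


# Hand-written per-molecule results standing in for real rule/alert evaluations
molecules = [
    {"matches_cns": True, "has_pains": False, "props": {"tpsa": 45.0}},
    {"matches_cns": True, "has_pains": True, "props": {"tpsa": 60.0}},
    {"matches_cns": True, "has_pains": False, "props": {"tpsa": 120.0}},
]

# Mirrors: MATCHRULE("rule_of_cns") AND NOT HASALERT("pains") AND HASPROP("tpsa", <=, 90)
mask = [
    m["matches_cns"] and not m["has_pains"] and hasprop(m["props"], "tpsa", "<=", 90)
    for m in molecules
]
print(mask)  # [True, False, False]
```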
Workflow Patterns
Pattern 1: Initial Triage of a Compound Library
import datamol as dm
import medchem as mc
import pandas as pd
df = pd.read_csv("compounds.csv")
mols = [dm.to_mol(s) for s in df["smiles"]]
# Drug-likeness rules
rules_df = mc.rules.RuleFilters(rule_list=["rule_of_five", "rule_of_veber"])(mols=mols, n_jobs=-1)
# Ro5 + PAINS exclusion via the query language
qf = mc.query.QueryFilter('MATCHRULE("rule_of_five") AND NOT HASALERT("pains")')
pass_mask = qf(mols=mols, n_jobs=-1)
df["passes_rules"] = rules_df["pass_all"].values
df["drug_like"] = pass_mask
filtered_df = df[df["drug_like"]]
filtered_df.to_csv("filtered_compounds.csv", index=False)
Pattern 2: Lead Optimization Filtering
import medchem as mc
rules_df = mc.rules.RuleFilters(rule_list=["rule_of_leadlike_soft"])(mols=candidates, n_jobs=-1)
nibr_df = mc.structural.NIBRFilters()(mols=candidates, n_jobs=-1)
complex_mask = mc.functional.complexity_filter(
mols=candidates, complexity_metric="bertz", limit="95", n_jobs=-1
)
passes = (
rules_df["pass_all"]
& nibr_df["pass_filter"]
& complex_mask
)
Pattern 3: Detect Functional Groups
import medchem as mc
group = mc.groups.ChemicalGroup(groups=["common_warhead_covalent_inhibitors"])
matches = [group.has_match(mol) for mol in mol_list]
warhead_mols = [mol for mol, m in zip(mol_list, matches) if m]
Best Practices
- Context matters: marketed drugs often violate Ro5; prodrugs and natural products are common exceptions.
- Combine filters: rules, alert catalogs, and complexity thresholds work best together.
- Use parallelization: pass n_jobs=-1 for libraries >1000 molecules.
- Check return types: RuleFilters and structural classes return DataFrames; functional helpers return boolean arrays.
- Lilly demerits are optional: install lilly-medchem-rules separately; the functional API's default max demerits is 160.
- Document decisions: retain status, reasons, and severity columns for audit trails.
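A minimal sketch of such an audit trail: write every molecule's outcome, including the rejected ones, to a CSV so the reasons survive the filtering step. Field names mirror the structural-filter columns listed earlier; the `id` field, the placeholder values, and `write_audit` are hypothetical. With the DataFrames medchem returns, dropping the RDKit `mol` column and calling `df.drop(columns=["mol"]).to_csv("filter_audit.csv", index=False)` should serve the same purpose.

```python
import csv
from pathlib import Path

AUDIT_FIELDS = ["id", "pass_filter", "status", "severity", "reasons"]

# Hypothetical per-molecule outcomes; ids, statuses, and reasons are placeholders
records = [
    {"id": "cpd-1", "pass_filter": True, "status": "ok", "severity": 0, "reasons": ""},
    {"id": "cpd-2", "pass_filter": False, "status": "exclude", "severity": 10, "reasons": "alert_A;alert_B"},
]


def write_audit(records: list[dict], path: Path) -> int:
    """Write every molecule's outcome (kept or rejected) to CSV; return rows written."""
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=AUDIT_FIELDS)
        writer.writeheader()
        for rec in records:
            # Missing fields are written as empty cells rather than raising
            writer.writerow({key: rec.get(key, "") for key in AUDIT_FIELDS})
    return len(records)


n_written = write_audit(records, Path("filter_audit.csv"))
```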
Resources
references/api_guide.md
Module-by-module API reference with signatures, return types, and patterns.
references/rules_catalog.md
Catalog of available rules, alert sets, complexity metrics, and filter selection guidelines.
scripts/filter_molecules.py
Batch filtering script for CSV/TSV/SDF/SMILES inputs with configurable rules, alerts, and complexity thresholds.
uv run python scripts/filter_molecules.py input.csv \
--rules rule_of_five,rule_of_cns --pains --nibr --output filtered.csv
Documentation
- Official docs: https://medchem-docs.datamol.io/
- GitHub: https://github.com/datamol-io/medchem
- PyPI: https://pypi.org/project/medchem/ (2.0.5)
