bioservices
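About
BioServices provides a unified Python interface for querying more than 40 bioinformatics databases, including UniProt and KEGG, through a consistent API. It is best suited to complex workflows that need cross-database analysis and ID mapping across several services. For simpler single-database lookups, consider alternatives such as gget or biopython.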
Quick Install
Claude Code
Recommended: npx skills add K-Dense-AI/claude-scientific-skills -a claude-code
/plugin add https://github.com/K-Dense-AI/claude-scientific-skills
git clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/bioservices
Copy and paste one of these commands to install the skill.
Documentation
BioServices
Overview
BioServices is a Python package providing programmatic access to approximately 40 bioinformatics web services and databases. Retrieve biological data, perform cross-database queries, map identifiers, analyze sequences, and integrate multiple biological resources in Python workflows. The package handles both REST and SOAP/WSDL protocols transparently.
Version note: Examples target bioservices 1.16.0 (PyPI, Mar 2026). Requires Python 3.9–3.12. UniProt REST changes in mid-2022 (bioservices ≥1.10) mainly affect tabular column names; see upstream _legacy_names if parsing breaks. ChEMBL wrappers changed at 1.6.0 (2018 API); use get_similarity, get_substructure, get_molecule instead of pre-1.6 method names.
When to Use This Skill
This skill should be used when:
- Retrieving protein sequences, annotations, or structures from UniProt, PDB, Pfam
- Analyzing metabolic pathways and gene functions via KEGG or Reactome
- Searching compound databases (ChEBI, ChEMBL, PubChem) for chemical information
- Converting identifiers between different biological databases (KEGG↔UniProt, compound IDs)
- Running sequence similarity searches (BLAST, MUSCLE alignment)
- Querying gene ontology terms (QuickGO, GO annotations)
- Accessing protein-protein interaction data (PSICQUIC, IntactComplex)
- Mining genomic data (BioMart, ArrayExpress, ENA)
- Integrating data from multiple bioinformatics resources in a single workflow
Core Capabilities
1. Protein Analysis
Retrieve protein information, sequences, and functional annotations:
from bioservices import UniProt
u = UniProt(verbose=False)
# Search for a protein by entry name (TSV output; field names follow the current UniProt REST API)
results = u.search("ZAP70_HUMAN", frmt="tsv", columns="accession,gene_names,organism_name")
# Retrieve FASTA sequence
sequence = u.retrieve("P43403", "fasta")
# Map identifiers between databases
kegg_ids = u.mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
Key methods:
- search(): Query UniProt with flexible search terms
- retrieve(): Get protein entries in various formats (FASTA, XML, tab)
- mapping(): Convert identifiers between databases
Reference: references/services_reference.md for complete UniProt API details.
2. Pathway Discovery and Analysis
Access KEGG pathway information for genes and organisms:
from bioservices import KEGG
k = KEGG()
k.organism = "hsa" # Set to human
# Search for organisms
k.lookfor_organism("droso") # Find Drosophila species
# Find pathways by name
k.lookfor_pathway("B cell") # Returns matching pathway IDs
# Get pathways containing specific genes
pathways = k.get_pathway_by_gene("7535", "hsa") # ZAP70 gene
# Retrieve and parse pathway data
data = k.get("hsa04660")
parsed = k.parse(data)
# Extract pathway interactions
interactions = k.parse_kgml_pathway("hsa04660")
relations = interactions['relations'] # Protein-protein interactions
# Convert to Simple Interaction Format
sif_data = k.pathway2sif("hsa04660")
Key methods:
- lookfor_organism(), lookfor_pathway(): Search by name
- get_pathway_by_gene(): Find pathways containing genes
- parse_kgml_pathway(): Extract structured pathway data
- pathway2sif(): Get protein interaction networks
Reference: references/workflow_patterns.md for complete pathway analysis workflows.
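Once parse_kgml_pathway() has returned, the relations list can be summarized locally without further requests. The following is a minimal sketch, assuming each relation is a dict with entry1, entry2, and name keys; that shape and the sample data are assumptions for illustration, not real KEGG output.

```python
from collections import Counter


def interaction_type_counts(relations):
    """Count relations by their 'name' field (e.g. activation, inhibition)."""
    return Counter(rel.get("name", "unknown") for rel in relations)


# Hypothetical sample standing in for interactions["relations"]
sample_relations = [
    {"entry1": "61", "entry2": "62", "name": "activation"},
    {"entry1": "62", "entry2": "63", "name": "phosphorylation"},
    {"entry1": "61", "entry2": "64", "name": "activation"},
]

counts = interaction_type_counts(sample_relations)
print(counts.most_common())
```

Relations without a name field are grouped under "unknown" rather than dropped, so the totals always match the length of the input list.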
3. Compound Database Searches
Search and cross-reference compounds across multiple databases:
from bioservices import KEGG, UniChem
k = KEGG()
# Search compounds by name
results = k.find("compound", "Geldanamycin") # Returns cpd:C11222
# Get compound information with database links
compound_info = k.get("cpd:C11222") # Includes ChEBI links
# Cross-reference KEGG → ChEMBL using UniChem
u = UniChem()
chembl_id = u.get_compound_id_from_kegg("C11222") # Returns CHEMBL278315
Common workflow:
- Search compound by name in KEGG
- Extract KEGG compound ID
- Use UniChem for KEGG → ChEMBL mapping
- ChEBI IDs are often provided in KEGG entries
Reference: references/identifier_mapping.md for complete cross-database mapping guide.
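The "extract KEGG compound ID" step in the workflow above can be sketched as a small parser. This assumes k.find() returns plain text with one hit per line in the form "cpd:ID<TAB>description"; the helper name parse_kegg_find is hypothetical, and the sample string is hand-written to match the ID quoted in the example.

```python
def parse_kegg_find(text):
    """Return (kegg_id, description) pairs from tab-separated find output."""
    hits = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, _, description = line.partition("\t")
        # Drop the database prefix ("cpd:") so the bare ID can be reused elsewhere
        hits.append((entry_id.split(":", 1)[-1], description))
    return hits


# Hand-written sample in the assumed format (not fetched from KEGG)
sample = "cpd:C11222\tGeldanamycin\n"
hits = parse_kegg_find(sample)
print(hits)
```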
4. Sequence Analysis
Run BLAST searches and sequence alignments. NCBI requires a contact email; prefer the NCBI_EMAIL environment variable (the same convention as Biopython Entrez and other skills in this repo):
import os
from bioservices import NCBIblast
s = NCBIblast(verbose=False)
email = os.environ["NCBI_EMAIL"]  # set before running: export NCBI_EMAIL=<your-email-address>
# protein_sequence: amino-acid sequence string, e.g. taken from the FASTA retrieved from UniProt
# Run BLASTP against UniProtKB
jobid = s.run(
    program="blastp",
    sequence=protein_sequence,
    stype="protein",
    database="uniprotkb",
    email=email,
)
# Check job status and retrieve results
s.getStatus(jobid)
results = s.getResult(jobid, "out")
Note: BLAST jobs are asynchronous. Check status before retrieving results.
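Because jobs are asynchronous, a polling loop is usually needed between submission and retrieval. The sketch below is one way to write it; the helper name wait_for_job is hypothetical, and the status strings "RUNNING" and "FINISHED" are assumptions about what the status call returns, so check them against the service you use.

```python
import time


def wait_for_job(get_status, jobid, poll_seconds=5.0, max_wait=600.0,
                 running_states=("RUNNING",), sleep=time.sleep):
    """Poll get_status(jobid) until the job leaves a running state.

    Returns the final status string, or raises TimeoutError after max_wait seconds.
    """
    waited = 0.0
    while True:
        status = get_status(jobid)
        if status not in running_states:
            return status
        if waited >= max_wait:
            raise TimeoutError(f"job {jobid} still running after {max_wait} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Offline demo: a fake status function that finishes on the third poll
_states = iter(["RUNNING", "RUNNING", "FINISHED"])
final = wait_for_job(lambda _jobid: next(_states), "demo-job", poll_seconds=0.0)
print(final)
```

With the objects from the example above, this would be called as wait_for_job(s.getStatus, jobid), followed by s.getResult(jobid, "out") only when the returned status indicates success.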
5. Identifier Mapping
Convert identifiers between different biological databases:
from bioservices import UniProt, KEGG
# UniProt mapping (many database pairs supported)
u = UniProt()
results = u.mapping(
    fr="UniProtKB_AC-ID",  # Source database
    to="KEGG",             # Target database
    query="P43403"         # Identifier(s) to convert
)
# KEGG gene ID → UniProt
kegg_to_uniprot = u.mapping(fr="KEGG", to="UniProtKB_AC-ID", query="hsa:7535")
# For compounds, use UniChem
from bioservices import UniChem
u = UniChem()
chembl_from_kegg = u.get_compound_id_from_kegg("C11222")
Supported mappings (UniProt):
- UniProtKB ↔ KEGG
- UniProtKB ↔ Ensembl
- UniProtKB ↔ PDB
- UniProtKB ↔ RefSeq
- And many more (see references/identifier_mapping.md)
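Mapping output usually needs to be regrouped by source identifier before it is useful downstream. The sketch below assumes the result is a dict with a "results" list of {"from": ..., "to": ...} entries and an optional "failedIds" list; that shape is an assumption about recent bioservices versions and should be checked against the actual return value. The sample dict is hand-written.

```python
from collections import defaultdict


def group_mapping(result):
    """Group mapped targets by source ID; also return IDs that failed to map."""
    grouped = defaultdict(list)
    for row in result.get("results", []):
        grouped[row["from"]].append(row["to"])
    return dict(grouped), list(result.get("failedIds", []))


# Hand-written sample in the assumed shape (not fetched from UniProt)
sample = {
    "results": [{"from": "P43403", "to": "hsa:7535"}],
    "failedIds": ["NOT_AN_ID"],
}

mapped, failed = group_mapping(sample)
print(mapped, failed)
```

Keeping a list per source ID matters because one identifier can map to several targets.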
6. Gene Ontology Queries
Access GO terms and annotations:
from bioservices import QuickGO
g = QuickGO(verbose=False)
# Retrieve GO term information
term_info = g.Term("GO:0003824", frmt="obo")
# Search annotations
annotations = g.Annotation(protein="P43403", format="tsv")
7. Protein-Protein Interactions
Query interaction databases via PSICQUIC:
from bioservices import PSICQUIC
s = PSICQUIC(verbose=False)
# Query specific database (e.g., MINT)
interactions = s.query("mint", "ZAP70 AND species:9606")
# List available interaction databases
databases = s.activeDBs
Available databases: MINT, IntAct, BioGRID, DIP, and 30+ others.
Multi-Service Integration Workflows
BioServices excels at combining multiple services for comprehensive analysis. Common integration patterns:
Complete Protein Analysis Pipeline
Execute a full protein characterization workflow:
export NCBI_EMAIL=<your-email-address>
python scripts/protein_analysis_workflow.py ZAP70_HUMAN
# Or pass email as optional second argument if NCBI_EMAIL is unset
python scripts/protein_analysis_workflow.py ZAP70_HUMAN <your-email-address>
This script demonstrates:
- UniProt search for protein entry
- FASTA sequence retrieval
- BLAST similarity search
- KEGG pathway discovery
- PSICQUIC interaction mapping
Pathway Network Analysis
Analyze all pathways for an organism:
python scripts/pathway_analysis.py hsa output_directory/
Extracts and analyzes:
- All pathway IDs for organism
- Protein-protein interactions per pathway
- Interaction type distributions
- Exports to CSV/SIF formats
Cross-Database Compound Search
Map compound identifiers across databases:
python scripts/compound_cross_reference.py Geldanamycin
Retrieves:
- KEGG compound ID
- ChEBI identifier
- ChEMBL identifier
- Basic compound properties
Batch Identifier Conversion
Convert multiple identifiers at once:
python scripts/batch_id_converter.py input_ids.txt --from UniProtKB_AC-ID --to KEGG
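Large input files are easier on remote services when split into batches. This is a minimal sketch of that idea, not the logic of batch_id_converter.py itself; the helper names read_ids and chunked and the batch size are illustrative assumptions.

```python
def read_ids(text):
    """Return non-empty, stripped lines from the contents of an ID file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def chunked(ids, size):
    """Yield consecutive slices of ids with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Stand-in for the contents of input_ids.txt (placeholder identifiers)
file_text = "ID_A\nID_B\n\nID_C\n"
ids = read_ids(file_text)
batches = list(chunked(ids, 2))
print(batches)
```

Each batch could then be passed to a single mapping call, with the results merged afterwards.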
Best Practices
Output Format Handling
Different services return data in various formats:
- XML: Parse using BeautifulSoup (most SOAP services)
- Tab-separated (TSV): Pandas DataFrames for tabular data
- Dictionary/JSON: Direct Python manipulation
- FASTA: BioPython integration for sequence analysis
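For quick inspection, TSV and FASTA text can also be parsed with the standard library alone. The sketch below uses csv and plain string handling instead of Pandas or BioPython so that it stays self-contained; the sample strings and column names are hand-written placeholders, not real service output.

```python
import csv
import io


def tsv_to_rows(text):
    """Parse tab-separated text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def fasta_to_records(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hand-written samples (placeholders)
sample_tsv = "Entry\tGene Names\nP43403\tZAP70\n"
sample_fasta = ">demo_protein\nMKT\nAYI\n"

rows = tsv_to_rows(sample_tsv)
seqs = fasta_to_records(sample_fasta)
print(rows, seqs)
```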
Rate Limiting and Verbosity
Control API request behavior:
from bioservices import KEGG
k = KEGG(verbose=False) # Suppress HTTP request details
k.TIMEOUT = 30 # Adjust timeout for slow connections
Error Handling
Wrap service calls in try-except blocks:
from bioservices import UniProt
u = UniProt(verbose=False)
try:
    results = u.search("ambiguous_query")
    if results:
        # Process results
        pass
except Exception as e:
    print(f"Search failed: {e}")
Organism Codes
Use standard organism abbreviations:
- hsa: Homo sapiens (human)
- mmu: Mus musculus (mouse)
- dme: Drosophila melanogaster
- sce: Saccharomyces cerevisiae (yeast)
List all organisms: k.list("organism") or k.organismIds
Integration with Other Tools
BioServices works well with:
- BioPython: Sequence analysis on retrieved FASTA data
- Pandas: Tabular data manipulation
- PyMOL: 3D structure visualization (retrieve PDB IDs)
- NetworkX: Network analysis of pathway interactions
- Galaxy: Custom tool wrappers for workflow platforms
Resources
scripts/
Executable Python scripts demonstrating complete workflows:
- protein_analysis_workflow.py: End-to-end protein characterization
- pathway_analysis.py: KEGG pathway discovery and network extraction
- compound_cross_reference.py: Multi-database compound searching
- batch_id_converter.py: Bulk identifier mapping utility
Scripts can be executed directly or adapted for specific use cases.
references/
Detailed documentation loaded as needed:
- services_reference.md: Comprehensive list of all 40+ services with methods
- workflow_patterns.md: Detailed multi-step analysis workflows
- identifier_mapping.md: Complete guide to cross-database ID conversion
Load references when working with specific services or complex integration tasks.
Installation
uv pip install "bioservices==1.16.0"
Dependencies are installed automatically. Upstream CI tests Python 3.9–3.12 (PyPI, docs).
Credentials
Most services need no API key. Exceptions:
| Service | Requirement |
|---|---|
| NCBI BLAST | Contact email via NCBI_EMAIL or email= in NCBIblast.run() |
| Some EBI services | Optional; check service docs if rate-limited |
Set once per shell session:
export NCBI_EMAIL=<your-email-address>
Use a real institutional or lab address — NCBI may contact you about heavy BLAST usage.
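A script can resolve the contact email by checking the environment variable first and falling back to a command-line argument. The sketch below illustrates that precedence; resolve_email is a hypothetical helper, not code from the bundled scripts, and the addresses are example.org placeholders.

```python
import os


def resolve_email(argv, environ=None):
    """Return NCBI_EMAIL from the environment, else the optional second CLI argument.

    argv is expected to look like [script_name, protein_id, optional_email].
    """
    if environ is None:
        environ = os.environ
    email = environ.get("NCBI_EMAIL") or (argv[2] if len(argv) > 2 else None)
    if not email:
        raise SystemExit("Set NCBI_EMAIL or pass an email as the second argument")
    return email


# Offline demo with explicit environments instead of the real os.environ
from_env = resolve_email(["script.py", "ZAP70_HUMAN"], {"NCBI_EMAIL": "env@example.org"})
from_arg = resolve_email(["script.py", "ZAP70_HUMAN", "arg@example.org"], {})
print(from_env, from_arg)
```

Failing early with a clear message avoids submitting a BLAST job that the service would reject for a missing email.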
Additional Information
For detailed API documentation and advanced features, refer to:
- Official documentation: https://bioservices.readthedocs.io/
- Source code: https://github.com/cokelaer/bioservices
- Service-specific references in references/services_reference.md
