bio-proteomics-data-import
About
This skill imports and parses mass spectrometry data formats like mzML/mzXML and MaxQuant outputs for proteomics analysis. It handles initial data loading, contaminant filtering, and missing value assessment using pyOpenMS and pandas. Use it when starting to work with raw or processed proteomics data to prepare datasets for downstream analysis.
Quick Install
Claude Code
Recommended: /plugin add https://github.com/majiayu000/claude-skill-registry
Alternative: git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/bio-proteomics-data-import
Copy and paste either command in Claude Code to install this skill.
Documentation
Mass Spectrometry Data Import
Loading mzML/mzXML Files with pyOpenMS
from pyopenms import MSExperiment, MzMLFile, MzXMLFile

# Load an mzML file into an in-memory experiment (use MzXMLFile for .mzXML files)
exp = MSExperiment()
MzMLFile().load('sample.mzML', exp)

# Iterate over spectra, separating MS1 survey scans from MS2 fragment scans
for spectrum in exp:
    if spectrum.getMSLevel() == 1:
        mz, intensity = spectrum.get_peaks()
    elif spectrum.getMSLevel() == 2:
        precursor = spectrum.getPrecursors()[0]
        precursor_mz = precursor.getMZ()
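As a quick sanity check after loading, the spectra can be summarized into a per-scan table. This is a minimal sketch that reuses the exp object from the snippet above; the scan_summary name is illustrative.

import pandas as pd

# Build a per-spectrum summary table from the loaded experiment
rows = []
for spectrum in exp:
    rows.append({
        'ms_level': spectrum.getMSLevel(),
        'retention_time': spectrum.getRT(),
        'n_peaks': spectrum.size(),
    })
scan_summary = pd.DataFrame(rows)
print(scan_summary.groupby('ms_level').size())  # number of spectra per MS level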
Loading MaxQuant Output
import pandas as pd

protein_groups = pd.read_csv('proteinGroups.txt', sep='\t', low_memory=False)

# Filter contaminants, reverse hits, and site-only identifications
# (the contaminant column name differs between MaxQuant versions)
contam_col = 'Potential contaminant' if 'Potential contaminant' in protein_groups.columns else 'Contaminant'
protein_groups = protein_groups[
    (protein_groups.get(contam_col, '') != '+') &
    (protein_groups.get('Reverse', '') != '+') &
    (protein_groups.get('Only identified by site', '') != '+')
]

# Extract intensity columns (LFQ or iBAQ), falling back to raw per-sample intensities
intensity_cols = [c for c in protein_groups.columns if c.startswith('LFQ intensity') or c.startswith('iBAQ ')]
if not intensity_cols:
    intensity_cols = [c for c in protein_groups.columns if c.startswith('Intensity ') and 'Intensity L' not in c]
intensities = protein_groups[['Protein IDs', 'Gene names'] + intensity_cols]
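Downstream analysis usually expects log-scale intensities with true missing values. The following is a minimal sketch of that preparation, assuming the intensities frame and intensity_cols list built above; the numpy import and the log_intensities name are illustrative.

import numpy as np

# MaxQuant writes 0 for proteins not quantified in a sample; treat these as missing
log_intensities = intensities.copy()
log_intensities[intensity_cols] = np.log2(
    log_intensities[intensity_cols].replace(0, np.nan)
)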
Loading Spectronaut/DIA-NN Output
diann_report = pd.read_csv('report.tsv', sep='\t')

# Pivot to protein-level matrix (rows = protein groups, columns = runs)
protein_matrix = diann_report.pivot_table(
    index='Protein.Group', columns='Run', values='PG.MaxLFQ', aggfunc='first'
)
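Precursor-level rows are typically filtered by FDR before building the matrix. A minimal sketch, assuming the standard DIA-NN main-report columns Q.Value and PG.Q.Value are present (column sets vary by DIA-NN version, and the 1% thresholds are illustrative):

# Keep rows passing 1% run-level and protein-group FDR, then redo the pivot
filtered = diann_report[
    (diann_report['Q.Value'] <= 0.01) & (diann_report['PG.Q.Value'] <= 0.01)
]
protein_matrix = filtered.pivot_table(
    index='Protein.Group', columns='Run', values='PG.MaxLFQ', aggfunc='first'
)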
R: Loading with MSnbase
library(MSnbase)
raw_data <- readMSData('sample.mzML', mode = 'onDisk')
spectra <- spectra(raw_data)
header_info <- fData(raw_data)
Missing Value Assessment
def assess_missing_values(df, intensity_cols):
    """Summarize missingness per protein, per sample, and overall (percent)."""
    missing_per_protein = df[intensity_cols].isna().sum(axis=1)
    missing_per_sample = df[intensity_cols].isna().sum(axis=0)
    total_missing = df[intensity_cols].isna().sum().sum()
    total_values = df[intensity_cols].size
    missing_pct = 100 * total_missing / total_values
    return {
        'per_protein': missing_per_protein,
        'per_sample': missing_per_sample,
        'total_pct': missing_pct,
    }
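A usage sketch against the MaxQuant intensities built earlier. Because MaxQuant encodes missing intensities as 0, zeros must be recoded as NaN before isna() can count them; the intensity_data and missing_report names are illustrative.

# Recode zeros as NaN so they register as missing values
intensity_data = intensities.copy()
intensity_data[intensity_cols] = intensity_data[intensity_cols].replace(0, float('nan'))

missing_report = assess_missing_values(intensity_data, intensity_cols)
print(f"Overall missingness: {missing_report['total_pct']:.1f}%")
print(missing_report['per_sample'].sort_values(ascending=False).head())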
Related Skills
- quantification - Process imported data for quantification
- peptide-identification - Identify peptides from raw spectra
- expression-matrix/counts-ingest - Similar data loading patterns