bio-proteomics-data-import

majiayu000

更新日 2 days ago

12 閲覧

デザインdata

について

このスキルは、プロテオミクス解析のためにmzML/mzXML形式やMaxQuant出力などの質量分析データ形式をインポートおよび解析します。pyOpenMSとpandasを使用して、初期データの読み込み、汚染物質のフィルタリング、欠損値の評価を処理します。下流解析用にデータセットを準備するために、未処理または処理済みのプロテオミクスデータを扱い始める際にご利用ください。

クイックインストール

Claude Code

推奨

プラグインコマンド推奨

/plugin add https://github.com/majiayu000/claude-skill-registry

Git クローン代替

git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/bio-proteomics-data-import

このコマンドをClaude Codeにコピー＆ペーストしてスキルをインストールします

ドキュメント

Mass Spectrometry Data Import

Loading mzML/mzXML Files with pyOpenMS

from pyopenms import MSExperiment, MzMLFile, MzXMLFile

exp = MSExperiment()
MzMLFile().load('sample.mzML', exp)

for spectrum in exp:
    if spectrum.getMSLevel() == 1:
        mz, intensity = spectrum.get_peaks()
    elif spectrum.getMSLevel() == 2:
        precursor = spectrum.getPrecursors()[0]
        precursor_mz = precursor.getMZ()

Loading MaxQuant Output

import pandas as pd

protein_groups = pd.read_csv('proteinGroups.txt', sep='\t', low_memory=False)

# Filter contaminants and reverse hits
contam_col = 'Potential contaminant' if 'Potential contaminant' in protein_groups.columns else 'Contaminant'
protein_groups = protein_groups[
    (protein_groups.get(contam_col, '') != '+') &
    (protein_groups.get('Reverse', '') != '+') &
    (protein_groups.get('Only identified by site', '') != '+')
]

# Extract intensity columns (LFQ or iBAQ)
intensity_cols = [c for c in protein_groups.columns if c.startswith('LFQ intensity') or c.startswith('iBAQ ')]
if not intensity_cols:
    intensity_cols = [c for c in protein_groups.columns if c.startswith('Intensity ') and 'Intensity L' not in c]
intensities = protein_groups[['Protein IDs', 'Gene names'] + intensity_cols]

Loading Spectronaut/DIA-NN Output

diann_report = pd.read_csv('report.tsv', sep='\t')

# Pivot to protein-level matrix
protein_matrix = diann_report.pivot_table(
    index='Protein.Group', columns='Run', values='PG.MaxLFQ', aggfunc='first'
)

R: Loading with MSnbase

library(MSnbase)

raw_data <- readMSData('sample.mzML', mode = 'onDisk')
spectra <- spectra(raw_data)
header_info <- fData(raw_data)

Missing Value Assessment

def assess_missing_values(df, intensity_cols):
    missing_per_protein = df[intensity_cols].isna().sum(axis=1)
    missing_per_sample = df[intensity_cols].isna().sum(axis=0)

    total_missing = df[intensity_cols].isna().sum().sum()
    total_values = df[intensity_cols].size
    missing_pct = 100 * total_missing / total_values

    return {'per_protein': missing_per_protein, 'per_sample': missing_per_sample, 'total_pct': missing_pct}

Related Skills

quantification - Process imported data for quantification
peptide-identification - Identify peptides from raw spectra
expression-matrix/counts-ingest - Similar data loading patterns

GitHub リポジトリ

majiayu000/claude-skill-registry

パス: skills/data-import