About
This skill runs programmatic queries against the 1000 Genomes Project dataset to find individuals or variants that match specific genetic criteria. It can identify carriers, homozygotes, relatedness, and variants, and returns them with key annotations such as allele frequencies and AlphaMissense scores. Use it when your question involves individual-level genetic data from this specific cohort.
Quick install
Claude Code
Install this skill with one of the following commands:
- Recommended: npx skills add K-Dense-AI/claude-scientific-skills -a claude-code
- /plugin add https://github.com/K-Dense-AI/claude-scientific-skills
- git clone https://github.com/K-Dense-AI/claude-scientific-skills.git ~/.claude/skills/onekgpd
Documentation
OneKGPd: Individual-Level Queries over the 1000 Genomes Project
Scope
This skill queries the 1000 Genomes Project dataset — the extended high-coverage cohort
of 3,202 whole-genome-sequenced individuals, on the GRCh38 assembly. All results
are drawn from this cohort, and sample names returned by the skill (for example
HG00096 or NA21130) identify its participants.
Queries resolve against the cohort's per-individual genotype data. This supports two complementary classes of question: selecting variants carried within a region (across the whole cohort or within a specified set of individuals), and selecting the individuals who carry variants matching given criteria. Variant selection can be filtered by allele frequency, predicted consequence, clinical significance, AlphaMissense classification, and the other annotation axes listed below. Relatedness between two named individuals is also available.
The genotype state in which a variant is carried — heterozygous or homozygous — is a criterion that queries may specify; results are returned as variants or as sample names, not as raw genotypes.
When to Use
Use this skill when you need to:
- Find variants carried in a region or set of regions matching some criteria across the whole cohort (select-variants).
- Find variants carried in a region or set of regions matching some criteria in a specific set of individuals (select-variants-in-samples).
- Find which 1000 Genomes individuals carry variants matching some criteria in a region or set of regions (select-samples).
- Count how many individuals carry specific variants (count-samples).
- Restrict any variant query to heterozygous-only or homozygous-only carriage, or query both together (default).
- Identify which individuals are homozygous reference at a single position (select-samples-hom-ref).
- Determine the relatedness between two named 1000 Genomes individuals: both the degree (twin / 1st / 2nd / 3rd / unrelated) and the KING kinship coefficient (kinship).
- Get dataset totals: sample count, sex split, variant count, assembly (dataset-info).
Variant selection can be specified by KGP allele frequency, gnomAD 4.1 exome and gnomAD 4.1 genome allele frequency, AlphaMissense score and AlphaMissense class, ClinVar significance (202502), and VEP annotations (impact, biotype, feature type, variant class, consequences).
Do NOT use this skill for:
- Resolving a gene symbol, rsID, or transcript to coordinates, or fetching reference sequence. Resolve coordinates first (see Coordinate Provenance below), then query this skill with the resolved GRCh38 region.
- Any cohort other than the 1000 Genomes Project — this skill serves only that dataset.
Prerequisites
- uv: This skill's script is run with uv run, which reads the script's inline dependency metadata and provisions an ephemeral environment. Ensure uv is installed and on PATH (https://docs.astral.sh/uv/).
- Data use terms: The 1000 Genomes Project data is open; users should be aware of the 1000 Genomes Project / IGSR data-use terms (https://www.internationalgenome.org/data).
- Access constraints: No credentials are required. There is no API key, no .env file, and no rate-limit token to configure.
Core Rules
- Use the Wrappers: ALWAYS execute the provided helper scripts rather than constructing your own client calls or network requests. Use scripts/onekgpd_api.py for variant/sample/kinship queries (it handles the connection, streaming, pagination, and JSON serialization), and scripts/onekgpd_meta.py for sample/population metadata (offline, see Sample & population metadata).
- Coordinates MUST be resolved against an authoritative source first (see Coordinate Provenance). This is mandatory, not advisory.
- Count before you select: every variant and sample selection has a paired counting command. Call the count command FIRST to size the result set, then select only if the count is manageable.
- Zygosity defaults to both: selection and counting commands include both heterozygous and homozygous carriage by default. Narrow with --het-only or --hom-only when the question is specifically about one state. (You do not need to pass anything to get both.)
- Output: scripts write full JSON to a file (--output, default under /tmp/) and print a concise summary to stdout. Do not read large JSON files into context; use jq or a small disposable uv run python snippet to extract fields.
Coordinate Provenance (MANDATORY FIRST STEP)
Before any region-based query, resolve the gene or feature to GRCh38 coordinates against an authoritative source (for example Ensembl), and query with those resolved coordinates. The assembly must be explicit, and a gene-range must be resolved to precise positions before use. This is structural, not advisory: there is no source-side guardrail that would catch a misplaced region, so an unverified coordinate produces results for an unintended location with no error.
# Resolve gene symbol -> GRCh38 region with an authoritative source FIRST,
# then pass the verified coordinates to the OneKGPd query below.
[!CAUTION] The dataset is GRCh38. A GRCh37 coordinate, or any region that does not correctly correspond to the intended feature on GRCh38, will return results for an unintended location without raising an error. Verify the assembly and the resolved coordinates before querying.
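The assembly check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill: it assumes the coordinates were looked up in Ensembl and arrive as a record with the fields `assembly_name`, `seq_region_name`, `start`, and `end`, and it assumes the skill expects `chr`-prefixed chromosome names as in its examples. The function name `region_flags_from_ensembl` is hypothetical.

```python
def region_flags_from_ensembl(record: dict) -> list[str]:
    """Turn an Ensembl-style lookup record into --chrom/--start/--end flags.

    Refuses anything that is not explicitly GRCh38, because a misplaced
    region would otherwise be queried without any error.
    """
    assembly = record.get("assembly_name")
    if assembly != "GRCh38":
        raise ValueError(f"expected GRCh38 coordinates, got {assembly!r}")

    start, end = int(record["start"]), int(record["end"])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end}")

    chrom = str(record["seq_region_name"])
    if not chrom.startswith("chr"):
        # Assumption: the skill uses 'chr'-prefixed names, as in its examples.
        chrom = "chr" + chrom

    return ["--chrom", chrom, "--start", str(start), "--end", str(end)]
```

The returned list can be appended to a count or select command line. Records without an explicit assembly are rejected rather than guessed at.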
Command Selection Guide
Match the question to the command. Counting commands are cheap and should precede their selection counterpart.
- Which individuals carry matching variants in a region → count-samples then select-samples
- Which variants are carried in a region, cohort-wide → count-variants then select-variants
- Which variants are carried in a region, within a named set of individuals → count-variants-in-samples then select-variants-in-samples
- Who is homozygous-reference at a single position → count-samples-hom-ref then select-samples-hom-ref
- Relatedness (degree + coefficient) between two named individuals → kinship
- Dataset totals (sample count, sex split, variant total, assembly) → dataset-info
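The count-then-select pairing in this guide can be captured as a small lookup. The sketch below only builds argument lists for the two invocations; it does not run them. The command names come from the guide, while the helper name `plan` and the idea of sharing one argument list between both calls are illustrative assumptions.

```python
# Each selection command and the counting command that should precede it.
COUNT_FOR = {
    "select-samples": "count-samples",
    "select-variants": "count-variants",
    "select-variants-in-samples": "count-variants-in-samples",
    "select-samples-hom-ref": "count-samples-hom-ref",
}


def plan(select_command: str, shared_args: list[str]) -> list[list[str]]:
    """Return the count invocation followed by the matching select invocation.

    Raises KeyError for commands that have no counting counterpart
    (kinship and dataset-info).
    """
    count_command = COUNT_FOR[select_command]
    base = ["uv", "run", "scripts/onekgpd_api.py"]
    return [
        base + [count_command] + shared_args,
        base + [select_command] + shared_args,
    ]
```

Passing the same region and filter arguments to both invocations keeps the count and the selection describing the same result set.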
Annotation filters (shared across variant and sample selection/counting)
All variant- and sample-selection commands (count-variants,
select-variants, their -in-samples forms, count-samples, select-samples)
accept the same annotation filters. Different filter fields are combined with
AND; multiple values within one field are combined with OR. Enum values
are case-insensitive (e.g. missense_variant or MISSENSE_VARIANT).
These are selection criteria applied on the server. The fields returned on a selected variant are listed under Variant-returning commands; a criterion used for filtering is not necessarily echoed back on the returned variant.
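The combination rule above (AND across fields, OR within a field, case-insensitive values) can be illustrated with a local predicate. This is only a model of the stated semantics, not the server's implementation, and it simplifies each annotation to a single string value; the function name `matches` is hypothetical.

```python
def matches(annotations: dict[str, str], filters: dict[str, list[str]]) -> bool:
    """Return True when the annotations satisfy every filter field.

    Fields are combined with AND; the accepted values inside one field are
    combined with OR; comparison ignores case.
    """
    return all(
        annotations.get(field, "").upper() in {value.upper() for value in accepted}
        for field, accepted in filters.items()
    )
```

For example, a variant annotated `missense_variant` with impact `MODERATE` satisfies `{"consequence": ["MISSENSE_VARIANT", "STOP_GAINED"], "impact": ["moderate"]}`, but not a filter whose only accepted consequence is `STOP_GAINED`.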
- --af-lt / --af-gt: 1000 Genomes dataset allele frequency bounds
- --gnomad-exomes-af-lt / --gnomad-exomes-af-gt: gnomAD v4.1 exome AF bounds
- --gnomad-genomes-af-lt / --gnomad-genomes-af-gt: gnomAD v4.1 genome AF bounds
- --clin-significance: ClinVar significance terms, CSV (e.g. PATHOGENIC,LIKELY_PATHOGENIC)
- --consequence: Sequence Ontology consequence terms, CSV (e.g. MISSENSE_VARIANT,STOP_GAINED)
- --impact: VEP impact, CSV (HIGH, MODERATE, LOW, MODIFIER)
- --variant-type, --feature-type, --bio-type: SO variant class / VEP feature / VEP biotype, CSV
- --alpha-missense-class: AM_LIKELY_BENIGN, AM_LIKELY_PATHOGENIC, AM_AMBIGUOUS (CSV)
- --alpha-missense-score-lt / --alpha-missense-score-gt: AlphaMissense score bounds
- --biallelic-only / --multiallelic-only
- --exclude-males / --exclude-females
- --min-len-bp / --max-len-bp: alternate-allele length bounds (bp)
[!NOTE] --alpha-missense-class and --alpha-missense-score-* are mutually exclusive (the engine ignores the class when a score bound is set). --biallelic-only and --multiallelic-only are mutually exclusive. --exclude-males and --exclude-females are mutually exclusive. Setting a *-gt bound greater than or equal to its matching *-lt bound defines an empty range and will return nothing.
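Because these conflicts return nothing or silently drop a filter rather than failing, it can help to check a set of options before running a command. The sketch below is a hypothetical pre-flight check, not something the skill provides. It assumes options are held in a dict keyed by the flag name with underscores (for example `af_gt` for --af-gt).

```python
def flag(name: str) -> str:
    """Turn an option key such as 'af_gt' into its CLI spelling '--af-gt'."""
    return "--" + name.replace("_", "-")


EXCLUSIVE_PAIRS = [
    ("biallelic_only", "multiallelic_only"),
    ("exclude_males", "exclude_females"),
]
# Filters that come as a matching *-gt / *-lt pair.
BOUNDED = ["af", "gnomad_exomes_af", "gnomad_genomes_af", "alpha_missense_score"]


def check_filters(opts: dict) -> list[str]:
    """Return human-readable problems; an empty list means no conflict found."""
    problems = []
    for a, b in EXCLUSIVE_PAIRS:
        if opts.get(a) and opts.get(b):
            problems.append(f"{flag(a)} and {flag(b)} are mutually exclusive")

    score_bound_set = any(
        opts.get(f"alpha_missense_score_{side}") is not None for side in ("lt", "gt")
    )
    if opts.get("alpha_missense_class") and score_bound_set:
        problems.append("--alpha-missense-class is ignored when a score bound is set")

    for name in BOUNDED:
        gt, lt = opts.get(name + "_gt"), opts.get(name + "_lt")
        if gt is not None and lt is not None and gt >= lt:
            problems.append(
                f"{flag(name + '_gt')} >= {flag(name + '_lt')} defines an empty range"
            )
    return problems
```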
[!NOTE] Allele-frequency fields use 0.0 to mean "not present in that source." So --gnomad-exomes-af-gt 0 selects variants that are in gnomAD exomes; a returned gnomad_exomes_af of 0.0 means the variant is absent from gnomAD exomes. The same convention applies to gnomAD genomes AF.
[!NOTE] am_score of 0.0 means not scored or not annotated by AlphaMissense; it does not mean benign. A real AlphaMissense score is always greater than 0.
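When post-processing returned variants, the 0.0 sentinels in the notes above are easy to misread as real values. A minimal sketch of one way to make them explicit is to map them to `None`. The field names are the ones the skill returns; the helper names are hypothetical.

```python
from typing import Optional


def present_or_none(value: float) -> Optional[float]:
    """Map the 0.0 sentinel ("absent" or "not scored") to None."""
    return None if value == 0.0 else value


def decode_sentinels(variant: dict) -> dict:
    """Return the gnomAD AF and AlphaMissense fields with sentinels decoded."""
    return {
        "gnomad_exomes_af": present_or_none(variant["gnomad_exomes_af"]),
        "gnomad_genomes_af": present_or_none(variant["gnomad_genomes_af"]),
        "am_score": present_or_none(variant["am_score"]),
    }
```

After decoding, `None` in `am_score` reads as "no AlphaMissense annotation" rather than as a very benign score.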
Quick Start
# Step 1. Resolve coordinates against an authoritative source — see Coordinate Provenance.
# example: BRCA1: chr17:43044292-43170245
# Step 2. Size the result set: how many individuals carry predicted likely-pathogenic
# missense variants in this region?
uv run scripts/onekgpd_api.py count-samples \
--chrom chr17 --start 43044292 --end 43170245 \
--consequence MISSENSE_VARIANT \
--alpha-missense-class AM_LIKELY_PATHOGENIC \
--output /tmp/count.json
# Step 3. If the count is manageable, list those individuals.
uv run scripts/onekgpd_api.py select-samples \
--chrom chr17 --start 43044292 --end 43170245 \
--consequence MISSENSE_VARIANT \
--alpha-missense-class AM_LIKELY_PATHOGENIC \
--output /tmp/samples.json
# Step 4: For that set of individuals, see the actual variants they carry.
uv run scripts/onekgpd_api.py select-variants-in-samples \
--chrom chr17 --start 43044292 --end 43170245 \
--samples HG03169,NA20506 \
--consequence MISSENSE_VARIANT --alpha-missense-class AM_LIKELY_PATHOGENIC \
--output /tmp/variants.json
Commands
Each command writes full JSON to a file (--output PATH, default a temp file)
and prints a concise stdout summary. All region/sample commands share: the
region input (--chrom/--start/--end with optional --ref/--alt, or one
or more repeated --region CHR:START-END), the zygosity flags
(--het-only/--hom-only, default both), and the annotation filters above.
The full per-flag tables live in
references/onekgpd_commands.md.
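For the repeated --region CHR:START-END form, a small parser can catch malformed regions before they reach the script. This is a sketch under assumptions: the exact syntax the script accepts (for example whether start may equal end) is documented in the reference file rather than here, so the check below is deliberately conservative. The names `parse_region` and `region_args` are hypothetical.

```python
import re

# CHR:START-END, with unsigned integer coordinates and no thousands separators.
REGION_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(text: str) -> tuple[str, int, int]:
    """Split 'chr17:43044292-43170245' into (chrom, start, end)."""
    match = REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a CHR:START-END region: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"start {start} is greater than end {end} in {text!r}")
    return match["chrom"], start, end


def region_args(regions: list[str]) -> list[str]:
    """Validate each region and emit one --region flag per entry."""
    args: list[str] = []
    for region in regions:
        chrom, start, end = parse_region(region)
        args += ["--region", f"{chrom}:{start}-{end}"]
    return args
```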
Variant-returning commands
select-* return matching variants; count-* return an integer count.
- count-variants: count variants in a region, cohort-wide.
- select-variants: select variants in a region, cohort-wide. Use --limit N (hard cap, default 1000) or --page-size N (retrieve the full set in pages); the two are mutually exclusive. The summary flags the result as truncated when the cap is reached.
- count-variants-in-samples: as count-variants, restricted to --samples NAME1,NAME2,... (required).
- select-variants-in-samples: as select-variants, restricted to --samples NAME1,NAME2,... (required).
Each returned variant carries these 19 keys: chr, start, end, ref,
alt, af, ac, an, homc, hetc, misc, homfc, hetfc, misfc,
gnomad_exomes_af, gnomad_genomes_af, am_score, amino_acids, biallelic.
ClinVar significance and VEP consequence are filter criteria only and are not
returned. Full schema:
references/onekgpd_commands.md.
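As an example of the "small disposable snippet" approach to large output files, the sketch below loads a result file and keeps only a handful of the keys listed above for the highest-scoring variants. The top-level layout of the JSON file is an assumption: the code accepts either a bare list of variant objects or an object holding them under a hypothetical `variants` key. Check references/onekgpd_commands.md for the real schema before relying on it.

```python
import json
from pathlib import Path

KEEP = ("chr", "start", "ref", "alt", "af", "am_score")


def load_variants(path: str, key: str = "variants") -> list[dict]:
    """Load variant objects from an output file.

    Assumption: the file is either a list of variants or an object that
    stores them under `key` (a hypothetical name).
    """
    data = json.loads(Path(path).read_text())
    return data if isinstance(data, list) else data[key]


def top_by_am_score(variants: list[dict], n: int = 5) -> list[dict]:
    """Return the n variants with the highest AlphaMissense score, trimmed to KEEP.

    Variants with am_score == 0.0 are skipped, since 0.0 means "not scored".
    """
    scored = [v for v in variants if v["am_score"] > 0]
    scored.sort(key=lambda v: v["am_score"], reverse=True)
    return [{k: v[k] for k in KEEP} for v in scored[:n]]
```

Printing the trimmed list keeps the amount of text pulled into context small even when the file holds thousands of variants.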
Sample-returning commands
- count-samples: count individuals carrying a matching variant in a region.
- select-samples: list the names of individuals carrying a matching variant. Supports --skip N and --limit N. Returns names only; to see which variants qualified an individual, feed the names into select-variants-in-samples.
Homozygous-reference commands
Single position via --chrom + --position (not a region).
- count-samples-hom-ref: count individuals with a 0/0 call at the position. The count is a sentinel: -1 = no variant exists at that position at all; 0 = a variant exists but no individual is homozygous reference; >0 = the number of homozygous-reference individuals. The summary states which case.
- select-samples-hom-ref: list the individuals with a 0/0 call at the position.
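The three-way meaning of the homozygous-reference count is easy to collapse by accident, for instance by treating -1 as "none". A minimal sketch of decoding it explicitly follows; the function name is hypothetical, and the stdout summary already states the case.

```python
def describe_hom_ref_count(count: int) -> str:
    """Decode the count-samples-hom-ref sentinel into a plain statement."""
    if count == -1:
        return "no variant exists at this position"
    if count == 0:
        return "a variant exists, but no individual is homozygous reference"
    if count > 0:
        return f"{count} individuals are homozygous reference"
    raise ValueError(f"unexpected count: {count}")
```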
Relatedness command
- kinship --sample1 NAME --sample2 NAME: returns the relatedness between two named individuals, as the degree (TWINS_MONOZYGOTIC / FIRST_DEGREE / SECOND_DEGREE / THIRD_DEGREE / UNRELATED) and the KING kinship coefficient (phi_bwf).
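For orientation, the sketch below shows how a KING kinship coefficient is conventionally binned into a degree, using the inference ranges published in the KING documentation (above 0.354 for duplicates or monozygotic twins, then 0.177, 0.0884, and 0.0442 as the lower bounds for first, second, and third degree). The kinship command already returns the degree, and the thresholds the service applies are not stated here, so treat this as background rather than as the skill's logic.

```python
def degree_from_kinship(phi: float) -> str:
    """Bin a KING kinship coefficient using KING's published inference ranges.

    Assumption: these are the conventional KING cut-offs, which may differ
    from whatever the service uses internally.
    """
    if phi > 0.354:
        return "TWINS_MONOZYGOTIC"
    if phi >= 0.177:
        return "FIRST_DEGREE"
    if phi >= 0.0884:
        return "SECOND_DEGREE"
    if phi >= 0.0442:
        return "THIRD_DEGREE"
    return "UNRELATED"
```

If a degree returned by the command disagrees with this binning for a given phi_bwf, trust the command's output.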
Dataset metadata command
- dataset-info: dataset totals, namely samples_total (3,202), the female/male split, variants_total, assembly (GRCh38), and the cohort breakdown. No region required; doubles as a connectivity check.
Sample & population metadata (offline)
Population, sex, pedigree, and superpopulation questions are answered by a second
script, scripts/onekgpd_meta.py, from a data file bundled in the skill — no
network, no credentials, no coordinates. The sample IDs are the same names the
variant commands use, so the two layers compose (e.g. pick a cohort by population,
then query its variants). Run uv run scripts/onekgpd_meta.py <command>.
The cohort has 5 superpopulations (AFR, AMR, EAS, EUR, SAS) and 26
populations. Population/superpopulation values match case-insensitively by
short code or full name; sample IDs are case-sensitive.
- sample-metadata --samples NA19240,HG00096: family, gender, parents, children, population, superpopulation, and phase3 status for the given samples.
- list-populations: all 26 populations with superpopulation and sample count (use to discover valid values).
- list-superpopulations: the 5 superpopulations with sample count and constituent populations.
- population-stats --populations YRI [--populations CHS …]: per-population sex split, phase3 count, and trio membership. Repeat --populations for multiple values (full names contain commas, so they are not comma-separated).
- superpopulation-summary --superpopulations EAS [--superpopulations EUR …]: per-superpopulation totals with a per-population breakdown.
- select-samples-by-population --population YRI and/or --superpopulation AFR, with optional --skip/--limit (default 0 / 50, max 3202): the sample IDs in a population and/or superpopulation; giving both intersects them. Feed the names into select-variants-in-samples to see their variants.
See references/onekgpd_commands.md for full argument tables and JSON output schemas.
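Since select-samples-by-population returns 50 names by default, listing a larger population means paging with --skip and --limit. The sketch below generates the flag pairs for each page from a known total (for example the sample count reported by list-populations). The helper name `pages` is hypothetical, and the bounds check reflects the stated maximum of 3202.

```python
from typing import Iterator


def pages(total: int, limit: int = 50) -> Iterator[list[str]]:
    """Yield --skip/--limit flags that together cover `total` samples."""
    if not 1 <= limit <= 3202:
        raise ValueError("limit must be between 1 and 3202")
    for skip in range(0, total, limit):
        yield ["--skip", str(skip), "--limit", str(limit)]
```

Each yielded list can be appended to a select-samples-by-population command line; raising `limit` toward the maximum reduces the number of calls.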
Typical Workflows
Which individuals, then which variants they carry
# Step 1: resolve gene -> verified GRCh38 region (authoritative source).
# Step 2: count individuals carrying a qualifying variant in the region.
uv run scripts/onekgpd_api.py count-samples \
--chrom <chr> --start <start> --end <end> \
--consequence MISSENSE_VARIANT --alpha-missense-class AM_LIKELY_PATHOGENIC \
--output /tmp/n.json
# Step 3: list those individuals.
uv run scripts/onekgpd_api.py select-samples \
--chrom <chr> --start <start> --end <end> \
--consequence MISSENSE_VARIANT --alpha-missense-class AM_LIKELY_PATHOGENIC \
--output /tmp/who.json
# Step 4: for that set of individuals, see the actual variants they carry.
uv run scripts/onekgpd_api.py select-variants-in-samples \
--chrom <chr> --start <start> --end <end> \
--samples <name1,name2,...> \
--consequence MISSENSE_VARIANT --alpha-missense-class AM_LIKELY_PATHOGENIC \
--output /tmp/variants.json
Homozygous-reference carriers at a position of interest
# After identifying a position of interest (verified coordinate):
uv run scripts/onekgpd_api.py count-samples-hom-ref \
--chrom <chr> --position <pos> --output /tmp/homref_n.json
uv run scripts/onekgpd_api.py select-samples-hom-ref \
--chrom <chr> --position <pos> --output /tmp/homref.json
Common Mistakes
- Mistake: Querying with an unverified coordinate. Fix: Always resolve gene/feature → GRCh38 against an authoritative source first. A misplaced region returns results for an unintended location without error.
- Mistake: Calling a selection command before its counting command. Fix: Count first; selection result sets can be large.
- Mistake: Assuming a GRCh37 coordinate will work. Fix: The dataset is GRCh38 only.
References
- references/onekgpd_commands.md — full per-command argument tables and the returned-variant output schema.
- references/annotation_vocabularies.md — the controlled-vocabulary terms accepted by the CSV filter flags (consequence, impact, biotype, feature type, ClinVar significance, AlphaMissense class, variant class).
- 1000 Genomes Project / IGSR: https://www.internationalgenome.org/
- 1000 Genomes Project dataset online: https://dnaerys.org/online/
