bio-consensus-sequences
について
このスキルは、bcftools consensusを使用してVCFバリアントを参照ゲノムに適用し、コンセンサスFASTA配列を生成します。サンプル固有の参照配列の作成や、バリアントデータからのハプロタイプ再構築に使用されます。主な機能には、サンプルの指定、ファイルへの出力、解析のための個別ハプロタイプの選択が含まれます。
クイックインストール
Claude Code
推奨/plugin add https://github.com/majiayu000/claude-skill-registrygit clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/bio-consensus-sequencesこのコマンドをClaude Codeにコピー&ペーストしてスキルをインストールします
ドキュメント
Consensus Sequences
Apply variants to reference FASTA using bcftools consensus.
Basic Usage
Generate Consensus
bcftools consensus -f reference.fa input.vcf.gz > consensus.fa
Specify Sample
bcftools consensus -f reference.fa -s sample1 input.vcf.gz > sample1.fa
Output to File
bcftools consensus -f reference.fa -o consensus.fa input.vcf.gz
Haplotype Selection
First Haplotype Only
bcftools consensus -f reference.fa -H 1 input.vcf.gz > haplotype1.fa
Second Haplotype Only
bcftools consensus -f reference.fa -H 2 input.vcf.gz > haplotype2.fa
Haplotype Options
| Option | Description |
|---|---|
-H 1 | First haplotype |
-H 2 | Second haplotype |
-H A | Apply all ALT alleles |
-H R | Apply REF alleles where heterozygous |
-I | Apply IUPAC ambiguity codes (separate flag) |
IUPAC Codes for Heterozygous Sites
bcftools consensus -f reference.fa -I input.vcf.gz > consensus_iupac.fa
Heterozygous sites encoded with IUPAC ambiguity codes:
- A/G → R
- C/T → Y
- A/C → M
- G/T → K
- A/T → W
- C/G → S
Missing Data Handling
Mark Missing as N
bcftools consensus -f reference.fa -M N input.vcf.gz > consensus.fa
Mark Low Coverage as N
Using a mask BED file:
# Create mask from depth
samtools depth input.bam | awk '$3<10 {print $1"\t"$2-1"\t"$2}' > low_coverage.bed
# Apply mask
bcftools consensus -f reference.fa -m low_coverage.bed input.vcf.gz > consensus.fa
Mask Options
| Option | Description |
|---|---|
-m FILE | Mask regions in BED file with N |
-M CHAR | Character for masked regions (default N) |
Region Selection
Specific Region
bcftools consensus -f reference.fa -r chr1:1000-2000 input.vcf.gz > region.fa
Multiple Regions
Use with BED file to extract multiple regions.
Chain Files
Generate Chain File
bcftools consensus -f reference.fa -c chain.txt input.vcf.gz > consensus.fa
Chain files map coordinates between reference and consensus:
- Useful for liftover of annotations
- Required when indels change sequence length
Chain File Format
chain score ref_name ref_size ref_strand ref_start ref_end query_name query_size query_strand query_start query_end id
Sample-Specific Consensus
For Each Sample
for sample in $(bcftools query -l input.vcf.gz); do
bcftools consensus -f reference.fa -s "$sample" input.vcf.gz > "${sample}.fa"
done
Both Haplotypes
sample="sample1"
bcftools consensus -f reference.fa -s "$sample" -H 1 input.vcf.gz > "${sample}_hap1.fa"
bcftools consensus -f reference.fa -s "$sample" -H 2 input.vcf.gz > "${sample}_hap2.fa"
Filtering Before Consensus
PASS Variants Only
bcftools view -f PASS input.vcf.gz | \
bcftools consensus -f reference.fa > consensus.fa
High-Quality Variants Only
bcftools filter -i 'QUAL>=30 && INFO/DP>=10' input.vcf.gz | \
bcftools consensus -f reference.fa > consensus.fa
SNPs Only
bcftools view -v snps input.vcf.gz | \
bcftools consensus -f reference.fa > consensus_snps.fa
Sequence Naming
Default Naming
Output uses reference sequence names.
Custom Prefix
bcftools consensus -f reference.fa -p "sample1_" input.vcf.gz > consensus.fa
Sequences named: sample1_chr1, sample1_chr2, etc.
Common Workflows
Phylogenetic Analysis Preparation
# For each sample, generate consensus
mkdir -p consensus
for sample in $(bcftools query -l cohort.vcf.gz); do
bcftools view -s "$sample" cohort.vcf.gz | \
bcftools view -c 1 | \
bcftools consensus -f reference.fa > "consensus/${sample}.fa"
done
# Combine for alignment
cat consensus/*.fa > all_samples.fa
Viral Genome Assembly
# Apply high-quality variants only
bcftools filter -i 'QUAL>=30 && INFO/DP>=20' variants.vcf.gz | \
bcftools view -f PASS | \
bcftools consensus -f reference.fa -M N > consensus.fa
Gene-Specific Consensus
# Extract gene region
bcftools consensus -f reference.fa -r chr1:1000000-1010000 \
-s sample1 variants.vcf.gz > gene.fa
Masked Low-Coverage Regions
# Create mask from coverage
samtools depth -a input.bam | \
awk '$3<5 {print $1"\t"$2-1"\t"$2}' | \
bedtools merge > low_coverage.bed
# Generate consensus with mask
bcftools consensus -f reference.fa -m low_coverage.bed \
variants.vcf.gz > consensus.fa
Verify Consensus
Check Differences
# Align consensus to reference
minimap2 -a reference.fa consensus.fa | samtools view -bS > alignment.bam
# Or simple comparison
diff <(grep -v "^>" reference.fa) <(grep -v "^>" consensus.fa) | head
Count Changes
# Number of differences
bcftools view -H input.vcf.gz | wc -l
Handling Overlapping Variants
bcftools consensus handles overlapping variants automatically:
- Applies variants in order
- Warns about conflicts
Check for warnings:
bcftools consensus -f reference.fa input.vcf.gz 2>&1 | grep -i warn
cyvcf2 Consensus (Simple Cases)
Manual Consensus Generation
from cyvcf2 import VCF
from Bio import SeqIO
# Load reference
ref_dict = {rec.id: str(rec.seq) for rec in SeqIO.parse('reference.fa', 'fasta')}
# Apply variants (SNPs only, simplified)
vcf = VCF('input.vcf.gz')
changes = {}
for variant in vcf:
if variant.is_snp and len(variant.ALT) == 1:
chrom = variant.CHROM
pos = variant.POS - 1 # 0-based
if chrom not in changes:
changes[chrom] = {}
changes[chrom][pos] = variant.ALT[0]
# Apply changes
for chrom, positions in changes.items():
seq = list(ref_dict[chrom])
for pos, alt in positions.items():
seq[pos] = alt
ref_dict[chrom] = ''.join(seq)
# Write output
with open('consensus.fa', 'w') as f:
for chrom, seq in ref_dict.items():
f.write(f'>{chrom}\n{seq}\n')
Note: Use bcftools consensus for production - handles indels and edge cases properly.
Quick Reference
| Task | Command |
|---|---|
| Basic consensus | bcftools consensus -f ref.fa in.vcf.gz |
| Specific sample | bcftools consensus -f ref.fa -s sample in.vcf.gz |
| Haplotype 1 | bcftools consensus -f ref.fa -H 1 in.vcf.gz |
| IUPAC codes | bcftools consensus -f ref.fa -I in.vcf.gz |
| With mask | bcftools consensus -f ref.fa -m mask.bed in.vcf.gz |
| Generate chain | bcftools consensus -f ref.fa -c chain.txt in.vcf.gz |
| Specific region | bcftools consensus -f ref.fa -r chr1:1-1000 in.vcf.gz |
Common Errors
| Error | Cause | Solution |
|---|---|---|
not indexed | VCF not indexed | Run bcftools index |
sequence not found | Chromosome mismatch | Check chromosome names |
overlapping records | Variants overlap | Usually OK, check warnings |
REF does not match | Wrong reference | Use same reference as caller |
Related Skills
- variant-calling - Generate VCF for consensus
- vcf-filtering - Filter variants before consensus
- variant-normalization - Normalize indels first
- alignment-files/reference-operations - Reference manipulation
GitHub リポジトリ
関連スキル
algorithmic-art
メタThis Claude Skill creates original algorithmic art using p5.js with seeded randomness and interactive parameters. It generates .md files for algorithmic philosophies, plus .html and .js files for interactive generative art implementations. Use it when developers need to create flow fields, particle systems, or other computational art while avoiding copyright issues.
subagent-driven-development
開発This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.
executing-plans
デザインUse the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.
cost-optimization
その他This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.
