
training-llms-megatron

davila7
Updated 15 days ago
294 views
View on GitHub
Design, Megatron-Core, Large-Scale Training, NVIDIA, Tensor Parallelism, Pipeline Parallelism, Model Parallelism, H100, Distributed Training, Production

About

This skill trains massive LLMs (2B-462B parameters) using NVIDIA's Megatron-Core framework for maximum GPU efficiency. Use it when you are training models with more than 1B parameters and need advanced parallelization methods such as tensor, pipeline, or expert parallelism. It is a production-ready framework already proven on models such as Nemotron and LLaMA.
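
The tensor, pipeline, and data parallelism degrees mentioned above must multiply out to the total GPU count. A minimal sketch of that arithmetic, using illustrative names (`parallel_layout`, `tp`, `pp`) rather than Megatron-Core's actual API:

```python
def parallel_layout(world_size: int, tp: int, pp: int) -> dict:
    """Derive the data-parallel degree from the total GPU count and the
    tensor-/pipeline-parallel degrees. Megatron-style frameworks require
    world_size to be divisible by tp * pp; the remainder of the GPUs is
    used for data-parallel replicas."""
    if world_size % (tp * pp) != 0:
        raise ValueError("world_size must be divisible by tp * pp")
    dp = world_size // (tp * pp)
    return {"tensor": tp, "pipeline": pp, "data": dp}

# Example: 64 H100s with 8-way tensor and 4-way pipeline parallelism
# leaves 2-way data parallelism (64 / (8 * 4) = 2).
layout = parallel_layout(world_size=64, tp=8, pp=4)
```

In real Megatron-Core runs these degrees are passed as launcher flags, and choosing them is a trade-off: tensor parallelism shards individual layers across fast intra-node links, while pipeline parallelism splits layer groups across nodes.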

Quick Install

Claude Code

Recommended
Primary
npx skills add davila7/claude-code-templates -a claude-code
Plugin Command (Alternative)
/plugin add https://github.com/davila7/claude-code-templates
Git Clone (Alternative)
git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/training-llms-megatron

Copy this command and paste it into Claude Code to install this skill

GitHub Repository

davila7/claude-code-templates
Pfad: cli-tool/components/skills/ai-research/distributed-training-megatron-core
anthropic, anthropic-claude, claude, claude-code

Related Skills

openrlhf-training

Design

OpenRLHF is a high-performance RLHF training framework for fine-tuning large language models (7B-70B+ parameters) using methods like PPO, DPO, and GRPO. It leverages Ray for distributed architecture and vLLM for accelerated inference, achieving speeds 2x faster than alternatives like DeepSpeedChat. Use this skill when you need efficient, distributed RLHF training with optimized GPU resource sharing and ZeRO-3 support.
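
Of the methods named above, DPO has the simplest objective. A hedged sketch of the DPO loss for a single preference pair, not OpenRLHF's actual implementation; inputs are the summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))).
    The loss shrinks as the policy prefers the chosen response more
    strongly than the reference model does."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy and reference agree exactly, the margin is 0 and the
# loss is log(2).
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

In practice the log-probabilities come from batched forward passes over both models, which is where the framework's distributed scheduling matters.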

View Skill

huggingface-tokenizers

Documents

This skill provides high-performance tokenization using HuggingFace's Rust-based library, processing 1GB of text in under 20 seconds. It supports BPE, WordPiece, and Unigram algorithms while enabling custom tokenizer training and alignment tracking. Use it when you need production-fast tokenization or to build custom tokenizers integrated with the transformers ecosystem.
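
To make the BPE algorithm named above concrete, here is a toy sketch of a single BPE training step in pure Python; the real HuggingFace library performs these merges in Rust over byte-level vocabularies, and the function names here are illustrative:

```python
from collections import Counter

def most_frequent_pair(words: dict) -> tuple:
    """Count adjacent symbol pairs across the word-frequency table and
    return the most frequent one -- the pair BPE would merge next."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words: dict, pair: tuple) -> dict:
    """Replace every occurrence of `pair` with its fused symbol."""
    merged, fused = {}, "".join(pair)
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(fused)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Tiny corpus: ("l", "o") occurs 7 times, so it is merged first.
corpus = {("l", "o", "w"): 5, ("l", "o", "t"): 2}
best = most_frequent_pair(corpus)
corpus = merge_pair(corpus, best)
```

A full tokenizer repeats this loop until a target vocabulary size is reached, which is the work the Rust backend parallelizes.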

View Skill

qdrant-vector-search

Meta

The qdrant-vector-search skill provides a high-performance vector similarity search engine for building production RAG systems. It enables fast nearest neighbor search, hybrid search with filtering, and scalable vector storage powered by Rust. Use it when you need low-latency semantic search with horizontal scaling capabilities and full data control.
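
The nearest-neighbor search described above can be illustrated with a brute-force cosine scan; this is the naive baseline that engines like Qdrant replace with approximate indexes, not Qdrant's client API:

```python
import math

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query: list, points: dict, k: int = 1) -> list:
    """Return the ids of the k stored vectors most similar to `query`
    by scoring every point -- O(n) per query."""
    scored = sorted(points.items(),
                    key=lambda kv: cosine(query, kv[1]),
                    reverse=True)
    return [pid for pid, _ in scored[:k]]

points = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
best = top_k([1.0, 0.1], points, k=2)
```

A dedicated engine avoids the linear scan with graph-based indexing and adds the filtering and persistence the blurb mentions.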

View Skill

crewai-multi-agent

Meta

CrewAI is a lightweight multi-agent orchestration framework for building teams of specialized AI agents that collaborate autonomously on complex tasks. It enables role-based agent collaboration with memory and supports sequential or hierarchical workflows for production use. The framework is built without LangChain dependencies for lean, fast execution.
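
The sequential workflow pattern mentioned above can be sketched as a pipeline where each agent's output feeds the next; the `Agent` class and `run_sequential` helper here are illustrative only, not CrewAI's actual Agent/Task/Crew API:

```python
class Agent:
    """Toy role-based agent: a name plus a callable that does its work."""
    def __init__(self, role: str, handler):
        self.role = role
        self.handler = handler

    def run(self, task: str) -> str:
        return self.handler(task)

def run_sequential(agents: list, task: str) -> str:
    """Pass each agent's output to the next agent, like a sequential crew."""
    result = task
    for agent in agents:
        result = agent.run(result)
    return result

crew = [
    Agent("researcher", lambda t: t + " -> facts gathered"),
    Agent("writer", lambda t: t + " -> draft written"),
]
output = run_sequential(crew, "topic")
```

A hierarchical workflow would instead route tasks through a manager agent that delegates to the specialists rather than chaining them in a fixed order.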

View Skill