openrlhf-training
About
OpenRLHF is a high-performance RLHF training framework for fine-tuning large language models (7B-70B+ parameters) using methods such as PPO, DPO, and GRPO. It uses Ray for its distributed architecture and vLLM for accelerated inference, reaching speeds roughly twice those of alternatives such as DeepSpeedChat. Use this skill when you need efficient, distributed RLHF training with optimized GPU resource sharing and ZeRO-3 support.
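As a rough illustration of the distributed setup described above, the sketch below shows how a Ray + vLLM PPO run is typically launched with OpenRLHF. The flag names follow OpenRLHF's documented `train_ppo_ray` entry point, but they vary between versions, and the model paths here are placeholders; verify both against the version you install.

```shell
# Start a Ray head node, then launch distributed PPO.
# Model names below are illustrative placeholders, not recommendations.
ray start --head

python -m openrlhf.cli.train_ppo_ray \
  --pretrain <your-policy-model> \
  --reward_pretrain <your-reward-model> \
  --actor_num_gpus_per_node 4 \
  --vllm_num_engines 2 \
  --vllm_tensor_parallel_size 2 \
  --zero_stage 3
```

The `--vllm_*` flags control how many vLLM engines serve rollout generation and how each engine shards the model, which is where the framework's inference speedup comes from.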
Quick Installation
Claude Code
Recommended: npx skills add davila7/claude-code-templates -a claude-code
Plugin: /plugin add https://github.com/davila7/claude-code-templates
Manual: git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/openrlhf-training
Copy one of these commands and paste it into Claude Code to install this skill.
GitHub Repository
Related Skills
fine-tuning-with-trl
Other: This skill enables fine-tuning of LLMs using TRL's reinforcement learning methods, including SFT, DPO, and PPO, for RLHF and preference alignment. It's designed for aligning models with human feedback and works with HuggingFace Transformers. Use it when you need to implement RLHF, optimize with rewards, or train from human preferences.
training-llms-megatron
Design: This skill trains massive LLMs (2B-462B parameters) using NVIDIA's Megatron-Core framework for maximum GPU efficiency. Use it when training models over 1B parameters and needing advanced parallelism like tensor, pipeline, or expert parallelism. It's a production-ready framework proven on models like Nemotron and LLaMA.
grpo-rl-training
Design: This skill provides expert guidance for implementing GRPO (Group Relative Policy Optimization) reinforcement learning fine-tuning using the TRL library. It's designed for training models on tasks requiring structured outputs, verifiable reasoning, or objective correctness metrics like coding or math. Key features include production-ready workflows for custom reward functions and enforcing specific output formats.
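To make the "custom reward functions" point concrete, here is a minimal sketch of a format-enforcing reward of the kind GRPO training uses. It assumes TRL's convention of passing a list of completion strings to each reward function; the `<think>`/`<answer>` tag scheme is an illustrative assumption, not something TRL requires.

```python
import re

# Matches completions of the form <think>...</think><answer>...</answer>.
# The tag names are an example format, chosen for this sketch.
_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completions, **kwargs):
    """Return 1.0 for each completion matching the required format, else 0.0."""
    return [1.0 if _PATTERN.match(c.strip()) else 0.0 for c in completions]

# Quick check on two sample completions:
scores = format_reward([
    "<think>2+2=4</think><answer>4</answer>",
    "The answer is 4.",
])
# scores -> [1.0, 0.0]
```

In TRL, a function like this would be passed to the trainer's reward-function list alongside any correctness-based rewards, so the model is optimized jointly for format and answer quality.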
gptq
Other: GPTQ is a 4-bit post-training quantization technique for LLMs that enables 4x memory reduction and 3-4x faster inference with minimal accuracy loss. It's ideal for deploying large models on consumer GPUs and integrates with transformers and PEFT for QLoRA fine-tuning. Use it when you need to fit 70B+ parameter models on limited hardware while maintaining performance.
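The "4x memory reduction" claim above follows directly from the bit widths: quantizing fp16 weights (16 bits per parameter) down to 4 bits cuts weight memory by a factor of four. A small back-of-the-envelope helper, ignoring activations, KV cache, and quantization overhead:

```python
def model_memory_gb(n_params, bits_per_param):
    """Approximate weight-only memory in GB (ignores activations, KV cache,
    and per-group quantization metadata, which add a small overhead)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = model_memory_gb(70e9, 16)  # 70B params at fp16 -> 140.0 GB
int4_gb = model_memory_gb(70e9, 4)   # 70B params at 4-bit -> 35.0 GB
```

So a 70B model drops from roughly 140 GB of weights to roughly 35 GB, which is what makes multi-GPU consumer setups feasible for models of that size.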
