openrlhf-training
About
OpenRLHF is a high-performance RLHF training framework for fine-tuning large language models (7B-70B+ parameters) using methods such as PPO, DPO, and GRPO. It uses Ray for its distributed architecture and vLLM for accelerated inference, reaching speeds roughly twice those of alternatives such as DeepSpeedChat. Use this skill when you need efficient, distributed RLHF training with optimized GPU resource sharing and ZeRO-3 support.
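As a rough illustration of the distributed setup described above, the sketch below shows how a Ray + vLLM PPO run is typically launched with OpenRLHF. The flag names follow OpenRLHF's documented `train_ppo_ray` entry point, but they vary between versions, and the model paths here are placeholders; verify both against the version you install.

```shell
# Start a Ray head node, then launch distributed PPO.
# Model names below are illustrative placeholders, not recommendations.
ray start --head

python -m openrlhf.cli.train_ppo_ray \
  --pretrain <your-policy-model> \
  --reward_pretrain <your-reward-model> \
  --actor_num_gpus_per_node 4 \
  --vllm_num_engines 2 \
  --vllm_tensor_parallel_size 2 \
  --zero_stage 3
```

The `--vllm_*` flags control how many vLLM engines serve rollout generation and how each engine shards the model, which is where the framework's inference speedup comes from.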
Quick Installation
Claude Code
Recommended: npx skills add davila7/claude-code-templates -a claude-code
Plugin: /plugin add https://github.com/davila7/claude-code-templates
Manual: git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/openrlhf-training
Copy one of these commands and paste it into Claude Code to install this skill.
GitHub Repository
Related Skills
fine-tuning-with-trl
Other: This skill enables fine-tuning of LLMs using TRL's reinforcement learning methods, including SFT, DPO, and PPO, for RLHF and preference alignment. It's designed for aligning models with human feedback and works with HuggingFace Transformers. Use it when you need to implement RLHF, optimize with rewards, or train from human preferences.
training-llms-megatron
Design: This skill trains massive LLMs (2B-462B parameters) using NVIDIA's Megatron-Core framework for maximum GPU efficiency. Use it when training models over 1B parameters and needing advanced parallelism like tensor, pipeline, or expert parallelism. It's a production-ready framework proven on models like Nemotron and LLaMA.
grpo-rl-training
Design: This skill provides expert guidance for implementing GRPO (Group Relative Policy Optimization) reinforcement learning fine-tuning using the TRL library. It's designed for training models on tasks requiring structured outputs, verifiable reasoning, or objective correctness metrics like coding or math. Key features include production-ready workflows for custom reward functions and enforcing specific output formats.
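To make the "custom reward functions" point concrete, here is a minimal sketch of a format-enforcing reward of the kind GRPO training uses. It assumes TRL's convention of passing a list of completion strings to each reward function; the `<think>`/`<answer>` tag scheme is an illustrative assumption, not something TRL requires.

```python
import re

# Matches completions of the form <think>...</think><answer>...</answer>.
# The tag names are an example format, chosen for this sketch.
_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)

def format_reward(completions, **kwargs):
    """Return 1.0 for each completion matching the required format, else 0.0."""
    return [1.0 if _PATTERN.match(c.strip()) else 0.0 for c in completions]

# Quick check on two sample completions:
scores = format_reward([
    "<think>2+2=4</think><answer>4</answer>",
    "The answer is 4.",
])
# scores -> [1.0, 0.0]
```

In TRL, a function like this would be passed to the trainer's reward-function list alongside any correctness-based rewards, so the model is optimized jointly for format and answer quality.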
gptq
Other: GPTQ is a 4-bit post-training quantization technique for LLMs that enables 4x memory reduction and 3-4x faster inference with minimal accuracy loss. It's ideal for deploying large models on consumer GPUs and integrates with transformers and PEFT for QLoRA fine-tuning. Use it when you need to fit 70B+ parameter models on limited hardware while maintaining performance.
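The "4x memory reduction" claim above follows directly from the bit widths: quantizing fp16 weights (16 bits per parameter) down to 4 bits cuts weight memory by a factor of four. A small back-of-the-envelope helper, ignoring activations, KV cache, and quantization overhead:

```python
def model_memory_gb(n_params, bits_per_param):
    """Approximate weight-only memory in GB (ignores activations, KV cache,
    and per-group quantization metadata, which add a small overhead)."""
    return n_params * bits_per_param / 8 / 1e9

fp16_gb = model_memory_gb(70e9, 16)  # 70B params at fp16 -> 140.0 GB
int4_gb = model_memory_gb(70e9, 4)   # 70B params at 4-bit -> 35.0 GB
```

So a 70B model drops from roughly 140 GB of weights to roughly 35 GB, which is what makes multi-GPU consumer setups feasible for models of that size.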
