fine-tuning-with-trl
About
This skill enables fine-tuning of LLMs with TRL's reinforcement-learning methods, including SFT, DPO, and PPO for RLHF and preference alignment. It is designed for aligning models with human feedback and works with HuggingFace Transformers. Use it when you need to implement RLHF, optimize against rewards, or train on human preferences.
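For orientation, below is a minimal supervised fine-tuning sketch with TRL. The model id ("Qwen/Qwen2.5-0.5B") and dataset ("trl-lib/Capybara") are illustrative placeholders, and trainer signatures vary between TRL releases, so verify against the version you have installed. DPOTrainer and PPOTrainer follow the same Config-plus-Trainer pattern for preference alignment and RLHF.

```python
# Minimal SFT sketch with TRL; model/dataset ids are illustrative and
# exact trainer signatures differ across TRL versions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset

training_args = SFTConfig(
    output_dir="./sft-output",  # checkpoint directory
    max_steps=100,              # short run for demonstration only
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # recent TRL versions accept a Hub model id here
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```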
Quick Installation
Claude Code
Recommended:
npx skills add davila7/claude-code-templates -a claude-code

Alternatively:
/plugin add https://github.com/davila7/claude-code-templates
git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/fine-tuning-with-trl

Copy one of these commands and paste it into Claude Code to install this skill.
GitHub Repository
Related Skills
quantizing-models-bitsandbytes
Other
This skill quantizes LLMs to 8-bit or 4-bit precision using bitsandbytes, achieving 50-75% memory reduction with minimal accuracy loss. It's ideal for running larger models on limited GPU memory or accelerating inference, supporting formats like INT8, NF4, and FP4. The skill integrates with HuggingFace Transformers and enables QLoRA training and 8-bit optimizers.
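As a rough sketch of what that looks like in practice, the snippet below loads a causal LM in 4-bit NF4 precision through transformers' BitsAndBytesConfig; the model id "facebook/opt-1.3b" is just an illustrative placeholder.

```python
# 4-bit NF4 load via bitsandbytes + transformers (requires the accelerate
# package for device_map="auto"). Model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, common QLoRA default
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # also quantize quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",                    # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across available GPUs
)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
```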
openrlhf-training
Design
OpenRLHF is a high-performance RLHF training framework for fine-tuning large language models (7B-70B+ parameters) using methods like PPO, DPO, and GRPO. It leverages Ray for distributed architecture and vLLM for accelerated inference, achieving speeds 2x faster than alternatives like DeepSpeedChat. Use this skill when you need efficient, distributed RLHF training with optimized GPU resource sharing and ZeRO-3 support.
weights-and-biases
Design
This skill integrates Weights & Biases for comprehensive ML experiment tracking and MLOps. It automatically logs metrics, visualizes training in real-time, and manages hyperparameter sweeps and model versions. Use it to compare runs, optimize models, and collaborate within team workspaces directly from your development environment.
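A minimal logging sketch, with placeholder project and run names, looks like this; with HuggingFace-style trainers (including TRL's), passing report_to="wandb" in the training arguments enables the same tracking automatically.

```python
# Minimal Weights & Biases sketch; project/run names are placeholders.
import wandb

run = wandb.init(project="trl-finetuning", name="sft-demo", config={"lr": 2e-5})

for step in range(100):
    loss = 1.0 / (step + 1)                    # stand-in for a real training loss
    wandb.log({"train/loss": loss}, step=step)  # logged metrics appear in the UI

run.finish()
```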
huggingface-tokenizers
Documents
This skill provides high-performance tokenization using HuggingFace's Rust-based library, processing 1GB of text in under 20 seconds. It supports BPE, WordPiece, and Unigram algorithms while enabling custom tokenizer training and alignment tracking. Use it when you need production-fast tokenization or to build custom tokenizers integrated with the transformers ecosystem.
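As a sketch of custom tokenizer training, the snippet below builds a small BPE tokenizer from scratch; the corpus path "corpus.txt" and the output filename are placeholders.

```python
# Train a small BPE tokenizer from scratch with the tokenizers library.
# "corpus.txt" is a placeholder path to any plain-text training file.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

trainer = BpeTrainer(
    vocab_size=30_000,
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("my-tokenizer.json")

# Encodings expose offsets, which is what alignment tracking builds on.
enc = tokenizer.encode("Hello, tokenizers!")
print(enc.tokens, enc.offsets)
```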
