evaluating-llms-harness

Name: evaluating-llms-harness
Author: davila7

Updated: 2 months ago

Views: 176

Category: Testing
Tags: Evaluation, LM Evaluation Harness, Benchmarking, MMLU, HumanEval, GSM8K, EleutherAI, Model Quality, Academic Benchmarks, Industry Standard

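About

This skill runs standardized LLM evaluations on more than 60 academic benchmarks, including MMLU and GSM8K, using the industry-standard lm-evaluation-harness. It supports HuggingFace, vLLM, and API-based models, and can be used to benchmark model quality, compare models, and track training progress. It provides a consistent, widely adopted methodology for reporting academic research results.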
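
As a rough illustration of what such an evaluation run involves, the sketch below assembles an lm-evaluation-harness command line in Python without executing it. The helper name `build_lm_eval_command`, the example model `EleutherAI/pythia-160m`, the batch size, and the `results` output directory are assumptions chosen for illustration and are not taken from the skill itself. The flags are those of the harness's `lm_eval` command-line tool.

```python
import shlex
from typing import Sequence


def build_lm_eval_command(
    model: str,
    model_args: str,
    tasks: Sequence[str],
    batch_size: int = 8,
    output_path: str = "results",
) -> list[str]:
    """Assemble an lm_eval invocation as an argument list (hypothetical helper).

    Nothing is executed here; the list can be passed to subprocess.run
    once lm-evaluation-harness is installed.
    """
    if not tasks:
        raise ValueError("at least one task name is required")
    return [
        "lm_eval",
        "--model", model,                 # backend type, e.g. "hf" or "vllm"
        "--model_args", model_args,       # backend-specific key=value pairs
        "--tasks", ",".join(tasks),       # comma-separated benchmark names
        "--batch_size", str(batch_size),
        "--output_path", output_path,     # assumed output directory
    ]


# Example model and tasks are illustrative assumptions.
command = build_lm_eval_command(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["mmlu", "gsm8k"],
)
print(shlex.join(command))
```

Building the argument list separately from running it makes it easy to review the exact command, or to swap the backend (for example `model="vllm"`) when comparing models on the same tasks.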

Quick Install

Claude Code

Recommended

Main

npx skills add davila7/claude-code-templates -a claude-code

Plugin command (alternative)

/plugin add https://github.com/davila7/claude-code-templates

Git clone (alternative)

git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/evaluating-llms-harness

Copy and paste this command into Claude Code to install the skill.

GitHub repository

davila7/claude-code-templates

Path: cli-tool/components/skills/ai-research/evaluation-lm-evaluation-harness

Topics: anthropic, anthropic-claude, claude, claude-code

FAQ

Frequently asked questions

What is the evaluating-llms-harness skill?

evaluating-llms-harness is a Claude Skill by davila7. Skills package instructions and resources that Claude loads on demand, so Claude can run LLM evaluations with lm-evaluation-harness without extra prompting.

How do I install evaluating-llms-harness?

Use one of the install commands on this page: add the skill with npx, add it to Claude Code as a plugin, or clone the repository into your skills directory. Then restart Claude so it picks up the skill.

What category does evaluating-llms-harness belong to?

evaluating-llms-harness is in the Testing category. Its tags include Evaluation, LM Evaluation Harness, Benchmarking, MMLU, HumanEval, and GSM8K.

Is evaluating-llms-harness free to use?

Yes. evaluating-llms-harness is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.