evaluation-metrics

Name: evaluation-metrics
Author: mattnigh

Updated 1 month ago

11 views

Other: ai, testing, automation, data

About

This Claude Skill activates automatically during LLM performance evaluation so that evaluations use appropriate metrics and tests. It manages evaluation datasets, computes metrics, supports A/B testing, and implements LLM-as-judge patterns. Use it when you need structured experiment tracking and rigorous performance assessment for your LLM applications.

Quick install

Claude Code

GitHub repository

mattnigh/skills_collection

Path: collection/ricardoroche__ricardos-claude-code__claude__skills__evaluation-metrics__SKILL.md

FAQ

What is the evaluation-metrics skill?

evaluation-metrics is a Claude Skill by mattnigh. Skills package instructions and resources that Claude loads on demand, so Claude can perform evaluation-metrics-related tasks without extra prompting.

How do I install evaluation-metrics?

Use the install commands on this page: add evaluation-metrics to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does evaluation-metrics belong to?

evaluation-metrics is in the ai-llm category, tagged ai, testing, automation and data.

Is evaluation-metrics free to use?

Yes. evaluation-metrics is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.