
evaluation-metrics

mattnigh
Updated 4 days ago
Other, ai, testing, automation, data

About

This Claude Skill activates automatically when you evaluate LLM performance. It covers building and managing evaluation datasets, computing metrics, running A/B tests, and applying LLM-as-judge patterns. Use it when you need structured experiment tracking and rigorous performance assessment for your LLM applications.
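To make the terms above concrete, here is a minimal sketch of metric computation and an A/B comparison over a small evaluation dataset. It is not taken from the skill itself: the `Example` class, both functions, the tiny dataset, and the two stand-in "model" variants are hypothetical names chosen for illustration, and exact-match accuracy is just one simple metric among many.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """One evaluation item: a prompt and its reference answer (hypothetical schema)."""
    prompt: str
    expected: str


def exact_match_accuracy(dataset: List[Example], model: Callable[[str], str]) -> float:
    """Fraction of examples where the output equals the reference,
    ignoring case and surrounding whitespace."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(
        model(ex.prompt).strip().lower() == ex.expected.strip().lower()
        for ex in dataset
    )
    return hits / len(dataset)


def compare_variants(
    dataset: List[Example],
    variant_a: Callable[[str], str],
    variant_b: Callable[[str], str],
) -> Dict[str, float]:
    """Score two variants on the same dataset and report the difference (B minus A)."""
    acc_a = exact_match_accuracy(dataset, variant_a)
    acc_b = exact_match_accuracy(dataset, variant_b)
    return {"a": acc_a, "b": acc_b, "delta": acc_b - acc_a}


# Hypothetical stand-ins for two prompt or model variants; a real setup would call an LLM.
DATASET = [
    Example("2+2", "4"),
    Example("capital of France", "Paris"),
    Example("opposite of hot", "cold"),
]


def variant_a(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "paris"}.get(prompt, "unknown")


def variant_b(prompt: str) -> str:
    return {"2+2": "4", "capital of France": "Paris", "opposite of hot": "Cold "}.get(prompt, "unknown")


result = compare_variants(DATASET, variant_a, variant_b)
print(result)
```

With only three examples the difference between variants is not statistically meaningful; a real A/B test would use a larger dataset and a significance check before drawing conclusions.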

Quick Install

Claude Code

Recommended (Primary)
npx skills add mattnigh/skills_collection -a claude-code
Plugin Command (Alternative)
/plugin add https://github.com/mattnigh/skills_collection
Git Clone (Alternative)
git clone https://github.com/mattnigh/skills_collection.git ~/.claude/skills/evaluation-metrics

Copy and paste this command in Claude Code to install this skill

GitHub Repository

mattnigh/skills_collection
Path: collection/ricardoroche__ricardos-claude-code__claude__skills__evaluation-metrics__SKILL.md