
evaluation-metrics

Name: evaluation-metrics
Author: mattnigh


Updated 1 month ago


Category: Other
Tags: ai, testing, automation, data

About

This Claude Skill activates automatically during LLM performance evaluation to help ensure you use appropriate metrics and sound testing practices. It covers evaluation datasets, metric computation, A/B testing, and LLM-as-judge patterns. Use it when you need structured experiment tracking and rigorous performance assessment for your LLM applications.
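This page does not show the skill's actual code. As a minimal sketch of the workflow described above, assuming a Python project, the example below scores two answer variants on a tiny evaluation dataset and then compares them head to head. Every name in it (`EvalExample`, `evaluate`, `ab_compare`, `reference_judge`) is hypothetical, exact match is only one possible metric, and `reference_judge` is a deterministic placeholder standing where an LLM-as-judge call would go.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical aliases: a "system" maps a prompt to an answer; a "judge"
# compares two answers and returns "A", "B", or "tie".
System = Callable[[str], str]
Judge = Callable[[str, str, str, str], str]


@dataclass
class EvalExample:
    """One row of a hypothetical evaluation dataset."""
    prompt: str
    reference: str


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate(dataset: List[EvalExample], system: System) -> Dict[str, float]:
    """Run one system over the dataset and report mean exact-match accuracy."""
    scores = [exact_match(system(ex.prompt), ex.reference) for ex in dataset]
    return {"accuracy": sum(scores) / len(scores), "n": float(len(scores))}


def reference_judge(prompt: str, answer_a: str, answer_b: str, reference: str) -> str:
    """Placeholder for an LLM judge: prefer whichever answer matches the reference.

    A real LLM-as-judge would send the prompt and both answers to a model and
    parse its verdict; `prompt` is unused here but kept for that interface.
    """
    score_a = exact_match(answer_a, reference)
    score_b = exact_match(answer_b, reference)
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


def ab_compare(dataset: List[EvalExample], system_a: System,
               system_b: System, judge: Judge) -> Dict[str, float]:
    """Pairwise A/B comparison: fraction of examples won by A, won by B, or tied."""
    counts = {"A": 0, "B": 0, "tie": 0}
    for ex in dataset:
        verdict = judge(ex.prompt, system_a(ex.prompt), system_b(ex.prompt), ex.reference)
        counts[verdict] += 1
    return {label: count / len(dataset) for label, count in counts.items()}


# Toy usage: canned answers stand in for two prompt or model variants.
dataset = [
    EvalExample("Capital of France?", "Paris"),
    EvalExample("2 + 2 =", "4"),
    EvalExample("Opposite of hot?", "cold"),
]
ANSWERS_A = {"Capital of France?": "Paris", "2 + 2 =": "4", "Opposite of hot?": "warm"}
ANSWERS_B = {"Capital of France?": "paris", "2 + 2 =": "5", "Opposite of hot?": "Cold"}


def system_a(prompt: str) -> str:
    return ANSWERS_A[prompt]


def system_b(prompt: str) -> str:
    return ANSWERS_B[prompt]


print("A:", evaluate(dataset, system_a))
print("B:", evaluate(dataset, system_b))
print("A vs B:", ab_compare(dataset, system_a, system_b, reference_judge))
```

Both variants reach 2/3 accuracy, yet the pairwise comparison shows each one winning a different example, which is the kind of detail an aggregate score hides. With a real model as the judge, it is common to also run each pair with A and B swapped to reduce position bias.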

Quick Install

Claude Code (recommended, primary)

npx skills add mattnigh/skills_collection -a claude-code

Plugin Command (alternative)

/plugin add https://github.com/mattnigh/skills_collection

Git Clone (alternative)

git clone https://github.com/mattnigh/skills_collection.git ~/.claude/skills/evaluation-metrics


GitHub Repository

mattnigh/skills_collection

Path: collection/ricardoroche__ricardos-claude-code__claude__skills__evaluation-metrics__SKILL.md

FAQ


What is the evaluation-metrics skill?

evaluation-metrics is a Claude Skill by mattnigh. Skills package instructions and resources that Claude loads on demand, so Claude can perform evaluation-metrics-related tasks without extra prompting.

How do I install evaluation-metrics?

Use the install commands on this page: add evaluation-metrics to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does evaluation-metrics belong to?

evaluation-metrics is in the ai-llm category, tagged ai, testing, automation and data.

Is evaluation-metrics free to use?

Yes. evaluation-metrics is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Related Skills

model-selection

Other

This Claude Skill automatically guides model and provider selection for LLM applications. It provides patterns for cost optimization, fallback strategies, and multi-model routing across providers like OpenAI and Anthropic. Use it when implementing model comparison, provider failover, or performance/cost trade-offs in your LLM system.


agent-orchestration-patterns

Other

This Claude Skill automatically guides multi-agent system design by enforcing proper tool schema creation with Pydantic, managing agent states, and implementing robust error handling. It provides orchestration patterns for reliable tool-calling workflows and agent routing. Use it when building complex agent systems to ensure maintainable and structured interactions.


ai-security

Other

The ai-security skill automatically applies security protections for AI/LLM applications. It provides prompt injection detection, PII redaction, output filtering, and content moderation. Use this skill when building LLM applications that need built-in security guardrails.
