eval-harness

Name: eval-harness
Author: affaan-m

Updated 1 month ago

Category: Development
Tag: ai

About

The eval-harness skill provides a formal evaluation framework for implementing eval-driven development (EDD) in Claude Code sessions. It enables developers to define pass/fail criteria, measure agent reliability with pass@k metrics, and create regression test suites. Use it when setting up EDD workflows, benchmarking agent performance, or tracking regressions across prompt or model changes.
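As a rough illustration of the pass@k idea mentioned above, the sketch below uses the commonly cited unbiased estimator: given n recorded attempts of which c passed, it estimates the probability that at least one of k sampled attempts passes. This page does not show how eval-harness itself computes the metric, so the function name `pass_at_k` and its parameters are hypothetical and the formula is an assumption about what the skill means by pass@k.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n attempts with c passes.

    Hypothetical helper for illustration only; not taken from the
    eval-harness skill. Uses 1 - C(n - c, k) / C(n, k), i.e. one minus
    the chance that k attempts drawn without replacement all fail.
    """
    if n <= 0 or not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require n > 0, 0 <= c <= n, and 1 <= k <= n")
    # Fewer than k failures exist, so any draw of k includes a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Example: 10 recorded runs of an eval, 3 of which passed.
    for k in (1, 3, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

With 3 passes in 10 attempts, pass@1 is about 0.3 (the raw pass rate), and the estimate rises as k grows because more attempts are allowed per task.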

Quick Install

Claude Code

npx skills add affaan-m/everything-claude-code -a claude-code

Plugin Command (Alternative)

/plugin add https://github.com/affaan-m/everything-claude-code

Git Clone (Alternative)

git clone https://github.com/affaan-m/everything-claude-code.git ~/.claude/skills/eval-harness

GitHub Repository

affaan-m/everything-claude-code

Path: .agents/skills/eval-harness

Tags: ai-agents, anthropic, claude, claude-code, developer-tools, llm

Frequently asked questions

What is the eval-harness skill?

eval-harness is a Claude Skill by affaan-m. Skills package instructions and resources that Claude loads on demand, so Claude can perform eval-harness-related tasks without extra prompting.

How do I install eval-harness?

Use one of the install commands on this page: run the npx command, add the repository as a plugin from within Claude Code, or clone the repository into your skills directory. Then restart Claude Code so it picks up the skill.

What category does eval-harness belong to?

eval-harness is in the Development category, tagged ai.

Is eval-harness free to use?

Yes. eval-harness is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Related Skills

qmd

Development

qmd is a local search and indexing CLI tool that enables developers to index and search through local files using hybrid search combining BM25, vector embeddings, and reranking. It supports both command-line usage and MCP (Model Context Protocol) mode for integration with Claude. The tool uses Ollama for embeddings and stores indexes locally, making it ideal for searching documentation or codebases directly from the terminal.

subagent-driven-development

Development

This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.

mcporter

Development

The mcporter skill enables developers to manage and call Model Context Protocol (MCP) servers directly from Claude. It provides commands to list available servers, call their tools with arguments, and handle authentication and daemon lifecycle. Use this skill for integrating and testing MCP server functionality in your development workflow.

adk-deployment-specialist

Development

This skill deploys and orchestrates Vertex AI ADK agents using A2A protocol, managing AgentCard discovery, task submission, and supporting tools like Code Execution Sandbox and Memory Bank. It enables building multi-agent systems with sequential, parallel, or loop orchestration patterns in Python, Java, or Go. Use it when asked to deploy ADK agents or orchestrate agent workflows on Google Cloud.
