eval-harness

affaan-m
Updated 3 days ago
Development, ai

About

The eval-harness skill provides a formal evaluation framework for implementing eval-driven development (EDD) in Claude Code sessions. It enables developers to define pass/fail criteria, measure agent reliability with pass@k metrics, and create regression test suites. Use it when setting up EDD workflows, benchmarking agent performance, or tracking regressions across prompt or model changes.
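
The page does not show how the skill computes pass@k, so the following is only a minimal sketch of the commonly used unbiased estimator: given n recorded trials of which c passed, it estimates the probability that at least one of k sampled trials passes. The function name `pass_at_k` and the trial data are hypothetical and are not taken from the skill itself.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n trials, c of which passed.

    Computes 1 - C(n - c, k) / C(n, k): the probability that a random
    subset of k trials (drawn without replacement) contains at least
    one passing trial.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) is 0 when fewer than k trials failed,
    # in which case every subset of size k contains a pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical outcomes of 10 runs of one eval (True = criteria met).
trial_results = [True, False, False, True, False,
                 False, True, False, False, False]
n = len(trial_results)
c = sum(trial_results)

for k in (1, 3, 5):
    print(f"pass@{k} = {pass_at_k(n, c, k):.3f}")
```

Tracking the same estimate before and after a prompt or model change is one way to notice a regression: a drop in pass@1 with a stable pass@5 suggests the agent still can succeed but does so less reliably.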

Quick Install

Claude Code

Recommended (Primary)
npx skills add affaan-m/everything-claude-code -a claude-code
Plugin Command (Alternative)
/plugin add https://github.com/affaan-m/everything-claude-code
Git Clone (Alternative)
git clone https://github.com/affaan-m/everything-claude-code.git ~/.claude/skills/eval-harness

Copy and paste this command in Claude Code to install this skill

GitHub Repository

affaan-m/everything-claude-code
Path: .agents/skills/eval-harness
ai-agents, anthropic, claude, claude-code, developer-tools, llm

Related Skills

qmd

Development

qmd is a local search and indexing CLI tool that enables developers to index and search through local files using hybrid search combining BM25, vector embeddings, and reranking. It supports both command-line usage and MCP (Model Context Protocol) mode for integration with Claude. The tool uses Ollama for embeddings and stores indexes locally, making it ideal for searching documentation or codebases directly from the terminal.

subagent-driven-development

Development

This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.

mcporter

Development

The mcporter skill enables developers to manage and call Model Context Protocol (MCP) servers directly from Claude. It provides commands to list available servers, call their tools with arguments, and handle authentication and daemon lifecycle. Use this skill for integrating and testing MCP server functionality in your development workflow.

adk-deployment-specialist

Development

This skill deploys and orchestrates Vertex AI ADK agents using A2A protocol, managing AgentCard discovery, task submission, and supporting tools like Code Execution Sandbox and Memory Bank. It enables building multi-agent systems with sequential, parallel, or loop orchestration patterns in Python, Java, or Go. Use it when asked to deploy ADK agents or orchestrate agent workflows on Google Cloud.
