Evals
Über
Evals ist ein Agenten-Evaluierungs-Framework zum Testen und Benchmarking von Claude Code-Agenten basierend auf den Best Practices von Anthropic. Es bietet drei Bewertungstypen (codebasiert, modellbasiert, menschlich), Transkript-Erfassung und Pass@k-Metriken für Regressions- und Fähigkeitstests. Nutzen Sie diese Fähigkeit, wenn Sie Agentenverhalten evaluieren, verifizieren oder benchmarken müssen.
Schnellinstallation
Claude Code
Empfohlennpx skills add majiayu000/claude-skill-registry -a claude-code/plugin add https://github.com/majiayu000/claude-skill-registrygit clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/EvalsKopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um diese Fähigkeit zu installieren
GitHub Repository
Verwandte Skills
content-collections
MetaThis skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.
evaluating-llms-harness
TestenThis Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.
cloudflare-turnstile
MetaThis skill provides comprehensive guidance for implementing Cloudflare Turnstile as a CAPTCHA-alternative bot protection system. It covers integration for forms, login pages, API endpoints, and frameworks like React/Next.js/Hono, while handling invisible challenges that maintain user experience. Use it when migrating from reCAPTCHA, debugging error codes, or implementing token validation and E2E tests.
cloudflare-cron-triggers
TestenThis skill provides comprehensive knowledge for implementing Cloudflare Cron Triggers to schedule Workers using cron expressions. It covers setting up periodic tasks, maintenance jobs, and automated workflows while handling common issues like invalid cron expressions and timezone problems. Developers can use it for configuring scheduled handlers, testing cron triggers, and integrating with Workflows and Green Compute.
