data-refresh-eval
About
This skill builds and updates evaluation datasets by pulling customer support conversations from Front. It enables developers to run routing evaluations and analyze agent response quality against those datasets. Use it to maintain fresh test data and continuously assess your support system's performance.
Quick Install
Claude Code
Recommended (copy and paste this command in Claude Code to install this skill):
/plugin add https://github.com/majiayu000/claude-skill-registry
Alternatively, clone the repository:
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-refresh-eval
Documentation
Data Refresh & Eval Skill
Workflow for keeping the eval dataset fresh and running quality checks on agent responses.
Quick Start
cd ~/Code/skillrecordings/support/packages/cli
# Refresh dataset from Front (last 30 days, 200 responses max)
# Note: date -d is GNU syntax; on macOS/BSD use $(date -v-30d +%Y-%m-%d)
bun src/index.ts dataset build --since $(date -d "30 days ago" +%Y-%m-%d) --limit 200 --output data/eval-dataset.json
# Run routing eval
bun src/index.ts eval routing data/eval-dataset.json
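For scripts that should not depend on the local `date` implementation, the `--since` value can also be computed in TypeScript. This is a minimal sketch: `daysAgo` is a hypothetical helper, not part of the CLI, and it assumes a UTC calendar date is acceptable.

```typescript
// Hypothetical helper (not part of the CLI): returns the UTC date `days`
// days before `now`, formatted as YYYY-MM-DD for use with --since.
function daysAgo(days: number, now: Date = new Date()): string {
  const msPerDay = 24 * 60 * 60 * 1000;
  const past = new Date(now.getTime() - days * msPerDay);
  // toISOString() yields e.g. "2025-01-31T12:00:00.000Z"; keep the date part.
  return past.toISOString().slice(0, 10);
}

console.log(daysAgo(30));
```

The printed value can be passed directly as the `--since` argument.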
Dataset Commands
Build fresh dataset
# Recent data (recommended for ongoing work)
bun src/index.ts dataset build --since 2025-01-01 --limit 200 --output data/eval-dataset.json
# App-specific
bun src/index.ts dataset build --app total-typescript --limit 100 --output data/tt-dataset.json
# Include conversation history for context
bun src/index.ts dataset build --since 2025-01-01 --include-history --output data/dataset-with-history.json
# Only labeled responses (good/bad)
bun src/index.ts dataset build --labeled-only --output data/labeled-only.json
Convert to evalite format
bun src/index.ts dataset to-evalite -i data/eval-dataset.json -o data/evalite-format.json
Running Evals
Routing eval (default thresholds)
bun src/index.ts eval routing data/eval-dataset.json
Custom thresholds
bun src/index.ts eval routing data/eval-dataset.json \
--min-precision 0.95 \
--min-recall 0.98 \
--max-fp-rate 0.02 \
--max-fn-rate 0.01
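The four thresholds are easier to tune with the underlying ratios in mind. The sketch below uses the standard confusion-matrix definitions. The source does not state how the CLI computes these values, so the formulas, the `Counts` and `Thresholds` types, and the `metrics` and `passes` functions are all assumptions for illustration.

```typescript
// Assumed confusion-matrix counts for a routing eval (illustrative only).
interface Counts { tp: number; fp: number; tn: number; fn: number }

interface Thresholds {
  minPrecision: number;
  minRecall: number;
  maxFpRate: number;
  maxFnRate: number;
}

// Return 0 instead of dividing by zero when a denominator is empty.
const ratio = (num: number, den: number): number => (den === 0 ? 0 : num / den);

// Standard definitions; the CLI's own definitions may differ.
function metrics(c: Counts) {
  return {
    precision: ratio(c.tp, c.tp + c.fp),
    recall: ratio(c.tp, c.tp + c.fn),
    fpRate: ratio(c.fp, c.fp + c.tn),
    fnRate: ratio(c.fn, c.fn + c.tp),
  };
}

// True only when every metric is on the right side of its threshold.
function passes(c: Counts, t: Thresholds): boolean {
  const m = metrics(c);
  return m.precision >= t.minPrecision && m.recall >= t.minRecall
    && m.fpRate <= t.maxFpRate && m.fnRate <= t.maxFnRate;
}

const strict: Thresholds = { minPrecision: 0.95, minRecall: 0.98, maxFpRate: 0.02, maxFnRate: 0.01 };
console.log(passes({ tp: 199, fp: 2, tn: 198, fn: 1 }, strict));
```

Under these definitions the false-negative rate equals 1 minus recall, so a maximum of 0.01 is a tighter constraint than a minimum recall of 0.98.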
JSON output for CI/automation
bun src/index.ts eval routing data/eval-dataset.json --json
Response Analysis
Find bad responses for debugging
# List responses rated "bad"
bun src/index.ts responses list --rating bad
# Get details with conversation context
bun src/index.ts responses get <actionId> --context
# Export bad responses for analysis
bun src/index.ts responses export --rating bad -o bad-responses.json
Analyze unrated responses
bun src/index.ts responses list --rating unrated --limit 50
Recommended Workflow
Daily data refresh
cd ~/Code/skillrecordings/support/packages/cli
# 1. Pull fresh data
# Note: date -d is GNU syntax; on macOS/BSD use $(date -v-7d +%Y-%m-%d)
bun src/index.ts dataset build --since $(date -d "7 days ago" +%Y-%m-%d) --limit 100 --output data/eval-dataset.json
# 2. Check dataset stats
jq 'length' data/eval-dataset.json
# 3. Run eval
bun src/index.ts eval routing data/eval-dataset.json
# 4. Check for failures
bun src/index.ts responses list --rating bad --limit 10
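Step 2 only counts rows. When it matters how many of those rows carry a label, a short script can tally them. This is a sketch, assuming the dataset file is a JSON array of objects with the optional `label` field described under Dataset Schema; `countLabels` is a hypothetical helper, not a CLI command.

```typescript
type Label = "good" | "bad";

// Tally rows by label; rows without a label count as "unlabeled".
function countLabels(points: { label?: Label }[]) {
  const counts = { good: 0, bad: 0, unlabeled: 0 };
  for (const p of points) {
    counts[p.label ?? "unlabeled"] += 1;
  }
  return counts;
}

// Stand-in rows; in practice these would be parsed from the dataset file.
const sample: { label?: Label }[] = [
  { label: "good" },
  { label: "bad" },
  {},
  { label: "good" },
];
console.log(countLabels(sample));
```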
Pre-deploy validation
# 1. Build comprehensive dataset
bun src/index.ts dataset build --since 2025-01-01 --limit 500 --output data/full-dataset.json
# 2. Run eval with strict thresholds
bun src/index.ts eval routing data/full-dataset.json --min-precision 0.95 --min-recall 0.98 --json
# 3. Check exit code
echo "Exit code: $?"
Dataset Schema
Each eval point includes:
id - Action ID
app - App slug (total-typescript, aihero, etc.)
conversationId - Front conversation ID
customerEmail - Customer email (if available)
triggerMessage - The inbound message that triggered the response
  subject, body, timestamp
agentResponse - The agent's drafted response
  text, category, timestamp
label - "good" | "bad" | undefined
labeledBy - Who approved/rejected
conversationHistory - (optional) Full message history
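Expressed as a TypeScript shape, an eval point might look like the following. The field names come from the schema; the types (strings for IDs and timestamps, the element type of the history array) are assumptions, since the source does not state them, and `isLabeled` is a hypothetical helper.

```typescript
// Field names follow the schema; all types are assumed for illustration.
interface TriggerMessage {
  subject: string;
  body: string;
  timestamp: string; // assumed to be a date-time string
}

interface AgentResponse {
  text: string;
  category: string;
  timestamp: string; // assumed to be a date-time string
}

interface EvalPoint {
  id: string;                      // Action ID
  app: string;                     // App slug, e.g. "total-typescript"
  conversationId: string;          // Front conversation ID
  customerEmail?: string;          // present only if available
  triggerMessage: TriggerMessage;  // inbound message that triggered the response
  agentResponse: AgentResponse;    // the agent's drafted response
  label?: "good" | "bad";
  labeledBy?: string;              // who approved/rejected
  conversationHistory?: unknown[]; // entry shape not specified in the source
}

// Hypothetical helper: narrows to points that already carry a label.
function isLabeled(p: EvalPoint): p is EvalPoint & { label: "good" | "bad" } {
  return p.label !== undefined;
}
```

A guard like `isLabeled` gives a client-side equivalent of filtering to labeled rows when working with a dataset built without `--labeled-only`.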
Environment
Required in .env.local:
FRONT_API_TOKEN= # Front API access
DATABASE_URL= # Database connection
Troubleshooting
"FRONT_API_TOKEN environment variable required"
source apps/front/.env.local
# or set in .env.local at repo root
Dataset building slowly
The Front API is rate limited, so large pulls take longer. Use --limit to control batch size.
No labeled data
Labels come from human-in-the-loop (HITL) approvals and rejections. New responses start unlabeled.
