data-refresh-eval

majiayu000


About

This skill builds and updates evaluation datasets by pulling customer support conversations from Front. It enables developers to run routing evaluations and analyze agent response quality against those datasets. Use it to maintain fresh test data and continuously assess your support system's performance.

Quick Install

Claude Code

Plugin command (recommended)

/plugin add https://github.com/majiayu000/claude-skill-registry

Git clone (alternative)

git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-refresh-eval


Documentation

Data Refresh & Eval Skill

Workflow for keeping the eval dataset fresh and running quality checks on agent responses.

Quick Start

cd ~/Code/skillrecordings/support/packages/cli

# Refresh dataset from Front (last 30 days, 200 responses max)
# `date -d` is GNU date syntax (Linux); on macOS use $(date -v-30d +%Y-%m-%d)
bun src/index.ts dataset build --since "$(date -d "30 days ago" +%Y-%m-%d)" --limit 200 --output data/eval-dataset.json

# Run routing eval
bun src/index.ts eval routing data/eval-dataset.json
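The --since value above is computed with date, whose relative-date flags differ between GNU date (Linux) and BSD date (macOS). A minimal sketch of a portable helper follows; the function name days_ago is hypothetical and not part of the CLI.

```shell
# Hypothetical helper: print the date N days ago as YYYY-MM-DD.
# Tries GNU date first (-d "N days ago"), then falls back to BSD date (-v-Nd).
days_ago() {
  date -d "$1 days ago" +%Y-%m-%d 2>/dev/null || date -v-"$1"d +%Y-%m-%d
}

# Usage with the dataset command from this section (not executed here):
#   bun src/index.ts dataset build --since "$(days_ago 30)" --limit 200 --output data/eval-dataset.json
```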

Dataset Commands

Build fresh dataset

# Recent data (recommended for ongoing work)
bun src/index.ts dataset build --since 2025-01-01 --limit 200 --output data/eval-dataset.json

# App-specific
bun src/index.ts dataset build --app total-typescript --limit 100 --output data/tt-dataset.json

# Include conversation history for context
bun src/index.ts dataset build --since 2025-01-01 --include-history --output data/dataset-with-history.json

# Only labeled responses (good/bad)
bun src/index.ts dataset build --labeled-only --output data/labeled-only.json

Convert to evalite format

bun src/index.ts dataset to-evalite -i data/eval-dataset.json -o data/evalite-format.json

Running Evals

Routing eval (default thresholds)

bun src/index.ts eval routing data/eval-dataset.json

Custom thresholds

bun src/index.ts eval routing data/eval-dataset.json \
  --min-precision 0.95 \
  --min-recall 0.98 \
  --max-fp-rate 0.02 \
  --max-fn-rate 0.01
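
To get a feel for what the four thresholds constrain, here is a small sketch that computes the metrics from confusion-matrix counts. It assumes the textbook definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), FP rate = FP/(FP+TN), FN rate = FN/(FN+TP)); the source does not say how the CLI computes them, and routing_metrics is a hypothetical name.

```shell
# Hypothetical helper, assuming textbook metric definitions.
# Arguments: true positives, false positives, false negatives, true negatives.
routing_metrics() {
  awk -v tp="$1" -v fpos="$2" -v fneg="$3" -v tn="$4" '
    # Avoid division by zero when a denominator is empty.
    function ratio(n, d) { return (d == 0) ? 0 : n / d }
    BEGIN {
      printf "precision=%.4f\n", ratio(tp, tp + fpos)
      printf "recall=%.4f\n",    ratio(tp, tp + fneg)
      printf "fp_rate=%.4f\n",   ratio(fpos, fpos + tn)
      printf "fn_rate=%.4f\n",   ratio(fneg, fneg + tp)
    }'
}

# Example: 98 TP, 2 FP, 2 FN, 98 TN
routing_metrics 98 2 2 98
# precision=0.9800
# recall=0.9800
# fp_rate=0.0200
# fn_rate=0.0200
```

Under these definitions the FN rate equals 1 - recall, so --max-fn-rate 0.01 would be a tighter constraint than --min-recall 0.98. If the CLI defines the rates differently, the two flags may be independent.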

JSON output for CI/automation

bun src/index.ts eval routing data/eval-dataset.json --json

Response Analysis

Find bad responses for debugging

# List responses rated "bad"
bun src/index.ts responses list --rating bad

# Get details with conversation context
bun src/index.ts responses get <actionId> --context

# Export bad responses for analysis
bun src/index.ts responses export --rating bad -o bad-responses.json

Analyze unrated responses

bun src/index.ts responses list --rating unrated --limit 50

Recommended Workflow

Daily data refresh

cd ~/Code/skillrecordings/support/packages/cli

# 1. Pull fresh data (GNU date syntax; on macOS use $(date -v-7d +%Y-%m-%d))
bun src/index.ts dataset build --since "$(date -d "7 days ago" +%Y-%m-%d)" --limit 100 --output data/eval-dataset.json

# 2. Check dataset size (number of eval points)
jq 'length' data/eval-dataset.json

# 3. Run eval
bun src/index.ts eval routing data/eval-dataset.json

# 4. Check for failures
bun src/index.ts responses list --rating bad --limit 10

Pre-deploy validation

# 1. Build comprehensive dataset
bun src/index.ts dataset build --since 2025-01-01 --limit 500 --output data/full-dataset.json

# 2. Run eval with strict thresholds
bun src/index.ts eval routing data/full-dataset.json --min-precision 0.95 --min-recall 0.98 --json

# 3. Check exit code
echo "Exit code: $?"
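
The exit-code check above can be turned into a gate that stops a deploy script. This is a sketch that assumes the eval command exits non-zero when a threshold is missed, which the steps above suggest but do not state; gate is a hypothetical wrapper, not part of the CLI.

```shell
# Hypothetical wrapper: run a command and propagate its failure with a message.
gate() {
  if "$@"; then
    echo "gate passed: $*"
  else
    status=$?   # exit status of the command that just failed
    echo "gate failed (exit $status): $*" >&2
    return "$status"
  fi
}

# Usage in a deploy script (not executed here):
#   gate bun src/index.ts eval routing data/full-dataset.json \
#     --min-precision 0.95 --min-recall 0.98 --json || exit 1
```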

Dataset Schema

Each eval point includes:

id - Action ID
app - App slug (total-typescript, aihero, etc.)
conversationId - Front conversation ID
customerEmail - Customer email (if available)
triggerMessage - The inbound message that triggered the response
  - subject, body, timestamp
agentResponse - The agent's drafted response
  - text, category, timestamp
label - "good" | "bad" | undefined (not yet labeled)
labeledBy - Who approved or rejected the response
conversationHistory - (optional) Full message history, included when the dataset is built with --include-history
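
As a quick way to see how much of a dataset is labeled, the sketch below counts eval points per label with jq. It assumes the dataset file is a JSON array of objects with the fields listed above (the `jq 'length'` step in the daily workflow suggests an array, but the source does not confirm it); label_counts is a hypothetical helper.

```shell
# Hypothetical helper: print "<label><TAB><count>" for each label in a dataset file.
# Points without a label are reported as "unlabeled".
label_counts() {
  jq -r 'map(.label // "unlabeled") | group_by(.) | map("\(.[0])\t\(length)") | .[]' "$1"
}

# Usage (not executed here):
#   label_counts data/eval-dataset.json
```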

Environment

Required in .env.local:

FRONT_API_TOKEN=          # Front API access
DATABASE_URL=             # Database connection

Troubleshooting

"FRONT_API_TOKEN environment variable required"

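# Export the variables while sourcing so the bun process can see them
set -a; source apps/front/.env.local; set +a
# or set FRONT_API_TOKEN in .env.local at the repo root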

Dataset builds slowly

The Front API is rate limited, so large builds take time. Use --limit to cap how many responses are fetched.

No labeled data

Labels come from human-in-the-loop (HITL) approvals and rejections, so new responses stay unlabeled until someone reviews them.

GitHub Repository

majiayu000/claude-skill-registry

Path: skills/data-refresh-eval
