harness:setup

raphaelchristi
Updated 5 days ago

About

This skill sets up the Harness Evolver v3 in a project by exploring the codebase and configuring LangSmith for agent optimization. It runs a baseline evaluation as the starting point for improving the LLM agent's performance, and is triggered when a user first mentions "Evolver" in a project that does not yet have a .evolver.json file. Setup includes checking for a LangSmith API key and preparing the environment for agent development.

Quick install

Claude Code

Recommended:
npx skills add raphaelchristi/harness-evolver -a claude-code

Alternative (plugin command):
/plugin add https://github.com/raphaelchristi/harness-evolver

Alternative (git clone):
git clone https://github.com/raphaelchristi/harness-evolver.git ~/.claude/skills/harness:setup

Use one of these commands to install the skill.

Documentation

/harness:setup

Set up the Harness Evolver v3 in a project. Explores the codebase, configures LangSmith, runs baseline evaluation.

Prerequisites

Check for LangSmith API key — it can be in the environment, the credentials file, or .env:

python3 -c "
import os, platform

# 1. The environment variable takes precedence.
key = os.environ.get('LANGSMITH_API_KEY', '')
if not key:
    # 2. langsmith-cli credentials file (the location differs on macOS).
    if platform.system() == 'Darwin':
        creds = os.path.expanduser('~/Library/Application Support/langsmith-cli/credentials')
    else:
        creds = os.path.expanduser('~/.config/langsmith-cli/credentials')
    if os.path.exists(creds):
        with open(creds) as f:
            for line in f:
                if line.strip().startswith('LANGSMITH_API_KEY='):
                    key = line.strip().split('=',1)[1].strip()
    # 3. Project-local .env file; surrounding quotes are stripped from the value.
    if not key and os.path.exists('.env'):
        with open('.env') as f:
            for line in f:
                if line.strip().startswith('LANGSMITH_API_KEY='):
                    key = line.strip().split('=',1)[1].strip().strip('\"').strip(\"'\")
print('OK' if key else 'MISSING')
"

If MISSING: "Set your LangSmith API key: export LANGSMITH_API_KEY=lsv2_pt_... or run npx harness-evolver@latest to configure."

The tools auto-load the key from the credentials file, but the env var takes precedence.

Resolve Tool Path and Python

# Prefer env vars set by plugin hook; fallback to legacy npx paths
TOOLS="${EVOLVER_TOOLS:-$([ -d ".evolver/tools" ] && echo ".evolver/tools" || echo "$HOME/.evolver/tools")}"
EVOLVER_PY="${EVOLVER_PY:-$([ -f "$HOME/.evolver/venv/bin/python" ] && echo "$HOME/.evolver/venv/bin/python" || echo "python3")}"

Use $EVOLVER_PY instead of python3 for ALL tool invocations. This ensures the venv with langsmith is used.

IMPORTANT: Never pass LANGSMITH_API_KEY inline in Bash commands. The key is loaded automatically by the SessionStart hook (from credentials file or environment) and by each Python tool's ensure_langsmith_api_key(). Passing it inline exposes it in the output. If the key is missing, tell the user to run export LANGSMITH_API_KEY=lsv2_pt_... instead.

Phase 1: Explore Project (automatic)

find . -maxdepth 3 -type f -name "*.py" -not -path "*/.venv/*" -not -path "*/node_modules/*" -not -path "*/__pycache__/*" | head -30

Monorepo detection: if the project root has multiple subdirectories with their own main.py or pyproject.toml, it's a monorepo. Use AskUserQuestion to ask WHICH app to optimize before proceeding — do NOT scan everything.
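The monorepo check described above could be sketched as follows. This is a minimal illustration rather than the skill's actual implementation: the helper names find_app_dirs and is_monorepo are hypothetical, and the sketch assumes "multiple" means two or more immediate subdirectories that contain main.py or pyproject.toml.

```python
from pathlib import Path

# Marker files named in the text above.
MARKERS = ("main.py", "pyproject.toml")


def find_app_dirs(root: str) -> list:
    """Return names of immediate subdirectories that hold their own marker file."""
    apps = []
    for child in sorted(Path(root).iterdir()):
        # Skip hidden directories such as .venv or .git (an assumption of this sketch).
        if not child.is_dir() or child.name.startswith("."):
            continue
        if any((child / marker).is_file() for marker in MARKERS):
            apps.append(child.name)
    return apps


def is_monorepo(root: str) -> bool:
    """Treat the project as a monorepo when two or more app directories exist."""
    return len(find_app_dirs(root)) >= 2
```

When is_monorepo returns True, the names from find_app_dirs could serve as the options offered to the user through AskUserQuestion.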

Look for:

  • Entry points: files with if __name__, or named main.py, app.py, agent.py, graph.py, pipeline.py
  • Existing LangSmith config: LANGCHAIN_PROJECT / LANGSMITH_PROJECT in env or .env
  • Existing test data: JSON files with inputs, CSV files, etc.
  • Dependencies: requirements.txt, pyproject.toml

To identify the framework, read the entry point file and its immediate imports. The proposer agents will use Context7 MCP for detailed documentation lookup — you don't need to detect every library, just identify the main framework (LangGraph, CrewAI, OpenAI Agents SDK, etc.) from the imports you see.
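As a rough sketch of identifying the main framework from imports, the snippet below parses the entry point's source and matches top-level import names. The function name detect_framework and the mapping table are assumptions of this example: langgraph, crewai, and agents are the package names these frameworks are commonly imported under, so verify them against the project before relying on the result.

```python
import ast
from typing import Optional

# Assumed mapping from top-level import name to framework label.
FRAMEWORK_IMPORTS = {
    "langgraph": "LangGraph",
    "crewai": "CrewAI",
    "agents": "OpenAI Agents SDK",
}


def detect_framework(source: str) -> Optional[str]:
    """Return the first known framework imported by the given Python source."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports such as "from .agents import x".
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in FRAMEWORK_IMPORTS:
                return FRAMEWORK_IMPORTS[top_level]
    return None
```

A None result means only that no listed package was imported directly by the entry point; the framework may still be pulled in by one of its immediate imports.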

Detect virtual environments — check for venvs in the project or parent directories:

# Check common venv locations
for venv_dir in .venv venv ../.venv ../venv; do
    if [ -f "$venv_dir/bin/python" ]; then
        echo "VENV_FOUND: $venv_dir/bin/python"
        break
    fi
done

If a venv is found, use it for the entry point instead of bare python. The agent's dependencies are likely installed there, not in the system Python. For example: ../.venv/bin/python agent.py {input} instead of python agent.py {input}.

Identify the run command — how to execute the agent. Use {input} as a placeholder for the JSON file path:

  • .venv/bin/python main.py {input} — if venv detected (preferred)
  • python main.py {input} — agent reads JSON file from positional arg
  • python main.py --input {input} — agent reads JSON file from --input flag
  • python main.py --query {input_json} — agent receives inline JSON string

The runner writes {"input": "user question..."} to a temp .json file and replaces {input} with the file path. If the entry point already contains --input (without placeholder), the runner appends the file path as the next argument.

If no placeholder and no --input flag detected, the runner appends --input <path> --output <path>.
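The three {input} resolution rules above can be sketched like this. It is an illustration under stated assumptions, not the runner's real code: resolve_command and write_payload are hypothetical names, the output path is supplied by the caller, and the {input_json} variant is deliberately left out because the text does not say what is substituted for it.

```python
import json
import shlex
import tempfile


def write_payload(question: str) -> str:
    """Write {"input": ...} to a temporary .json file and return its path."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
        json.dump({"input": question}, handle)
        return handle.name


def resolve_command(entry_point: str, input_path: str, output_path: str) -> list:
    """Turn an entry-point template into an argv list."""
    args = shlex.split(entry_point)
    # Rule 1: an explicit {input} placeholder is replaced with the file path.
    if any("{input}" in arg for arg in args):
        return [arg.replace("{input}", input_path) for arg in args]
    # Rule 2: a bare --input flag gets the file path appended as the next argument.
    if "--input" in args:
        return args + [input_path]
    # Rule 3: no placeholder and no flag, so both --input and --output are appended.
    return args + ["--input", input_path, "--output", output_path]
```

For example, resolve_command("python main.py {input}", "/tmp/in.json", "/tmp/out.json") yields ["python", "main.py", "/tmp/in.json"].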

Phase 2: Confirm Configuration (interactive)

Present all detected configuration in one view with smart defaults and ask for confirmation.

Use AskUserQuestion:

{
  "questions": [{
    "question": "Here's the configuration for your project:\n\n**Entry point**: {command}\n**Framework**: {framework}\n**Python**: {venv_path or 'system python3'}\n**Optimization goals**: accuracy (correctness evaluator)\n**Test data**: generate 30 examples with AI\n\nDoes this look good?",
    "header": "Setup Configuration",
    "multiSelect": false,
    "options": [
      {"label": "Looks good, proceed", "description": "Use these settings and start setup"},
      {"label": "Customize goals", "description": "Choose different optimization goals"},
      {"label": "I have test data", "description": "Use existing JSON file or LangSmith project"},
      {"label": "Let me adjust everything", "description": "Change entry point, framework, goals, and data source"}
    ]
  }]
}

If "Looks good, proceed": Use defaults — goals=accuracy, data=generate 30 with testgen. Skip straight to Phase 3.

If "Customize goals": Ask the goals question, then proceed to Phase 3 with testgen as default data source.

Use AskUserQuestion:

{
  "questions": [{
    "question": "What do you want to optimize?",
    "header": "Goals",
    "multiSelect": true,
    "options": [
      {"label": "Accuracy", "description": "Correctness of outputs — LLM-as-judge evaluator"},
      {"label": "Latency", "description": "Response time — track and minimize"},
      {"label": "Token efficiency", "description": "Fewer tokens for same quality"},
      {"label": "Error handling", "description": "Reduce failures, timeouts, crashes"}
    ]
  }]
}

Map selections to evaluator configuration for setup.py.
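One way to turn the selected labels into the comma-separated value for --goals is sketched below. Only the identifier accuracy appears in this document; latency, token_efficiency, and error_handling are hypothetical placeholders, as is the function name goals_to_csv, so check setup.py for the values it actually accepts.

```python
# Menu label -> goal identifier. Only "accuracy" is taken from the text;
# the other identifiers are hypothetical placeholders.
GOAL_IDS = {
    "Accuracy": "accuracy",
    "Latency": "latency",
    "Token efficiency": "token_efficiency",
    "Error handling": "error_handling",
}


def goals_to_csv(selected_labels: list) -> str:
    """Map AskUserQuestion labels to a comma-separated goals string."""
    if not selected_labels:
        # Accuracy is the documented default goal.
        return "accuracy"
    unknown = [label for label in selected_labels if label not in GOAL_IDS]
    if unknown:
        raise ValueError(f"Unknown goal labels: {unknown}")
    # Emit goals in menu order so the result does not depend on click order.
    return ",".join(GOAL_IDS[label] for label in GOAL_IDS if label in selected_labels)
```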

Phase 2.5: Mode Selection

Use AskUserQuestion:

{
  "questions": [{
    "question": "Evolution mode?",
    "header": "Mode",
    "multiSelect": false,
    "options": [
      {"label": "light", "description": "20 examples, 2 proposers, ~2 min/iter. Good for testing."},
      {"label": "balanced (Recommended)", "description": "30 examples, 3 proposers, ~8 min/iter. Best trade-off."},
      {"label": "heavy", "description": "50 examples, 5 proposers, ~25 min/iter. Maximum quality."}
    ]
  }]
}

Pass selection to setup.py as --mode light|balanced|heavy.

The mode determines testgen count:

  • light: generate 20 examples
  • balanced: generate 30 examples (default, current behavior)
  • heavy: generate 50 examples

If "I have test data" was selected in Phase 2: Ask the data source question, then proceed to Phase 3 with accuracy as default goal.

Use AskUserQuestion with preview:

{
  "questions": [{
    "question": "Where should test inputs come from?",
    "header": "Test data",
    "multiSelect": false,
    "options": [
      {
        "label": "Import from LangSmith",
        "description": "Use real production traces as test inputs",
        "preview": "## Import from LangSmith\n\nFetches up to 100 recent traces from your production project.\nPrioritizes traces with negative feedback.\nCreates a LangSmith Dataset with real user inputs.\n\nRequires: an existing LangSmith project with traces."
      },
      {
        "label": "I have a file",
        "description": "Point to an existing file with test inputs",
        "preview": "## Provide Test Data\n\nSupported formats:\n- JSON array of inputs\n- JSON with {\"inputs\": {...}} objects\n- CSV with input columns\n\nExample:\n```json\n[\n  {\"input\": \"What is Python?\"},\n  {\"input\": \"Explain quantum computing\"}\n]\n```"
      }
    ]
  }]
}

If "Import from LangSmith": discover projects and ask which one (same as v2 Phase 1.9). If "I have a file": ask for file path.
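Before passing a user-supplied file on, it can help to check that it matches one of the supported formats listed in the preview. The loader below is a hypothetical sketch: load_test_inputs is not part of the evolver tools, and how setup.py itself parses these formats is not documented here. It assumes that objects wrapped as {"inputs": {...}} should be unwrapped and that each CSV row becomes one input.

```python
import csv
import json
from pathlib import Path


def load_test_inputs(path: str) -> list:
    """Load test inputs from a JSON array or a CSV file with a header row."""
    file_path = Path(path)
    if file_path.suffix.lower() == ".csv":
        with file_path.open(newline="") as handle:
            # Each CSV row becomes a dict keyed by the column headers.
            return [dict(row) for row in csv.DictReader(handle)]
    data = json.loads(file_path.read_text())
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of test inputs")
    inputs = []
    for item in data:
        # Unwrap {"inputs": {...}} objects; keep plain objects unchanged.
        if isinstance(item, dict) and isinstance(item.get("inputs"), dict):
            inputs.append(item["inputs"])
        else:
            inputs.append(item)
    return inputs
```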

If "Let me adjust everything" was selected in Phase 2: Ask all three original questions in sequence (confirm detection of entry point, framework, and run command; then goals; then data source), using the question formats above.

Phase 3: Run Setup

Build the setup.py command based on all gathered information:

"$EVOLVER_PY" "$TOOLS/setup.py" \
    --project-name "{project_name}" \
    --entry-point "{run_command}" \
    --framework "{framework}" \
    --goals "{goals_csv}" \
    --mode "{mode}" \
    ${DATASET_FROM_FILE:+--dataset-from-file "$DATASET_FROM_FILE"} \
    ${DATASET_FROM_LANGSMITH:+--dataset-from-langsmith "$DATASET_FROM_LANGSMITH"} \
    ${PRODUCTION_PROJECT:+--production-project "$PRODUCTION_PROJECT"}

If test data is to be generated (the default when neither a file nor a LangSmith import was chosen) AND no test data file exists, first spawn the testgen agent to generate inputs, then pass the generated file to setup.py.
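The same argument list can be assembled programmatically. The sketch below uses only flags that appear in this document (including --mode from Phase 2.5); the function name build_setup_args is hypothetical. Optional flags are emitted only for non-empty values, which mirrors the shell's ${VAR:+...} expansion.

```python
from typing import Optional


def build_setup_args(
    project_name: str,
    entry_point: str,
    framework: str,
    goals: list,
    mode: Optional[str] = None,
    dataset_from_file: Optional[str] = None,
    dataset_from_langsmith: Optional[str] = None,
    production_project: Optional[str] = None,
) -> list:
    """Build the setup.py argument list; optional flags appear only when set."""
    args = [
        "--project-name", project_name,
        "--entry-point", entry_point,
        "--framework", framework,
        "--goals", ",".join(goals),
    ]
    optional = [
        ("--mode", mode),
        ("--dataset-from-file", dataset_from_file),
        ("--dataset-from-langsmith", dataset_from_langsmith),
        ("--production-project", production_project),
    ]
    for flag, value in optional:
        if value:  # skips both None and the empty string
            args += [flag, value]
    return args
```

Passing the result as a list to subprocess.run, prefixed with the resolved interpreter and the path to setup.py, avoids shell quoting problems with entry points that contain spaces or braces.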

Phase 4: Generate Test Data (if needed)

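If testgen is needed, spawn it. Replace {count} with the example count for the selected mode (20 for light, 30 for balanced, 50 for heavy):

Agent(
  subagent_type: "harness-testgen",
  description: "TestGen: generate test inputs",
  prompt: |
    <objective>
    Generate {count} diverse test inputs for this project.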
    Write them as a JSON array to test_inputs.json.
    </objective>

    <files_to_read>
    {all .py files discovered in Phase 1}
    </files_to_read>

    <output>
    Create test_inputs.json with format:
    [{"input": "..."}, {"input": "..."}, ...]
    </output>
)

Then pass --dataset-from-file test_inputs.json to setup.py.

Phase 5: Report

Setup complete!
  Project: evolver-{name}
  Dataset: {name}-eval-v1 ({N} examples)
  Evaluators: {list}
  Baseline score: {score}
  Config: .evolver.json

Next: run /harness:evolve to start optimizing.

Gotchas

  • If .evolver.json already exists, ask before overwriting.
  • If the agent needs a venv, the run command should call the venv's interpreter directly rather than rely on an activated shell: cd {dir} && .venv/bin/python main.py
  • No project venv detected: Before building the entry_point, check if .venv/bin/python or venv/bin/python exists in the project directory. If NOT, warn the user:
    WARNING: No Python venv found in this project (.venv/ or venv/).
    The entry_point should use the project's own Python, not ~/.evolver/venv/bin/python.
    Create one first: python3 -m venv .venv && .venv/bin/pip install -r requirements.txt
    
    Do NOT use ~/.evolver/venv/bin/python as the entry_point — that's for evolver tools only. Using it pollutes the evolver venv with agent dependencies.
  • If LangSmith connection fails, check API key and network. The ensure_langsmith_api_key() function validates key format and rejects dummy/test keys with a warning.
  • Eval concurrency defaults to 3 (runs 3 examples in parallel). If the agent can't handle parallel execution (writes to shared files, uses a fixed port, holds a DB lock), set eval_concurrency: 1 in .evolver.json after setup.
  • Companion plugin: For full proposer observability, recommend installing langsmith-tracing from langchain-ai/langsmith-claude-code-plugins. Each proposer's file reads, edits, and commits become visible in LangSmith.
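For the eval concurrency gotcha above, the setting can be changed with a few lines of Python instead of editing the file by hand. This sketch assumes eval_concurrency is a top-level key in .evolver.json, which the text implies but does not state; the function name set_eval_concurrency is hypothetical.

```python
import json
from pathlib import Path


def set_eval_concurrency(config_path: str, value: int = 1) -> dict:
    """Set eval_concurrency in the config file and return the updated config."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Assumption: eval_concurrency lives at the top level of the config.
    config["eval_concurrency"] = value
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling set_eval_concurrency(".evolver.json") after setup switches evaluation to one example at a time and leaves all other keys untouched.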

GitHub Repository

raphaelchristi/harness-evolver
Path: skills/setup
