agent-creator

DNYoussef

Meta, ai, automation, design

About

The agent-creator skill provides a framework for building specialized, production-ready AI agents using a proven 4-phase SOP methodology. It combines evidence-based prompting techniques with the Claude Agent SDK to create agents with deeply embedded domain knowledge. Use it when developing agents for specific domains or workflows that require consistent, high-quality performance.

Quick Install

Claude Code

Plugin Command (Recommended)

/plugin add https://github.com/DNYoussef/ai-chrome-extension

Git Clone (Alternative)

git clone https://github.com/DNYoussef/ai-chrome-extension.git ~/.claude/skills/agent-creator

Documentation

Agent Creator - Enhanced with 4-Phase SOP Methodology

This skill provides the official comprehensive framework for creating specialized AI agents, integrating the proven 4-phase methodology from Desktop .claude-flow with Claude Agent SDK implementation and evidence-based prompting techniques.

When to Use This Skill

Use agent-creator for:

Creating project-specialized agents with deeply embedded domain knowledge
Building agents for recurring tasks requiring consistent behavior
Rewriting existing agents to optimize performance
Creating multi-agent workflows with sequential or parallel coordination
Building agents that integrate with MCP servers and Claude Flow

The 4-Phase Agent Creation Methodology

Source: Desktop .claude-flow/ official SOP documentation
Total Time: 2.5-4 hours per agent (first-time), 1.5-2 hours (speed-run)

This methodology was developed through systematic reverse engineering of fog-compute agent creation and validated through production use.

Phase 1: Initial Analysis & Intent Decoding (30-60 minutes)

Objective: Deep domain understanding through systematic research, not assumptions.

Activities:

Domain Breakdown
- What problem does this agent solve?
- What are the key challenges in this domain?
- What patterns do human experts use?
- What are common failure modes?
Technology Stack Mapping
- What tools, frameworks, libraries are used?
- What file types, formats, protocols?
- What integrations or APIs?
- What configuration patterns?
Integration Points
- What MCP servers will this agent use?
- What other agents will it coordinate with?
- What data flows in/out?
- What memory patterns are needed?

Validation Gate:

Can describe domain in specific, technical terms
Identified 5+ key challenges
Mapped technology stack comprehensively
Clear on integration requirements

Outputs:

Domain analysis document
Technology stack inventory
Integration requirements list
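
The validation gate above can be checked mechanically once the Phase 1 outputs are captured as data. A minimal sketch, assuming a hypothetical `DomainAnalysis` record whose field names are illustrative and not part of the skill:

```python
from dataclasses import dataclass, field


@dataclass
class DomainAnalysis:
    """Hypothetical container for Phase 1 outputs; field names are assumptions."""
    problem_statement: str = ""
    key_challenges: list[str] = field(default_factory=list)
    technology_stack: list[str] = field(default_factory=list)
    integration_points: list[str] = field(default_factory=list)


def phase1_gate(analysis: DomainAnalysis) -> list[str]:
    """Return the unmet gate items; an empty list means the gate passes."""
    unmet = []
    if not analysis.problem_statement.strip():
        unmet.append("Describe the domain in specific, technical terms")
    if len(analysis.key_challenges) < 5:
        unmet.append("Identify 5+ key challenges")
    if not analysis.technology_stack:
        unmet.append("Map the technology stack")
    if not analysis.integration_points:
        unmet.append("Clarify integration requirements")
    return unmet
```

Returning the unmet items rather than a bare boolean makes it clear what still has to be researched before moving on to Phase 2.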

Phase 2: Meta-Cognitive Extraction (30-45 minutes)

Objective: Identify the cognitive expertise domains activated when you reason about this agent's tasks.

Activities:

Expertise Domain Identification
- What knowledge domains are activated when you think about this role?
- What heuristics, patterns, rules-of-thumb?
- What decision-making frameworks?
- What quality standards?

Agent Specification Creation

# Agent Specification: [Name]

## Role & Expertise
- Primary role: [Specific title]
- Expertise domains: [List activated domains]
- Cognitive patterns: [Heuristics used]

## Core Capabilities
1. [Capability with specific examples]
2. [Capability with specific examples]
...

## Decision Frameworks
- When X, do Y because Z
- Always check A before B
- Never skip validation of C

## Quality Standards
- Output must meet [criteria]
- Performance measured by [metrics]
- Failure modes to prevent: [list]

Supporting Artifacts
- Create examples of good vs bad outputs
- Document edge cases
- List common pitfalls

Validation Gate:

Identified 3+ expertise domains
Documented 5+ decision heuristics
Created complete agent specification
Examples demonstrate quality standards

Outputs:

Agent specification document
Example outputs (good/bad)
Edge case inventory
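
The specification template and its gate thresholds lend themselves to a small data structure. The sketch below assumes a hypothetical `AgentSpec` class (not something the skill ships) that renders part of the template and checks the 3+ domains and 5+ heuristics thresholds:

```python
from dataclasses import dataclass, field


@dataclass
class AgentSpec:
    """Hypothetical structure mirroring part of the specification template."""
    name: str
    primary_role: str
    expertise_domains: list[str] = field(default_factory=list)
    heuristics: list[str] = field(default_factory=list)

    def gate_passed(self) -> bool:
        # Phase 2 gate: 3+ expertise domains and 5+ decision heuristics.
        return len(self.expertise_domains) >= 3 and len(self.heuristics) >= 5

    def to_markdown(self) -> str:
        """Render the role and decision-framework sections of the template."""
        lines = [
            f"# Agent Specification: {self.name}",
            "",
            "## Role & Expertise",
            f"- Primary role: {self.primary_role}",
            f"- Expertise domains: {', '.join(self.expertise_domains)}",
            "",
            "## Decision Frameworks",
        ]
        lines += [f"- {rule}" for rule in self.heuristics]
        return "\n".join(lines)
```

Only two template sections are rendered here; the remaining sections would follow the same pattern.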

Phase 3: Agent Architecture Design (45-60 minutes)

Objective: Transform specification into production-ready base system prompt.

Activities:

System Prompt Structure Design

# [AGENT NAME] - SYSTEM PROMPT v1.0

## 🎭 CORE IDENTITY

I am a **[Role Title]** with comprehensive, deeply-ingrained knowledge of [domain]. Through systematic reverse engineering and domain expertise, I possess precision-level understanding of:

- **[Domain Area 1]** - [Specific capabilities from Phase 2]
- **[Domain Area 2]** - [Specific capabilities from Phase 2]
- **[Domain Area 3]** - [Specific capabilities from Phase 2]

My purpose is to [primary objective] by leveraging [unique expertise].

## 📋 UNIVERSAL COMMANDS I USE

**File Operations**:
- /file-read, /file-write, /glob-search, /grep-search
WHEN: [Specific situations from domain analysis]
HOW: [Exact patterns]

**Git Operations**:
- /git-status, /git-commit, /git-push
WHEN: [Specific situations]
HOW: [Exact patterns]

**Communication & Coordination**:
- /memory-store, /memory-retrieve
- /agent-delegate, /agent-escalate
WHEN: [Specific situations]
HOW: [Exact patterns with namespace conventions]

## 🎯 MY SPECIALIST COMMANDS

[List role-specific commands with exact syntax and examples]

## 🔧 MCP SERVER TOOLS I USE

**Claude Flow MCP**:
- mcp__claude-flow__agent_spawn
  WHEN: [Specific coordination scenarios]
  HOW: [Exact function call patterns]

- mcp__claude-flow__memory_store
  WHEN: [Cross-agent data sharing]
  HOW: [Namespace pattern: agent-role/task-id/data-type]

**[Other relevant MCP servers from Phase 1]**

## 🧠 COGNITIVE FRAMEWORK

### Self-Consistency Validation
Before finalizing deliverables, I validate from multiple angles:
1. [Domain-specific validation 1]
2. [Domain-specific validation 2]
3. [Cross-check with standards]

### Program-of-Thought Decomposition
For complex tasks, I decompose BEFORE execution:
1. [Domain-specific decomposition pattern]
2. [Dependency analysis]
3. [Risk assessment]

### Plan-and-Solve Execution
My standard workflow:
1. PLAN: [Domain-specific planning]
2. VALIDATE: [Domain-specific validation]
3. EXECUTE: [Domain-specific execution]
4. VERIFY: [Domain-specific verification]
5. DOCUMENT: [Memory storage patterns]

## 🚧 GUARDRAILS - WHAT I NEVER DO

[From Phase 2 failure modes and edge cases]

**[Failure Category 1]**:
❌ NEVER: [Dangerous pattern]
WHY: [Consequences from domain knowledge]

WRONG:
  [Bad example]

CORRECT:
  [Good example]

## ✅ SUCCESS CRITERIA

Task complete when:
- [ ] [Domain-specific criterion 1]
- [ ] [Domain-specific criterion 2]
- [ ] [Domain-specific criterion 3]
- [ ] Results stored in memory
- [ ] Relevant agents notified

## 📖 WORKFLOW EXAMPLES

### Workflow 1: [Common Task Name from Phase 1]

**Objective**: [What this achieves]

**Step-by-Step Commands**:
```yaml
Step 1: [Action]
  COMMANDS:
    - /[command-1] --params
    - /[command-2] --params
  OUTPUT: [Expected]
  VALIDATION: [Check]

Step 2: [Next Action]
  COMMANDS:
    - /[command-3] --params
  OUTPUT: [Expected]
  VALIDATION: [Check]
```

Timeline: [Duration]
Dependencies: [Prerequisites]

Evidence-Based Technique Integration

For each technique (from existing agent-creator skill):
- Self-consistency: When to use, how to apply
- Program-of-thought: Decomposition patterns
- Plan-and-solve: Planning frameworks
Integrate these naturally into the agent's methodology.
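
One common reading of self-consistency is to sample several independent answers and keep the one most of them agree on. A minimal sketch of that voting step, assuming a hypothetical `sample` callable that stands in for one model call (it is not part of the skill or the SDK):

```python
from collections import Counter
from typing import Callable


def self_consistent_answer(sample: Callable[[], str], n: int = 5) -> tuple[str, float]:
    """Call `sample` n times and return (most common answer, agreement ratio)."""
    # Normalize lightly so trivially different strings count as the same vote.
    answers = [sample().strip().lower() for _ in range(n)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n
```

A low agreement ratio is a reasonable trigger for the extra validation or escalation steps described in the guardrails.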

Quality Standards & Guardrails

From Phase 2 failure modes, create explicit guardrails:
- What patterns to avoid
- What validations to always run
- When to escalate vs. retry
- Error handling protocols
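
The escalate-versus-retry decision can be made explicit in code. This is a sketch under assumptions: `run_with_guardrail`, `EscalationRequired`, and the default of two retries are illustrative names and values, not part of the skill.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class EscalationRequired(Exception):
    """Raised when retries are exhausted and another agent or a human must step in."""


def run_with_guardrail(
    task: Callable[[], T],
    validate: Callable[[T], bool],
    max_retries: int = 2,
) -> T:
    """Run `task`, validate its result, retry on failure, escalate when retries run out."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            result = task()
            if validate(result):
                return result
            last_error = ValueError("validation failed")
        except Exception as exc:  # deliberately broad in this sketch
            last_error = exc
    raise EscalationRequired(f"gave up after {max_retries + 1} attempts") from last_error
```

Keeping the validation inside the retry loop means an output that runs but fails its check is treated the same way as an exception.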

Validation Gate:

System prompt follows template structure
All Phase 2 expertise embedded
Evidence-based techniques integrated
Guardrails cover identified failure modes
2+ workflow examples with exact commands

Outputs:

Base system prompt (v1.0)
Cognitive framework specification
Guardrails documentation

Phase 4: Deep Technical Enhancement (60-90 minutes)

Objective: Reverse-engineer exact implementation patterns and document with precision.

Activities:

Code Pattern Extraction

For technical agents, extract EXACT patterns from codebase:

## Code Patterns I Recognize

### Pattern: [Name]
**File**: `path/to/file.py:123-156`

```python
class ExamplePattern:
    def __init__(
        self,
        param1: Type = default,  # Line 125: Exact default
        param2: Type = default   # Line 126: Exact default
    ):
        # Extracted from actual implementation
        pass
```

When I see this pattern, I know:

[Specific insight about architecture]
[Specific constraint or requirement]
[Common mistake to avoid]
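
File references in the `path/to/file.py:123-156` form can be resolved automatically when building the pattern library. A minimal sketch, assuming hypothetical helpers `parse_ref` and `extract_lines` and 1-indexed, inclusive line ranges:

```python
import re
from pathlib import Path

# Matches references such as "path/to/file.py:123-156".
_REF = re.compile(r"(?P<path>.+):(?P<start>\d+)-(?P<end>\d+)")


def parse_ref(ref: str) -> tuple[str, int, int]:
    """Split a 'path:start-end' reference into (path, start_line, end_line)."""
    match = _REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a path:start-end reference: {ref!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid line range in {ref!r}")
    return match["path"], start, end


def extract_lines(ref: str) -> str:
    """Return the referenced lines (1-indexed, inclusive) from the file on disk."""
    path, start, end = parse_ref(ref)
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])
```

Re-extracting the lines whenever the codebase changes is one way to notice that a documented pattern has drifted from the implementation.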

Critical Failure Mode Documentation

From experience and domain knowledge:

## Critical Failure Modes

### Failure: [Name]
**Severity**: Critical/High/Medium
**Symptoms**: [How to recognize]
**Root Cause**: [Why it happens]
**Prevention**:
  ❌ DON'T: [Bad pattern]
  ✅ DO: [Good pattern with exact code]

**Detection**:
  ```bash
  # Exact command to detect this failure
  [command]
  ```

Integration Patterns

Document exact MCP tool usage:

## MCP Integration Patterns

### Pattern: Cross-Agent Data Sharing
```javascript
// Exact pattern for storing outputs
mcp__claude-flow__memory_store({
  key: "marketing-specialist/campaign-123/audience-analysis",
  value: {
    segments: [...],
    targeting: {...},
    confidence: 0.89
  },
  ttl: 86400
})
```

Namespace Convention:

Format: {agent-role}/{task-id}/{data-type}
Example: backend-dev/api-v2/schema-design
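
A small helper keeps keys consistent with the `{agent-role}/{task-id}/{data-type}` convention. The sketch assumes lowercase kebab-case segments, which is inferred from the examples and is not a documented rule; `memory_key` is a hypothetical name.

```python
import re

# Assumption: each segment is lowercase letters, digits, and hyphens.
_SEGMENT = re.compile(r"[a-z0-9][a-z0-9-]*")


def memory_key(agent_role: str, task_id: str, data_type: str) -> str:
    """Build an '{agent-role}/{task-id}/{data-type}' memory key."""
    parts = (agent_role, task_id, data_type)
    for part in parts:
        if not _SEGMENT.fullmatch(part):
            raise ValueError(f"invalid key segment: {part!r}")
    return "/".join(parts)
```

Rejecting segments that contain `/` prevents a malformed task id from silently creating a fourth namespace level.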

Performance Metrics

Define what to track:

## Performance Metrics I Track

```yaml
Task Completion:
  - /memory-store --key "metrics/[my-role]/tasks-completed" --increment 1
  - /memory-store --key "metrics/[my-role]/task-[id]/duration" --value [ms]

Quality:
  - validation-passes: [count successful validations]
  - escalations: [count when needed help]
  - error-rate: [failures / attempts]

Efficiency:
  - commands-per-task: [avg commands used]
  - mcp-calls: [tool usage frequency]
```

These metrics enable continuous improvement.
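
The quality and efficiency figures above reduce to a few counters. A minimal in-memory sketch, assuming a hypothetical `AgentMetrics` class; a real agent would persist these through its memory commands instead:

```python
from dataclasses import dataclass


@dataclass
class AgentMetrics:
    """Hypothetical counters for the metrics listed above."""
    attempts: int = 0
    failures: int = 0
    escalations: int = 0
    commands_used: int = 0

    def record(self, *, succeeded: bool, commands: int, escalated: bool = False) -> None:
        """Record the outcome of one task attempt."""
        self.attempts += 1
        self.failures += 0 if succeeded else 1
        self.escalations += 1 if escalated else 0
        self.commands_used += commands

    @property
    def error_rate(self) -> float:
        # error-rate: failures / attempts, defined as 0.0 before any attempt.
        return self.failures / self.attempts if self.attempts else 0.0

    @property
    def commands_per_task(self) -> float:
        # commands-per-task: average commands used per attempt.
        return self.commands_used / self.attempts if self.attempts else 0.0
```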

Validation Gate:

Code patterns include file/line references
Failure modes have detection + prevention
MCP patterns show exact syntax
Performance metrics defined
Agent can self-improve through metrics

Outputs:

Enhanced system prompt (v2.0)
Code pattern library
Failure mode handbook
Integration pattern guide
Metrics specification

Integrated Agent Creation Process

Combining 4-phase SOP with existing best practices:

Complete Workflow

Phase 1: Domain Analysis (30-60 min)
- Research domain systematically
- Map technology stack
- Identify integration points
- Output: Domain analysis doc
Phase 2: Expertise Extraction (30-45 min)
- Identify cognitive domains
- Create agent specification
- Document decision frameworks
- Output: Agent spec + examples
Phase 3: Architecture Design (45-60 min)
- Draft base system prompt
- Integrate evidence-based techniques
- Add quality guardrails
- Output: Base prompt v1.0
Phase 4: Technical Enhancement (60-90 min)
- Extract code patterns
- Document failure modes
- Define MCP integrations
- Add performance metrics
- Output: Enhanced prompt v2.0
SDK Implementation (30-60 min)
- Implement with Claude Agent SDK
- Configure tools and permissions
- Set up MCP servers
- Output: Production agent
Testing & Validation (30-45 min)
- Test typical cases
- Test edge cases
- Test error handling
- Verify consistency
- Output: Test report
Documentation & Packaging (15-30 min)
- Create agent README
- Document usage examples
- Package supporting files
- Output: Complete agent package

Total Time: 4-6.5 hours when the step ranges above are summed (first-time), 2-3 hours (speed-run)

Claude Agent SDK Implementation

Once the system prompt is finalized, implement it with the SDK:

TypeScript Implementation

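import { query, tool, createSdkMcpServer } from '@anthropic-ai/claude-agent-sdk';
import { z } from 'zod';

// enhancedPromptV2: the enhanced system prompt string from Phase 4 (defined elsewhere)

// Custom domain-specific tool: tool(name, description, input schema, handler)
const domainTool = tool(
  'domain_operation',
  'Performs domain-specific operation',
  { param: z.string() },
  async ({ param }) => {
    // Implementation from Phase 4
    return { content: [{ type: 'text', text: JSON.stringify({ result: 'data' }) }] };
  }
);

// Custom tools are exposed to the agent through an in-process MCP server
const domainServer = createSdkMcpServer({
  name: 'domain-tools',
  version: '1.0.0',
  tools: [domainTool]
});

// SDK MCP servers require streaming input, so the prompt is an async generator
async function* prompt() {
  yield {
    type: 'user' as const,
    message: { role: 'user' as const, content: 'Perform domain task' }
  };
}

// Agent configuration
for await (const message of query({
  prompt: prompt(),
  options: {
    model: 'claude-sonnet-4-5',
    systemPrompt: enhancedPromptV2,  // From Phase 4
    permissionMode: 'acceptEdits',
    // Custom tools are referenced as mcp__<server-name>__<tool-name>
    allowedTools: ['Read', 'Write', 'Bash', 'mcp__domain-tools__domain_operation'],
    mcpServers: {
      'domain-tools': domainServer,
      'claude-flow': {
        command: 'npx',
        args: ['claude-flow@alpha', 'mcp', 'start']
        // env: { ... }
      }
    },
    settingSources: ['user', 'project']
  }
})) {
  console.log(message);
}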

Python Implementation

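import asyncio
import json
from typing import Any

from claude_agent_sdk import ClaudeAgentOptions, create_sdk_mcp_server, query, tool

# enhanced_prompt_v2: the enhanced system prompt string from Phase 4 (defined elsewhere)

# Custom tool: @tool(name, description, input schema); the handler receives a dict
@tool('domain_operation', 'Domain-specific operation from Phase 4', {'param': str})
async def domain_operation(args: dict[str, Any]) -> dict[str, Any]:
    # Implementation
    return {'content': [{'type': 'text', 'text': json.dumps({'result': 'data'})}]}

# Custom tools are exposed to the agent through an in-process MCP server
domain_server = create_sdk_mcp_server(
    name='domain-tools', version='1.0.0', tools=[domain_operation]
)

async def prompt():
    # SDK MCP servers require streaming input, so the prompt is an async generator
    yield {'type': 'user', 'message': {'role': 'user', 'content': 'Perform domain task'}}

async def run_agent():
    options = ClaudeAgentOptions(
        model='claude-sonnet-4-5',
        system_prompt=enhanced_prompt_v2,  # From Phase 4
        permission_mode='acceptEdits',
        # Custom tools are referenced as mcp__<server-name>__<tool-name>
        allowed_tools=['Read', 'Write', 'Bash', 'mcp__domain-tools__domain_operation'],
        mcp_servers={
            'domain-tools': domain_server,
            'claude-flow': {
                'command': 'npx',
                'args': ['claude-flow@alpha', 'mcp', 'start']
            }
        },
        setting_sources=['user', 'project']
    )

    async for message in query(prompt=prompt(), options=options):
        print(message)

asyncio.run(run_agent())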

Agent Specialization Patterns

From existing agent-creator skill, enhanced with 4-phase methodology:

Analytical Agents

Phase 1 Focus: Evidence evaluation patterns, data quality standards
Phase 2 Focus: Analytical heuristics, validation frameworks
Phase 3 Focus: Self-consistency checking, confidence calibration
Phase 4 Focus: Statistical validation code, error detection patterns

Generative Agents

Phase 1 Focus: Quality criteria, template patterns
Phase 2 Focus: Creative heuristics, refinement cycles
Phase 3 Focus: Plan-and-solve frameworks, requirement tracking
Phase 4 Focus: Generation patterns, quality validation code

Diagnostic Agents

Phase 1 Focus: Problem patterns, debugging workflows
Phase 2 Focus: Hypothesis generation, systematic testing
Phase 3 Focus: Program-of-thought decomposition, evidence tracking
Phase 4 Focus: Detection scripts, root cause analysis patterns

Orchestration Agents

Phase 1 Focus: Workflow patterns, dependency management
Phase 2 Focus: Coordination heuristics, error recovery
Phase 3 Focus: Plan-and-solve with dependencies, progress tracking
Phase 4 Focus: Orchestration code, retry logic, escalation paths

Testing & Validation

From existing framework + SOP enhancements:

Test Suite Creation

Typical Cases - Expected behavior on common tasks
Edge Cases - Boundary conditions and unusual inputs
Error Cases - Graceful handling and escalation
Integration Cases - End-to-end workflow with other agents
Performance Cases - Speed, efficiency, resource usage
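
The five categories can be tracked in a small harness that reports pass counts per category. This is a sketch, assuming the agent is wrapped in a plain callable that takes a prompt and returns text; `AgentTestCase` and `run_suite` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AgentTestCase:
    category: str  # "typical", "edge", "error", "integration", or "performance"
    prompt: str
    check: Callable[[str], bool]


def run_suite(
    agent: Callable[[str], str], cases: list[AgentTestCase]
) -> dict[str, tuple[int, int]]:
    """Return {category: (passed, total)} for a callable agent."""
    report: dict[str, tuple[int, int]] = {}
    for case in cases:
        passed, total = report.get(case.category, (0, 0))
        try:
            ok = case.check(agent(case.prompt))
        except Exception:
            ok = False  # an unhandled exception counts as a failed case
        report[case.category] = (passed + int(ok), total + 1)
    return report
```

Grouping results by category shows at a glance whether failures cluster in edge or error handling rather than in typical use.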

Validation Checklist

Quick Reference

When to Use Each Phase

Phase 1 (Analysis):

Always - Required foundation
Especially for domains you're less familiar with

Phase 2 (Expertise Extraction):

Always - Captures cognitive patterns
Essential for complex reasoning tasks

Phase 3 (Architecture):

Always - Creates base system prompt
Critical for clear behavioral specification

Phase 4 (Enhancement):

For production agents
For technical domains requiring exact patterns
When precision and failure prevention are critical

Speed-Run Approach (Experienced Creators)

Combined Phase 1+2 (30 min): Rapid domain analysis + spec
Phase 3 (30 min): Base prompt from template
Phase 4 (45 min): Code patterns + failure modes
Testing (15 min): Quick validation suite

Total: 2 hours for experienced creators with templates

Examples from Production

Example: Marketing Specialist Agent

See: docs/agent-architecture/agents-rewritten/MARKETING-SPECIALIST-AGENT.md

Phase 1 Output: Marketing domain analysis, tools (Google Analytics, SEMrush, etc.)
Phase 2 Output: Marketing expertise (CAC, LTV, funnel optimization, attribution)
Phase 3 Output: Base prompt with 9 specialist commands
Phase 4 Output: Campaign workflow patterns, A/B test validation, ROI calculations

Result: Production-ready agent with deeply embedded marketing expertise

Maintenance & Iteration

Continuous Improvement

Metrics Review: Weekly review of agent performance metrics
Failure Analysis: Document and fix new failure modes
Pattern Updates: Add newly discovered code patterns
Workflow Optimization: Refine based on usage patterns

Version Control

v1.0: Base prompt from Phase 3
v1.x: Minor refinements from testing
v2.0: Enhanced with Phase 4 patterns
v2.x: Production iterations and improvements

Summary

This enhanced agent-creator skill combines:

✅ Official 4-phase SOP methodology (Desktop .claude-flow)
✅ Evidence-based prompting techniques (self-consistency, PoT, plan-and-solve)
✅ Claude Agent SDK implementation (TypeScript + Python)
✅ Production validation and testing frameworks
✅ Continuous improvement through metrics

Use this methodology to create all 90 specialist agents with:

Deeply embedded domain knowledge
Exact command and MCP tool specifications
Production-ready failure prevention
Measurable performance tracking

Next: Begin agent rewrites using this enhanced methodology.

GitHub Repository

DNYoussef/ai-chrome-extension

Path: .claude/skills/agent-creator
