mcp-builder
About
mcp-builder helps developers create high-quality MCP servers for integrating external APIs and services with LLMs. Use it when building MCP integrations in Python (FastMCP) or Node/TypeScript (MCP SDK), prioritizing agent workflows over simple API wrappers. It guides you through research-driven design, implementation with validation, and evaluation-based iteration for context-efficient tools.
Documentation
MCP Server Development Guide
Overview
Build high-quality MCP (Model Context Protocol) servers that enable LLMs to accomplish real-world tasks through well-designed tools. Quality is measured not by API coverage, but by how effectively agents can use your tools to complete realistic workflows.
Core insight: MCP servers expose tools for AI agents, not human users. Design for agent constraints (limited context, no visual UI, workflow-oriented) rather than human convenience.
When to Use This Skill
Activate when:
- Building MCP servers for external API integration
- Adding tools to existing MCP servers
- Improving MCP server tool design for better agent usability
- Creating evaluations to test MCP server effectiveness
- Debugging why agents struggle with your MCP tools
Language Support:
- Python: FastMCP framework (recommended for rapid development)
- Node/TypeScript: MCP SDK (recommended for production services)
The Iron Law
DESIGN FOR AGENTS, NOT HUMANS
Every tool must optimize for:
- Context efficiency (agents have limited tokens)
- Workflow completion (not just API calls)
- Actionable errors (guide agents to success)
- Natural task subdivision (how agents think)
If your tools are just thin API wrappers, you're violating the Iron Law.
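To make the contrast concrete, here is a minimal sketch using a hypothetical ticketing API (all function and field names are illustrative, not from a real SDK). The thin-wrapper approach forces the agent to chain several calls and sift raw JSON; the workflow tool answers the agent's actual question in one call.

```python
# Thin wrapper (violates the Iron Law): one tool per endpoint.
# The agent must chain three calls and parse raw JSON itself.
def get_user(user_id: str) -> dict: ...
def get_user_tickets(user_id: str) -> list: ...
def get_ticket_comments(ticket_id: str) -> list: ...

# Workflow tool: one call completes the task, returning only
# the fields the agent needs, in a compact text format.
def summarize_open_tickets(user_email: str, limit: int = 5) -> str:
    """Return a concise summary of a user's open tickets."""
    tickets = [  # placeholder for real API calls
        {"id": "T-1", "title": "Login fails", "status": "open"},
        {"id": "T-2", "title": "Billing bug", "status": "open"},
    ]
    lines = [f"{t['id']}: {t['title']} ({t['status']})" for t in tickets[:limit]]
    return "\n".join(lines) or "No open tickets."
```

The workflow tool matches how the agent subdivides the task ("what are this user's open tickets?") rather than how the API subdivides its endpoints.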
Core Principles
- Agent-Centric Design First: Study design principles before coding. Tools should enable workflows, not mirror APIs.
- Research-Driven Planning: Load MCP docs, SDK docs, and exhaustive API documentation before writing code.
- Evaluation-Based Iteration: Create realistic evaluations early. Let agent feedback drive improvements.
- Context Optimization: Every response token matters. Default to concise, offer detailed when needed.
- Actionable Errors: Error messages should teach agents correct usage patterns.
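As a small sketch of the "actionable errors" principle (the tool and parameter names here are hypothetical): an error should state what failed, what was expected, and show a corrected example, so the agent can self-correct on the next call.

```python
# Build an error message that teaches the correct usage pattern,
# rather than only reporting the failure.
def actionable_error(param: str, got: str, expected: str, example: str) -> str:
    return (
        f"Invalid value for '{param}': {got!r}. "
        f"Expected {expected}. "
        f"Example: {example}"
    )

# Hypothetical failure in a search tool's date filter:
msg = actionable_error(
    param="since",
    got="last week",
    expected="an ISO 8601 date (YYYY-MM-DD)",
    example='since="2024-05-01"',
)
# The agent now knows the exact format to retry with.
```

Compare this with an error that just says "invalid date" — the agent would have to guess the format, burning retries and context.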
Quick Start
Phase 1: Research and Planning (40% of effort)
- Study Design Principles: Load design_principles.md to understand agent-centric design
- Load Protocol Docs: Fetch https://modelcontextprotocol.io/llms-full.txt for the MCP specification
- Study SDK Docs: Load Python or TypeScript SDK documentation from GitHub
- Study API Exhaustively: Read ALL API documentation, endpoints, authentication, rate limits
- Create Implementation Plan: Define tools, shared utilities, pagination strategy, error handling
See workflow.md for complete Phase 1 steps.
Phase 2: Implementation (30% of effort)
- Setup Project: Create structure following language-specific guide
- Build Shared Utilities: API helpers, error handlers, formatters BEFORE tools
- Implement Tools: Use Pydantic (Python) or Zod (TypeScript) for validation
- Follow Best Practices: Load language-specific guide for patterns
See workflow.md for complete Phase 2 steps and language guides.
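One shared utility worth building before any tool is a response-size cap. A minimal sketch (the limit and message wording are illustrative assumptions, not values from the MCP spec):

```python
# Shared response-size utility: built once, reused by every tool.
MAX_CHARS = 25_000  # assumed character budget; tune per server

def cap_response(text: str, limit: int = MAX_CHARS) -> str:
    """Truncate oversized output and tell the agent how to get the rest."""
    if len(text) <= limit:
        return text
    return (
        text[:limit]
        + "\n[Output truncated. Narrow your query or request the next "
        "page to see more.]"
    )
```

Note that the truncation notice is itself an actionable message: it tells the agent what to do next instead of silently cutting the output.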
Phase 3: Review and Refine (15% of effort)
- Code Quality Review: Check DRY, composability, consistency, type safety
- Test Build: Verify syntax, imports, build process
- Quality Checklist: Use language-specific checklist
See workflow.md for complete Phase 3 steps.
Phase 4: Create Evaluations (15% of effort)
- Understand Purpose: Evaluations test if agents can answer realistic questions using your tools
- Create 10 Questions: Complex, read-only, independent, verifiable questions
- Verify Answers: Solve yourself to ensure stability and correctness
- Run Evaluation: Use provided scripts to test agent effectiveness
See evaluation.md for complete evaluation guidelines.
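For orientation, one evaluation entry might look like the following sketch. The exact schema lives in evaluation.md; the tag names, repository, and answer shown here are purely illustrative.

```xml
<evaluation>
  <question>
    Which open issue in the acme/widgets repository has the most
    comments, and who opened it?
  </question>
  <answer>Issue #42, opened by jdoe</answer>
</evaluation>
```

A good question is read-only, independent of other questions, and has a single stable answer you have verified yourself.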
Navigation
Core Design and Workflow
- 🎯 Design Principles - Agent-centric design philosophy: workflows over APIs, context optimization, actionable errors, natural task subdivision. Read FIRST before implementation.
- 🔄 Complete Workflow - Detailed 4-phase development process with step-by-step instructions, decision trees, and when to load each reference file.
Universal MCP Guidelines
- 📋 MCP Best Practices - Naming conventions, response formats, pagination, character limits, security, tool annotations, error handling. Applies to all MCP servers.
Language-Specific Implementation
- 🐍 Python Implementation - FastMCP patterns, Pydantic validation, async/await, complete examples, quality checklist. Load during Phase 2 for Python servers.
- ⚡ TypeScript Implementation - MCP SDK patterns, Zod validation, project structure, complete examples, quality checklist. Load during Phase 2 for TypeScript servers.
Evaluation and Testing
- ✅ Evaluation Guide - Creating realistic questions, answer verification, XML format, running evaluations, interpreting results. Load during Phase 4.
Key Reminders
- Research First: Spend 40% of time researching before coding
- Agent-Centric: Design for AI workflows, not API completeness
- Context Efficient: Every token counts - default concise, offer detailed
- Actionable Errors: Guide agents to correct usage
- Shared Utilities: Extract common code - avoid duplication
- Evaluation-Driven: Create evals early, iterate based on feedback
- MCP Servers Block: MCP servers block the terminal when run - never run them directly; use the evaluation harness or tmux
Red Flags - STOP
If you catch yourself:
- "Just wrapping these API endpoints directly"
- "Returning all available data fields"
- "Error message just says what failed" (not how to fix)
- Starting implementation without reading design principles
- Coding before loading MCP protocol documentation
- Creating tools without knowing agent use cases
- Skipping evaluation creation
- Running python server.py directly (it will hang forever)
ALL of these mean: STOP. Return to design principles and workflow.
Integration with Other Skills
- systematic-debugging: Debug MCP server issues methodically
- test-driven-development: Create failing tests before implementation
- verification-before-completion: Verify build succeeds before claiming completion
- defense-in-depth: Add input validation at multiple layers
Real-World Impact
From MCP server development experience:
- Well-designed servers: 80-90% task completion rate by agents
- API wrapper approach: 30-40% task completion rate
- Context-optimized responses: 3x more information in same token budget
- Actionable errors: 60% reduction in agent retry attempts
- Evaluation-driven iteration: 2-3x improvement in agent success rate
Remember: The quality of an MCP server is measured by how well it enables LLMs to accomplish realistic tasks, not by how comprehensively it wraps an API.
Quick Install
/plugin add https://github.com/bobmatnyc/claude-mpm/tree/main/mcp-builder

Copy and paste this command in Claude Code to install this skill.
GitHub Repository
