distributed-tracing
About
This Claude Skill helps developers implement distributed tracing using Jaeger and Zipkin to track requests across microservices. It's particularly useful for debugging distributed systems, analyzing performance bottlenecks, and tracing request flows. The skill provides setup instructions and instrumentation examples for comprehensive tracing implementation.
Documentation
Distributed Tracing
Overview
Set up distributed tracing infrastructure with Jaeger or Zipkin to track requests across microservices and identify performance bottlenecks.
When to Use
- Debugging microservice interactions
- Identifying performance bottlenecks
- Tracking request flows
- Analyzing service dependencies
- Root cause analysis
Instructions
1. Jaeger Setup
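# docker-compose.yml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "5775:5775/udp"   # agent: zipkin.thrift, compact Thrift (deprecated, legacy clients)
      - "6831:6831/udp"   # agent: jaeger.thrift, compact Thrift (used by the clients below)
      - "16686:16686"     # Jaeger UI and query API
      - "14268:14268"     # collector: spans sent directly over HTTP
    networks:
      - tracing
networks:
  tracing: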
2. Node.js Jaeger Instrumentation
// tracing.js
const { initTracer } = require('jaeger-client');

const initJaegerTracer = (serviceName) => {
  const config = {
    serviceName,
    // Sample every trace: fine for development, too costly for production
    sampler: {
      type: 'const',
      param: 1
    },
    reporter: {
      logSpans: true,
      agentHost: process.env.JAEGER_AGENT_HOST || 'localhost',
      // Environment variables are strings, so convert the port to a number
      agentPort: Number(process.env.JAEGER_AGENT_PORT) || 6831
    }
  };
  return initTracer(config, {});
};

const tracer = initJaegerTracer('api-service');

module.exports = { tracer };
3. Express Tracing Middleware
// middleware.js
const { tracer } = require('./tracing');
const opentracing = require('opentracing');

const tracingMiddleware = (req, res, next) => {
  // Continue the caller's trace if the request carries trace headers;
  // extract() returns null otherwise and a new trace is started
  const wireCtx = tracer.extract(
    opentracing.FORMAT_HTTP_HEADERS,
    req.headers
  );
  const span = tracer.startSpan(req.path, {
    childOf: wireCtx,
    tags: {
      [opentracing.Tags.SPAN_KIND]: opentracing.Tags.SPAN_KIND_RPC_SERVER,
      [opentracing.Tags.HTTP_METHOD]: req.method,
      [opentracing.Tags.HTTP_URL]: req.url
    }
  });
  // Expose the span so route handlers can create child spans
  req.span = span;
  // Record the status code and close the span once the response is sent
  res.on('finish', () => {
    span.setTag(opentracing.Tags.HTTP_STATUS_CODE, res.statusCode);
    span.finish();
  });
  next();
};

module.exports = tracingMiddleware;
4. Python Jaeger Integration
# tracing.py
from jaeger_client import Config
from opentracing.propagation import Format


def init_jaeger_tracer(service_name):
    config = Config(
        config={
            'sampler': {'type': 'const', 'param': 1},
            'local_agent': {
                'reporting_host': 'localhost',
                'reporting_port': 6831,
            },
            'logging': True,
        },
        service_name=service_name,
    )
    return config.initialize_tracer()


# Flask integration
from flask import Flask, request

app = Flask(__name__)
tracer = init_jaeger_tracer('api-service')


@app.before_request
def before_request():
    ctx = tracer.extract(Format.HTTP_HEADERS, request.headers)
    request.span = tracer.start_span(
        request.path,
        child_of=ctx,
        tags={
            'http.method': request.method,
            'http.url': request.url,
        }
    )


@app.after_request
def after_request(response):
    request.span.set_tag('http.status_code', response.status_code)
    request.span.finish()
    return response


@app.route('/api/users/<user_id>')
def get_user(user_id):
    with tracer.start_span('fetch-user', child_of=request.span) as span:
        span.set_tag('user.id', user_id)
        # Fetch user from database
        return {'user': {'id': user_id}}
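When many functions need their own child span, the `with tracer.start_span(...)` pattern can be wrapped in a decorator. The sketch below is one possible shape, not part of `jaeger_client` or `opentracing`: `traced` and `get_parent` are hypothetical names, and the only assumption about the tracer is the OpenTracing-style interface used above (`start_span(name, child_of=...)` returning a span that works as a context manager and offers `set_tag` and `log_kv`).

```python
import functools


def traced(tracer, operation_name=None, get_parent=None):
    """Wrap a function in a span; `get_parent` returns the parent span or None."""
    def decorator(func):
        name = operation_name or func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            parent = get_parent() if get_parent else None
            with tracer.start_span(name, child_of=parent) as span:
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Record the failure on the span, then let it propagate
                    span.set_tag('error', True)
                    span.log_kv({'event': 'error', 'message': str(exc)})
                    raise
        return wrapper
    return decorator
```

In the Flask app above, this might be applied as `@traced(tracer, 'fetch-user', get_parent=lambda: request.span)` on a helper that loads the user.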
5. Distributed Context Propagation
// propagation.js
const axios = require('axios');
const { tracer } = require('./tracing');
const opentracing = require('opentracing');

async function callDownstreamService(span, url, data) {
  const headers = {};
  // Inject trace context
  tracer.inject(span, opentracing.FORMAT_HTTP_HEADERS, headers);
  try {
    const response = await axios.post(url, data, { headers });
    span.setTag('downstream.success', true);
    return response.data;
  } catch (error) {
    span.setTag(opentracing.Tags.ERROR, true);
    span.log({
      event: 'error',
      message: error.message
    });
    throw error;
  }
}

module.exports = { callDownstreamService };
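For reference, Jaeger clients carry the trace context in a single HTTP header by default, `uber-trace-id`, formatted as `{trace-id}:{span-id}:{parent-span-id}:{flags}` with hex-encoded fields. The sketch below uses only the Python standard library to show roughly what `inject` writes and `extract` reads. The `SpanContext` tuple and the helper names are illustrative assumptions, not the client library's own classes.

```python
from typing import NamedTuple, Optional

HEADER_NAME = "uber-trace-id"  # default header name used by Jaeger clients


class SpanContext(NamedTuple):
    """Illustrative container; not the jaeger_client class of the same name."""
    trace_id: int
    span_id: int
    parent_id: int  # 0 for a root span
    flags: int      # bit 0x01 means "sampled"


def format_header(ctx: SpanContext) -> str:
    """Serialize a context the way it would travel in the HTTP header."""
    return f"{ctx.trace_id:x}:{ctx.span_id:x}:{ctx.parent_id:x}:{ctx.flags:x}"


def parse_header(value: str) -> Optional[SpanContext]:
    """Return the parsed context, or None if the header is malformed."""
    parts = value.strip().split(":")
    if len(parts) != 4:
        return None
    try:
        trace_id, span_id, parent_id, flags = (int(p, 16) for p in parts)
    except ValueError:
        return None
    if trace_id == 0 or span_id == 0:
        return None
    return SpanContext(trace_id, span_id, parent_id, flags)


def child_context(parent: SpanContext, new_span_id: int) -> SpanContext:
    """A child keeps the trace ID and flags and points back at its parent."""
    return SpanContext(parent.trace_id, new_span_id, parent.span_id, parent.flags)
```

For example, parsing `abc123:1f:0:1` and deriving a child with span ID `0x2a` yields a header value of `abc123:2a:1f:1`: the trace ID and sampling flag are unchanged, and the old span ID becomes the parent ID.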
6. Zipkin Integration
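// zipkin-setup.js
const CLSContext = require('zipkin-context-cls');
const { Tracer, BatchRecorder, jsonEncoder: { JSON_V2 } } = require('zipkin');
// HttpLogger is published in zipkin-transport-http, not in the core zipkin package
const { HttpLogger } = require('zipkin-transport-http');
const zipkinMiddleware = require('zipkin-instrumentation-express').expressMiddleware;

const recorder = new BatchRecorder({
  logger: new HttpLogger({
    endpoint: 'http://localhost:9411/api/v2/spans',
    // The v2 endpoint expects spans in the JSON v2 encoding
    jsonEncoder: JSON_V2,
    headers: { 'Content-Type': 'application/json' }
  })
});
const ctxImpl = new CLSContext('zipkin');
// Recent zipkin-js releases read the service name from the tracer
const tracer = new Tracer({ recorder, ctxImpl, localServiceName: 'api-service' });

module.exports = {
  tracer,
  zipkinMiddleware: zipkinMiddleware({
    tracer,
    serviceName: 'api-service'
  })
};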
7. Trace Analysis
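# query-traces.py
import requests

JAEGER_QUERY_URL = 'http://localhost:16686/api/traces'


def query_traces(service_name, operation=None, limit=20):
    params = {
        'service': service_name,
        'limit': limit
    }
    if operation:
        params['operation'] = operation
    response = requests.get(JAEGER_QUERY_URL, params=params, timeout=10)
    response.raise_for_status()
    return response.json()['data']


def trace_duration_us(trace):
    # Traces carry no top-level duration; derive it from the span
    # timestamps, which Jaeger reports in microseconds
    start = min(s['startTime'] for s in trace['spans'])
    end = max(s['startTime'] + s['duration'] for s in trace['spans'])
    return end - start


def find_slow_traces(service_name, min_duration_ms=1000):
    traces = query_traces(service_name, limit=100)
    slow_traces = [
        t for t in traces
        if trace_duration_us(t) > min_duration_ms * 1000
    ]
    return sorted(slow_traces, key=trace_duration_us, reverse=True)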
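To turn raw traces into a bottleneck report, it can help to group span durations by service and operation. The sketch below assumes the JSON shape returned by the Jaeger UI's query endpoint: each trace has `spans` and `processes`, and each span has `operationName`, `processID`, and a `duration` in microseconds. That endpoint serves the UI and is not a stable public API, so treat the field names as assumptions. `summarize_operations` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import median


def summarize_operations(traces):
    """Group span durations (microseconds) by (service, operation)."""
    durations = defaultdict(list)
    for trace in traces:
        processes = trace.get("processes", {})
        for span in trace.get("spans", []):
            process = processes.get(span.get("processID"), {})
            service = process.get("serviceName", "unknown")
            durations[(service, span["operationName"])].append(span["duration"])

    summary = []
    for (service, operation), values in durations.items():
        summary.append({
            "service": service,
            "operation": operation,
            "count": len(values),
            "median_ms": median(values) / 1000,
            "max_ms": max(values) / 1000,
        })
    # Slowest operations first
    return sorted(summary, key=lambda row: row["max_ms"], reverse=True)
```

Passing the list returned by `query_traces('api-service', limit=100)` gives one row per operation, with the slowest worst-case operation at the top.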
Best Practices
✅ DO
- Sample appropriately for your traffic volume
- Propagate trace context across services
- Add meaningful span tags
- Log errors with spans
- Use consistent service naming
- Monitor trace latency
- Document trace format
- Keep instrumentation lightweight
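One way to sample appropriately is a probabilistic sampler that derives its decision from the trace ID, so every service reaches the same verdict for a given trace. The sketch below illustrates that idea with the standard library only. It is not the Jaeger client's implementation, and `should_sample` is a hypothetical name. With the Jaeger clients configured in the setup examples, the comparable change would be replacing the `const` sampler with one of type `probabilistic` and a `param` such as `0.1`.

```python
MAX_ID = (1 << 63) - 1  # compare against the low 63 bits of the trace ID


def should_sample(trace_id: int, rate: float) -> bool:
    """Deterministically keep roughly `rate` of all traces (0.0 <= rate <= 1.0)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    boundary = int(rate * MAX_ID)
    # The same trace ID always lands on the same side of the boundary
    return (trace_id & MAX_ID) < boundary
```

Because the decision depends only on the trace ID and the rate, two services configured with the same rate keep or drop the same traces, which avoids partially recorded traces.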
❌ DON'T
- Sample 100% in production
- Skip trace context propagation
- Log sensitive data in spans
- Create excessive spans
- Ignore sampling configuration
- Use unbounded cardinality tags
- Deploy without testing collection
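Two of the points above, sensitive data and unbounded tag cardinality, can be handled by small helpers that run before a span is named or tagged. The sketch below is one possible approach rather than a library feature: the key list, the `{id}` and `[REDACTED]` placeholders, and the function names are all assumptions to adapt.

```python
import re

# Assumed list of tag keys whose values should never reach the tracing backend
SENSITIVE_KEYS = {"password", "authorization", "cookie", "token", "api_key"}

# Path segments that look like IDs: all digits, UUIDs, or long hex strings
_ID_SEGMENT = re.compile(
    r"^(\d+"
    r"|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    r"|[0-9a-fA-F]{16,})$"
)


def normalize_path(path: str) -> str:
    """Replace ID-like path segments so operation names stay low-cardinality."""
    segments = path.split("?")[0].split("/")
    return "/".join("{id}" if _ID_SEGMENT.match(s) else s for s in segments)


def safe_tags(tags: dict) -> dict:
    """Return a copy of `tags` with sensitive values masked."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in tags.items()
    }
```

Used as the span name, `normalize_path("/api/users/12345")` produces `/api/users/{id}`, so requests for different users are grouped under one operation instead of creating one operation per user.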
Key Concepts
- Trace: Complete request flow across services
- Span: Single operation within a trace
- Tag: Metadata attached to spans
- Log: Timestamped events within spans
- Context: Trace information propagated between services
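These terms map onto a small data model. The sketch below is illustrative only: the class and field names are assumptions rather than any tracer's API. It shows how a child span shares its parent's trace ID while getting its own span ID, tags, and logs.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Context:
    """The part of a span that is propagated between services."""
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None


@dataclass
class Span:
    """A single operation within a trace."""
    operation: str
    context: Context
    tags: Dict[str, Any] = field(default_factory=dict)
    logs: List[Tuple[float, Dict[str, Any]]] = field(default_factory=list)

    def set_tag(self, key: str, value: Any) -> None:
        self.tags[key] = value

    def log(self, **fields: Any) -> None:
        # Logs are timestamped events, unlike tags which describe the whole span
        self.logs.append((time.time(), fields))


def start_span(operation: str, parent: Optional[Span] = None) -> Span:
    """Start a root span, or a child that joins the parent's trace."""
    span_id = secrets.randbits(64)
    if parent is None:
        ctx = Context(trace_id=secrets.randbits(64), span_id=span_id)
    else:
        ctx = Context(parent.context.trace_id, span_id, parent.context.span_id)
    return Span(operation, ctx)
```

A trace is then simply the set of spans that share one `trace_id`, linked into a tree through `parent_id`.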
Quick Install
Copy and paste this command in Claude Code to install this skill:
/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/distributed-tracing
