prometheus-monitoring
About
This Claude Skill helps developers set up Prometheus monitoring infrastructure for collecting and querying time-series metrics. It enables custom metrics creation, scraping configurations, and service discovery for applications. Use it when implementing observability features or building comprehensive monitoring systems.
Documentation
Prometheus Monitoring
Overview
Implement comprehensive Prometheus monitoring infrastructure for collecting, storing, and querying time-series metrics from applications and infrastructure.
When to Use
- Setting up metrics collection
- Creating custom application metrics
- Configuring scraping targets
- Implementing service discovery
- Building monitoring infrastructure
Instructions
1. Prometheus Configuration
# /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: production

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']

rule_files:
  - '/etc/prometheus/alert_rules.yml'

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'api-service'
    # Targets are host:port only; the path is set with metrics_path.
    metrics_path: /metrics
    scrape_interval: 10s
    static_configs:
      - targets: ['localhost:8080']

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Use the prometheus.io/path annotation as the metrics path when it is set.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: (.+)
        target_label: __metrics_path__
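The `keep` rule in the `kubernetes-pods` job drops every discovered pod whose `prometheus.io/scrape` annotation is not exactly `true`. The sketch below is a simplified Python model of that filtering step, not Prometheus's own implementation; the pod names and the `keep` helper are hypothetical and exist only for illustration.

```python
import re


def keep(targets, source_labels, regex, separator=";"):
    """Simplified model of a relabel rule with `action: keep`."""
    pattern = re.compile(regex)
    kept = []
    for labels in targets:
        # Missing labels count as empty strings; values are joined by the separator.
        value = separator.join(labels.get(name, "") for name in source_labels)
        # Relabel regexes are anchored, so the whole value has to match.
        if pattern.fullmatch(value):
            kept.append(labels)
    return kept


SCRAPE = "__meta_kubernetes_pod_annotation_prometheus_io_scrape"
NAME = "__meta_kubernetes_pod_name"

# Hypothetical pods as service discovery might report them.
discovered = [
    {NAME: "api-1", SCRAPE: "true"},
    {NAME: "batch-1", SCRAPE: "false"},
    {NAME: "cache-1"},  # no annotation at all
]

scraped = keep(discovered, [SCRAPE], "true")
kept_names = [t[NAME] for t in scraped]
print(kept_names)
```

Only `api-1` survives: the pod annotated `false` and the pod without the annotation are both dropped before any scrape happens.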
2. Node.js Metrics Implementation
// metrics.js
const express = require('express');
const promClient = require('prom-client');

const register = new promClient.Registry();
// Collect the default Node.js process metrics into this registry.
promClient.collectDefaultMetrics({ register });

const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.5, 1, 2, 5],
  registers: [register]
});

const requestsTotal = new promClient.Counter({
  name: 'requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register]
});

const app = express();

// Expose metrics. register.metrics() returns a Promise in recent prom-client versions.
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Express middleware: record duration and count once each response has finished.
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    // Label by the matched route pattern, not the raw path, to keep cardinality bounded.
    const route = req.route ? req.route.path : 'unmatched';
    httpRequestDuration.labels(req.method, route, res.statusCode).observe(duration);
    requestsTotal.labels(req.method, route, res.statusCode).inc();
  });
  next();
});

module.exports = { app, register, httpRequestDuration, requestsTotal };
3. Python Prometheus Integration
from prometheus_client import Counter, Histogram, start_http_server
from flask import Flask, g, request
import time

app = Flask(__name__)

# status_code is included so 5xx queries and alert rules can filter on it.
request_count = Counter('requests_total', 'Total requests', ['method', 'endpoint', 'status_code'])
request_duration = Histogram('request_duration_seconds', 'Request duration in seconds', ['method', 'endpoint'])

@app.before_request
def before():
    g.start_time = time.time()

@app.after_request
def after(response):
    duration = time.time() - g.start_time
    # Label by the matched URL rule, not the raw path, to keep cardinality bounded.
    endpoint = request.url_rule.rule if request.url_rule else 'unmatched'
    request_count.labels(request.method, endpoint, response.status_code).inc()
    request_duration.labels(request.method, endpoint).observe(duration)
    return response

if __name__ == '__main__':
    start_http_server(8000)  # metrics are served on port 8000
    app.run(port=5000)
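A histogram such as `request_duration` is stored as a set of cumulative bucket counters plus a running sum and count. The toy class below sketches that bookkeeping in plain Python. It is not the `prometheus_client` implementation, and the `SketchHistogram` name and the sample values are assumptions made for illustration; the bucket bounds are borrowed from the Node.js example.

```python
import math


class SketchHistogram:
    """Toy model of cumulative histogram buckets."""

    def __init__(self, bounds):
        # An implicit +Inf bucket always catches every observation.
        self.bounds = sorted(bounds) + [math.inf]
        self.bucket_counts = [0] * len(self.bounds)
        self.total = 0.0
        self.count = 0

    def observe(self, value):
        self.total += value
        self.count += 1
        for i, upper in enumerate(self.bounds):
            # Cumulative: increment every bucket whose upper bound covers the value.
            if value <= upper:
                self.bucket_counts[i] += 1


h = SketchHistogram([0.1, 0.5, 1, 2, 5])
for seconds in (0.05, 0.3, 0.3, 1.5, 7.0):
    h.observe(seconds)

for upper, count in zip(h.bounds, h.bucket_counts):
    label = "+Inf" if math.isinf(upper) else str(upper)
    print(f'le="{label}" {count}')
```

Because the counts are cumulative, the `+Inf` bucket always equals the total number of observations, and an observation above the largest finite bound (7.0 here) is visible only in that bucket.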
4. Alert Rules
# /etc/prometheus/alert_rules.yml
groups:
  - name: application
    rules:
      - alert: HighErrorRate
        # Fraction of requests answered with a 5xx status over the last 5 minutes.
        expr: |
          sum(rate(requests_total{status_code=~"5.."}[5m]))
            / sum(rate(requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate: {{ $value }}"

      - alert: HighLatency
        # histogram_quantile needs per-bucket rates aggregated by the le label.
        expr: histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket[5m]))) > 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency: {{ $value }}s"

      - alert: HighMemoryUsage
        # Fires when less than 10% of memory is available.
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low memory: available fraction {{ $value }}"
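The `HighLatency` alert relies on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The function below is a simplified Python model of that estimate, written under the assumption of classic `le` buckets that start at zero; `estimate_quantile` and the bucket counts are hypothetical.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that holds the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return math.nan


# Hypothetical cumulative counts for buckets 0.1, 0.5, 1, 2, 5 and +Inf.
buckets = [(0.1, 50), (0.5, 80), (1.0, 90), (2.0, 98), (5.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, buckets)
print(round(p95, 3))
```

With these counts the estimate lands between 1 and 2 seconds, so a `> 1` threshold would be exceeded. The result can only be as precise as the bucket boundaries: any value between two bounds is assumed to be spread evenly across that bucket.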
5. Docker Compose Setup
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

  # From inside the Prometheus container this exporter is reachable as
  # node-exporter:9100, not localhost:9100; adjust the scrape target to match.
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"

volumes:
  prometheus_data:
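Once the stack above is running, Prometheus answers instant queries over its HTTP API at `/api/v1/query` on the published port 9090. The helpers below sketch how a script might build such a request and read the result. The function names, the `SAMPLE_BODY` payload, and the `node-exporter:9100` instance value are assumptions for illustration; no network call is made here.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body):
    """Map each series' instance label to its value from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(payload.get("error", "query failed"))
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


# Hypothetical response body for the query `up`.
SAMPLE_BODY = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "instance": "node-exporter:9100", "job": "node"},
                "value": [1700000000.0, "1"],
            }
        ],
    },
})

print(build_query_url("http://localhost:9090", "up"))
print(parse_instant_vector(SAMPLE_BODY))
```

To query a live server, the URL from `build_query_url` could be fetched with `urllib.request.urlopen(...).read()` and the bytes passed to `parse_instant_vector`. Sample values arrive as strings in the JSON, which is why they are converted with `float`.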
Best Practices
✅ DO
- Use consistent metric naming conventions
- Add labels that aid filtering and aggregation, keeping each label's set of values small and bounded
- Set appropriate scrape intervals (10-60s)
- Implement retention policies
- Monitor Prometheus itself
- Test alert rules before deployment
- Document metric meanings
❌ DON'T
- Add unbounded cardinality labels
- Scrape too frequently (< 10s)
- Ignore metric naming conventions
- Create alerts without runbooks
- Store raw event data in Prometheus
- Use counters for gauge-like values
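The cardinality rule above is the one most often broken by accident, typically by using raw request paths as label values. One way to keep such a label bounded is to collapse identifiers into a placeholder before labelling. The sketch below assumes that numeric and UUID path segments are identifiers; `normalize_route` and the `:id` placeholder are hypothetical choices, not part of any Prometheus client library.

```python
import re

# Path segments treated as identifiers: plain integers or UUIDs.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def normalize_route(path):
    """Replace identifier-like path segments with a fixed placeholder."""
    segments = [":id" if _ID_SEGMENT.match(s) else s for s in path.split("/")]
    return "/".join(segments)


paths = [
    "/users/42/orders/7",
    "/users/43/orders/8",
    "/items/123e4567-e89b-12d3-a456-426614174000",
    "/health",
]
routes = sorted({normalize_route(p) for p in paths})
print(routes)
```

The four raw paths collapse into three label values, and the count stays fixed however many users or items exist. Where the web framework exposes the matched route pattern, using that pattern directly is usually simpler than pattern-matching on the path.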
Key Prometheus Queries
rate(requests_total[5m]) # Per-second request rate
histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket[5m]))) # p95 latency
rate(requests_total{status_code=~"5.."}[5m]) # Per-second rate of 5xx responses
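`rate()` turns a monotonically increasing counter into a per-second figure and compensates for counter resets, such as a process restart. The function below is a simplified Python model of that idea. It divides the reset-corrected increase by the time between the first and last sample and leaves out the extrapolation to the window edges that Prometheus also performs; `simple_rate` and the sample data are hypothetical.

```python
def simple_rate(samples):
    """Approximate per-second rate from (timestamp_seconds, counter_value) pairs."""
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # A drop means the counter was reset; it restarted from zero.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrapes every 15 s; the process restarts between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
per_second = simple_rate(samples)
print(round(per_second, 3))
```

A naive "last minus first" calculation would give a negative increase here because of the restart. Treating the drop as a reset recovers an increase of 100 over 60 seconds.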
Quick Install
/plugin add https://github.com/aj-geddes/useful-ai-prompts/tree/main/prometheus-monitoring
Copy and paste this command in Claude Code to install this skill.
