deploying-monitoring-stacks

jeremylongshore

About

This skill generates production-ready configurations for deploying monitoring stacks like Prometheus, Grafana, and Datadog. Use it when you need to set up metric collection, visualization dashboards, and alerting rules. It provides infrastructure-aware configurations for Kubernetes, Docker, or bare metal environments.

Quick Install

Claude Code

Plugin Command (Recommended)

/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Git Clone (Alternative)

git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/deploying-monitoring-stacks

Run the plugin command inside Claude Code, or run the git clone command in a terminal, to install this skill.

Documentation

Prerequisites

Before using this skill, ensure:

Target infrastructure is identified (Kubernetes, Docker, or bare metal)
Metric endpoints are reachable from the monitoring platform
Storage backend is configured for time-series data
Alert notification channels are defined (email, Slack, PagerDuty)
Resource requirements (CPU, memory, disk) are estimated for the expected number of targets and time series

Instructions

1. Select Platform: Choose Prometheus/Grafana, Datadog, or a hybrid approach
2. Deploy Collectors: Install exporters and agents on monitored systems
3. Configure Scraping: Define metric collection endpoints and intervals
4. Set Up Storage: Configure retention policies and data compaction
5. Create Dashboards: Build visualization panels for key metrics
6. Define Alerts: Create alerting rules with appropriate thresholds
7. Test Monitoring: Verify that metrics flow and alerts trigger
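
As an illustration of the "Define Alerts" step, the following is a minimal sketch of a Prometheus alerting rule file. The file path, group name, severity value, and 5-minute window are assumptions chosen for illustration, not part of the skill's documented output. The file would also need to be listed under rule_files in prometheus.yml before Prometheus loads it.

```yaml
# Hypothetical file: {baseDir}/monitoring/alert-rules.yaml
groups:
  - name: example-availability        # illustrative group name
    rules:
      - alert: TargetDown
        # "up" is set to 0 by Prometheus when a scrape fails
        expr: up == 0
        # Only fire after the condition has held for 5 minutes
        for: 5m
        labels:
          severity: critical          # assumed label used for routing
        annotations:
          summary: "Target {{ $labels.instance }} is down"
          description: "Job {{ $labels.job }} has failed scrapes for 5 minutes."
```

Using the built-in up metric keeps the sketch independent of any particular exporter; real thresholds depend on the workload.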

Output

Prometheus + Grafana (Kubernetes):

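# {baseDir}/monitoring/prometheus.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        # Pin a specific version tag instead of latest for reproducible deploys
        image: prom/prometheus:latest
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention.time=30d'
        ports:
        - containerPort: 9090
        # Mount the ConfigMap so the --config.file path exists in the container.
        # TSDB data stays ephemeral unless a persistent volume is also mounted.
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config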
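
Pod discovery through kubernetes_sd_configs requires permission to read pods from the Kubernetes API. The following is a sketch of the RBAC objects that would typically grant it. The object names and the default namespace are assumptions, and the Deployment would additionally need serviceAccountName: prometheus in its pod spec, which the manifest above does not set.

```yaml
# Hypothetical RBAC for pod discovery; names and namespace are illustrative
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-pod-discovery
rules:
  # role: pod discovery needs read access to pods in the core API group
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-pod-discovery
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-pod-discovery
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: default   # assumed; match the namespace Prometheus runs in
```

A namespaced Role and RoleBinding could be used instead if Prometheus only needs to discover pods in a single namespace.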

Grafana Dashboard Configuration:

{
  "dashboard": {
    "title": "Application Metrics",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "timeseries",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total[5m])"
          }
        ]
      }
    ]
  }
}

Error Handling

Metrics Not Appearing

Error: "No data points"
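Solution: Verify that scrape targets are reachable and returning metrics; the Targets page in the Prometheus UI (/targets) shows each target's state and last scrape error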
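
With the kubernetes-pods scrape job shown in the Output section, a common cause of missing data is a pod that lacks the annotation the keep rule matches on. A sketch of a pod that would be kept follows; the pod name, image, and port are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                  # hypothetical workload
  annotations:
    # Matched by the relabel rule on
    # __meta_kubernetes_pod_annotation_prometheus_io_scrape
    prometheus.io/scrape: "true"
spec:
  containers:
    - name: example-app
      image: example/app:1.0.0       # hypothetical image
      ports:
        # Pod discovery creates one target per declared container port
        - containerPort: 8080
```

The annotation value must be the quoted string "true", since Kubernetes annotations are strings. The scrape job as written does not read prometheus.io/port or prometheus.io/path annotations, so metrics are expected at /metrics on each declared port unless further relabel rules are added.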

High Cardinality

Error: "Too many time series"
Solution: Reduce label combinations (for example, drop unbounded labels such as user or request IDs) or increase Prometheus memory and storage
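
One way to reduce label combinations is to drop labels or whole metric families at scrape time with metric_relabel_configs. The fragment below is a sketch of the scrape job extended this way; the request_id label and the metric name are hypothetical examples, not names taken from the skill.

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    # Applied to scraped samples before they are written to storage
    metric_relabel_configs:
      # Remove a hypothetical unbounded label from every series
      - action: labeldrop
        regex: request_id
      # Drop a hypothetical high-cardinality metric family entirely
      - source_labels: [__name__]
        action: drop
        regex: 'http_request_duration_seconds_bucket'
```

When using labeldrop, the remaining labels must still identify each series uniquely; otherwise samples from different series collide.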

Alert Not Firing

Error: "Alert condition met but no notification"
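Solution: Check that Prometheus is configured to send alerts to Alertmanager, then verify Alertmanager's routing and receiver configuration for the notification channel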
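
The two places to check can be sketched as follows: the alerting block in prometheus.yml, and a route plus receiver in alertmanager.yml. The alertmanager:9093 address, the receiver name, the Slack channel, and the webhook URL placeholder are all assumptions for illustration.

```yaml
# Fragment of prometheus.yml: where Prometheus sends firing alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # assumed Alertmanager address
---
# Hypothetical alertmanager.yml: a single route to one Slack receiver
route:
  receiver: team-slack
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: team-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder
        channel: '#alerts'
        send_resolved: true
```

If an alert shows as firing in the Prometheus UI but never reaches Alertmanager, the alerting block is the likely culprit; if it reaches Alertmanager but no message arrives, the route or receiver is.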

Dashboard Load Failure

Error: "Failed to load dashboard"
Solution: Verify Grafana datasource configuration and permissions
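
A datasource can be declared through Grafana's file-based provisioning rather than the UI, which makes the configuration easier to verify. The sketch below assumes Prometheus is reachable at http://prometheus:9090, for example through a Kubernetes Service named prometheus, which the manifests in the Output section do not define.

```yaml
# Hypothetical file under Grafana's provisioning/datasources/ directory
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # "proxy" means the Grafana server, not the browser, queries Prometheus
    access: proxy
    url: http://prometheus:9090      # assumed Prometheus address
    isDefault: true
```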

Resources

Prometheus documentation: https://prometheus.io/docs/
Grafana documentation: https://grafana.com/docs/
Example dashboards in {baseDir}/monitoring-examples/

GitHub Repository

jeremylongshore/claude-code-plugins-plus

Path: plugins/devops/monitoring-stack-deployer/skills/monitoring-stack-deployer

Tags: ai, automation, claude-code, devops, marketplace, mcp
