deploying-monitoring-stacks

jeremylongshore
Meta, design, data

About

This skill generates production-ready configurations for deploying monitoring stacks like Prometheus, Grafana, and Datadog. Use it when you need to set up metric collection, visualization dashboards, and alerting rules. It provides infrastructure-aware configurations for Kubernetes, Docker, or bare metal environments.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Paste this command into Claude Code to install the skill.

Git Clone (Alternative)
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/deploying-monitoring-stacks

Documentation

Prerequisites

Before using this skill, ensure:

  • Target infrastructure is identified (Kubernetes, Docker, or bare metal)
  • Metric endpoints are reachable from the monitoring platform
  • A storage backend is configured for time-series data
  • Alert notification channels are defined (email, Slack, PagerDuty)
  • Resource requirements (CPU, memory, disk) are sized for the expected number of targets and time series

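On Kubernetes, one way to meet the storage prerequisite is a PersistentVolumeClaim for the Prometheus TSDB. This is a minimal sketch under assumptions: the claim name prometheus-data and the 50Gi size are hypothetical, and the cluster's default StorageClass is used. The Prometheus storage documentation estimates disk needs as retention time x ingested samples per second x bytes per sample (roughly 1 to 2 bytes per sample), which is a better basis for the size than the placeholder below.

```yaml
# Hypothetical PVC for Prometheus TSDB data; name and size are illustrative only
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce      # a single Prometheus replica writes to this volume
  resources:
    requests:
      storage: 50Gi      # assumption: derive from retention x ingestion rate
  # storageClassName is omitted, so the cluster's default StorageClass applies
```

Mount the claim (a persistentVolumeClaim volume with claimName: prometheus-data) at the directory passed to --storage.tsdb.path in the Prometheus pod.
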
Instructions

  1. Select Platform: Choose Prometheus/Grafana, Datadog, or a hybrid approach
  2. Deploy Collectors: Install exporters and agents on the monitored systems
  3. Configure Scraping: Define metric collection endpoints and scrape intervals
  4. Set Up Storage: Configure retention policies and data compaction
  5. Create Dashboards: Build visualization panels for key metrics
  6. Define Alerts: Create alerting rules with thresholds and durations suited to each metric
  7. Test Monitoring: Verify that metrics arrive and that alerts fire and reach their notification channels

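Step 6 (Define Alerts) can be sketched as a Prometheus rule file. The group name, the alert name TargetDown, the 5-minute "for" duration, and the severity value are illustrative assumptions, not thresholds prescribed by this skill.

```yaml
# Hypothetical rules file (e.g. alerts.yml), loaded through rule_files in prometheus.yml
groups:
  - name: availability             # illustrative group name
    rules:
      - alert: TargetDown
        # up is 0 when the last scrape of the target failed
        expr: up == 0
        # fire only if the condition holds for 5 minutes, to ignore brief blips
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.instance }} of job {{ $labels.job }} is down"
```

Reference the file from prometheus.yml with rule_files (for example rule_files: ['/etc/prometheus/alerts.yml'], path assumed), and run promtool check rules alerts.yml before deploying it, which also helps with step 7.
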
Output

Prometheus + Grafana (Kubernetes):

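# {baseDir}/monitoring/prometheus.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          # Keep only pods annotated with prometheus.io/scrape: "true"
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: 'true'
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
spec:
  replicas: 1
  # apps/v1 Deployments require a selector that matches the pod template labels
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      # Pod discovery needs a ServiceAccount with RBAC to list/watch pods;
      # set serviceAccountName to that account
      containers:
      - name: prometheus
        image: prom/prometheus:latest  # pin a specific version tag in production
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention.time=30d'
        ports:
        - containerPort: 9090
        volumeMounts:
        # Mount the ConfigMap so prometheus.yml exists at the --config.file path
        - name: config
          mountPath: /etc/prometheus
        - name: data
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
      - name: data
        emptyDir: {}  # lost with the pod; use a PersistentVolumeClaim for durable storage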

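The kubernetes-pods job relies on kubernetes_sd_configs, so Prometheus must be allowed to read objects from the Kubernetes API server. A minimal RBAC sketch, assuming the hypothetical name prometheus and namespace monitoring; the nodes/proxy resource is only needed if you also scrape kubelet or cAdvisor metrics through the API server proxy.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring        # assumption: the namespace Prometheus runs in
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  # Read-only access for service discovery (and node proxying for cAdvisor)
  - apiGroups: [""]
    resources: ["pods", "services", "endpoints", "nodes", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```

Set serviceAccountName: prometheus in the Prometheus pod spec and deploy it into the same namespace as the ServiceAccount.
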
Grafana Dashboard Configuration:

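{
  "dashboard": {
    "title": "Application Metrics",
    "panels": [
      {
        "title": "CPU Usage (cores)",
        "type": "timeseries",
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 },
        "targets": [
          {
            "refId": "A",
            "expr": "sum by (pod) (rate(container_cpu_usage_seconds_total{container!=\"\"}[5m]))",
            "legendFormat": "{{pod}}"
          }
        ]
      }
    ]
  },
  "overwrite": true
}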

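The panel's container_cpu_usage_seconds_total metric comes from cAdvisor, which is built into the kubelet, not from annotated application pods, so the kubernetes-pods job alone will not collect it. The sketch below is an extra scrape job adapted from the example Kubernetes configuration distributed with Prometheus; it reaches each node's cAdvisor endpoint through the API server proxy. It assumes Prometheus runs in-cluster with the default service account token mounted and a ServiceAccount allowed to get nodes/proxy.

```yaml
# Append under scrape_configs in prometheus.yml
- job_name: 'kubernetes-cadvisor'
  scheme: https
  # CA bundle and token that Kubernetes mounts into the pod automatically
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  authorization:
    credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
    - role: node
  relabel_configs:
    # Send every scrape to the API server instead of the node directly
    - target_label: __address__
      replacement: kubernetes.default.svc:443
    # Proxy path to this node's cAdvisor metrics
    - source_labels: [__meta_kubernetes_node_name]
      regex: (.+)
      target_label: __metrics_path__
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

Proxying through the API server avoids opening the kubelet port to Prometheus directly, at the cost of extra load on the API server in large clusters.
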
Error Handling

Metrics Not Appearing

  • Error: "No data points"
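  • Solution: Verify that scrape targets are reachable and returning metrics; the Status > Targets page in the Prometheus UI and the up metric show the result of each target's last scrape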

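With the kubernetes-pods job shown earlier, a pod is scraped only if it carries the annotation prometheus.io/scrape: "true". Without further relabeling, Prometheus creates one target per declared container port and scrapes the default /metrics path, so a missing annotation or an undeclared port is a common cause of missing data. A sketch of a pod template fragment that this job would pick up, assuming a hypothetical application my-app serving metrics on port 8080:

```yaml
# Fragment of a hypothetical Deployment's pod template (spec.template)
template:
  metadata:
    labels:
      app: my-app
    annotations:
      prometheus.io/scrape: "true"   # annotation values must be strings, so quote it
  spec:
    containers:
      - name: my-app
        image: example/my-app:1.0    # hypothetical image
        ports:
          - containerPort: 8080      # each declared port becomes a scrape target
```

Because every declared port becomes a target, a container that also exposes non-metrics ports will show extra failing targets unless relabeling selects the metrics port.
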
High Cardinality

  • Error: "Too many time series"
  • Solution: Drop or aggregate labels with many distinct values (for example, user or request IDs) through relabeling, or give Prometheus more memory

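One way to act on this is per-job metric_relabel_configs, which are applied to scraped samples before they are stored, together with sample_limit, which marks a scrape as failed when a target still exposes too many samples after metric relabeling. In this sketch, the label request_id and the metric prefix debug_ are hypothetical offenders; identify the real ones first, for example on the TSDB Status page of the Prometheus UI.

```yaml
# Inside a scrape job in prometheus.yml, e.g. the kubernetes-pods job
sample_limit: 50000          # assumption: whole scrape fails above this many samples
metric_relabel_configs:
  # Drop a hypothetical per-request label that turns every sample into a new series
  - action: labeldrop
    regex: request_id
  # Drop whole metric families with a hypothetical debug_ prefix
  - source_labels: [__name__]
    regex: 'debug_.*'
    action: drop
```

Dropping a label is only safe if the remaining labels still identify each series uniquely; otherwise Prometheus rejects the colliding samples as duplicates.
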
Alert Not Firing

  • Error: "Alert condition met but no notification"
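  • Solution: Check that the alerting section of prometheus.yml points at Alertmanager, then check Alertmanager's routes, receivers, and any active silences or inhibition rules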

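Two pieces must line up before a notification goes out: Prometheus has to know where Alertmanager is, and Alertmanager has to route the alert to a receiver. A minimal sketch of both files, assuming a hypothetical Alertmanager Service reachable at alertmanager:9093 and a Slack receiver; the webhook URL and channel are placeholders.

```yaml
# prometheus.yml (excerpt): where Prometheus sends firing alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']   # hypothetical Service name and port
---
# alertmanager.yml: route every alert to a single Slack receiver
route:
  receiver: slack-default
  group_by: ['alertname', 'job']
  group_wait: 30s        # wait to batch alerts that start a new group
  group_interval: 5m     # wait before notifying about changes to a group
  repeat_interval: 4h    # re-send while an alert keeps firing
receivers:
  - name: slack-default
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'   # placeholder webhook
        channel: '#alerts'                                       # hypothetical channel
        send_resolved: true
```

amtool check-config alertmanager.yml validates the Alertmanager file; if it passes but nothing arrives, look for silences or inhibitions in the Alertmanager UI.
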
Dashboard Load Failure

  • Error: "Failed to load dashboard"
  • Solution: Verify the Grafana datasource URL and credentials, and that the user has permission to view the dashboard

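Provisioning the datasource from a file keeps it consistent across Grafana restarts and redeployments. A sketch of a Grafana datasource provisioning file, assuming Prometheus is exposed through a hypothetical Service named prometheus on port 9090 (the Deployment above does not create one) and that Grafana reads provisioning files from its default provisioning directory.

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend sends the queries
    url: http://prometheus:9090    # hypothetical in-cluster Service address
    isDefault: true                # panels that name no datasource use this one
```

Because the dashboard JSON above names no datasource, marking this one as default lets its panel query Prometheus without further edits.
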
Resources

GitHub Repository

jeremylongshore/claude-code-plugins-plus
Path: plugins/devops/monitoring-stack-deployer/skills/monitoring-stack-deployer
ai, automation, claude-code, devops, marketplace, mcp

Related Skills

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

creating-opencode-plugins

Meta

This skill provides the structure and API specifications for creating OpenCode plugins that hook into 25+ event types like commands, files, and LSP operations. It offers implementation patterns for JavaScript/TypeScript modules that intercept and extend the AI assistant's lifecycle. Use it when you need to build event-driven plugins for monitoring, custom handling, or extending OpenCode's capabilities.

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.

polymarket

Meta

This skill enables developers to build applications with the Polymarket prediction markets platform, including API integration for trading and market data. It also provides real-time data streaming via WebSocket to monitor live trades and market activity. Use it for implementing trading strategies or creating tools that process live market updates.