
configure-log-aggregation

Name: configure-log-aggregation
Author: pjt222


Updated 1 month ago


Category: Design (tags: ai, design)

About

This skill sets up centralized log aggregation with Loki/Promtail or ELK, covering log parsing, label extraction, and retention policies. It is designed to consolidate logs from multiple services into a searchable system and to correlate them with metrics and traces. Use it when replacing local log files with centralized storage or when troubleshooting incidents that require cross-service analysis.

Quick install

Claude Code (recommended), primary method:

npx skills add pjt222/agent-almanac -a claude-code

Plugin command (alternative):

/plugin add https://github.com/pjt222/agent-almanac

Git clone (alternative):

git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/configure-log-aggregation

Copy and paste this command into Claude Code to install this skill.

Documentation

Configure Log Aggregation

Implement centralized log collection, parsing, and querying with Loki/Promtail or the ELK stack for operational visibility.

Use When

Consolidate logs from multiple services or hosts into one searchable system
Replace local log files with centralized, queryable log storage
Correlate logs with metrics and traces for full observability
Implement structured logging with label extraction from unstructured logs
Set log retention policies based on storage and compliance needs
Troubleshoot production incidents that require log analysis across services

Inputs

Required: log sources (application, system, and container logs)
Required: log format patterns (JSON, plaintext, syslog, etc.)
Optional: label extraction rules for structured querying
Optional: retention and compression policies
Optional: existing log shipper config (Fluentd, Filebeat, Promtail)

Procedure

See Extended Examples for complete configuration files and templates.

Step 1: Choose Log Aggregation Stack

Select Loki (Prometheus-style) or ELK (Elasticsearch-based) based on your requirements.

Loki advantages:

Lightweight; designed for Kubernetes and cloud-native environments
Indexes only labels (like Prometheus), so storage overhead is low
Native Grafana integration for unified dashboards
Horizontal scalability with object storage (S3, GCS)
Lower resource consumption than Elasticsearch

ELK advantages:

Full-text search across all log content (not just labels)
Rich query DSL and aggregations
Mature ecosystem of Beats and Logstash plugins
Better for compliance/audit logs that require deep historical search

This guide focuses on Loki + Promtail, recommended for most modern setups.

Decision criteria:

Use Loki if:
- You want label-based queries similar to Prometheus
- Storage costs are a concern (Loki indexes only labels)
- You already use Grafana for metrics
- Kubernetes/container-native deployment

Use ELK if:
- You need full-text search across all log content
- You have complex log parsing and enrichment requirements
- You require advanced analytics and aggregations
- Legacy systems with existing Logstash pipelines
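The decision criteria above can be condensed into a simple scoring helper. This is a minimal sketch and not part of the original skill. The `Requirements` fields, the equal weighting, and the tie-break toward Loki (matching this guide's default recommendation) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical yes/no answers to the decision criteria above."""
    # Criteria favoring Loki
    wants_label_queries: bool = False
    storage_cost_sensitive: bool = False
    uses_grafana: bool = False
    container_native: bool = False
    # Criteria favoring ELK
    needs_full_text_search: bool = False
    complex_parsing_or_enrichment: bool = False
    advanced_analytics: bool = False
    existing_logstash_pipelines: bool = False


def choose_stack(req: Requirements) -> str:
    """Return "Loki" or "ELK" by counting matched criteria (ties go to Loki)."""
    loki_score = sum([
        req.wants_label_queries,
        req.storage_cost_sensitive,
        req.uses_grafana,
        req.container_native,
    ])
    elk_score = sum([
        req.needs_full_text_search,
        req.complex_parsing_or_enrichment,
        req.advanced_analytics,
        req.existing_logstash_pipelines,
    ])
    return "ELK" if elk_score > loki_score else "Loki"


print(choose_stack(Requirements(uses_grafana=True, container_native=True)))  # Loki
```

A single hard requirement, such as full-text search over all log content, can outweigh any tally. Treat the score as a starting point for discussion rather than a verdict.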

→ A clear choice based on requirements; the team downloads the appropriate installation artifacts.

If err:

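Benchmark storage requirements: because Loki indexes only labels, it often needs around 10x less storage than Elasticsearch for the same logs; measure with your own data
Evaluate query patterns: full-text search needs vs. label filtering
Consider operational overhead: ELK requires more tuning and resources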

Step 2: Deploy Loki

Install and configure Loki with an appropriate storage backend.

Docker Compose deployment (docker-compose.yml):

version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.0
    ports:
      - "9080:9080"  # exposes Promtail's /metrics endpoint for troubleshooting
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    restart: unless-stopped
    depends_on:
      - loki

volumes:
  loki-data:

Loki config (loki-config.yml):

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

# ... (see EXAMPLES.md for complete configuration)

For production with S3 storage:

storage_config:
  aws:
    s3: s3://us-east-1/my-loki-bucket
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: s3

→ Loki starts successfully, the health check at http://localhost:3100/ready passes, and logs are stored according to the retention policy.
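You can script the health check mentioned above (for example, in CI or a deploy hook) with a small poller against the `/ready` endpoint. This sketch uses only the Python standard library. The timeout and polling interval are arbitrary assumptions, and `fetch` is injectable so the loop can be tested without a running Loki. Loki may answer `/ready` with a non-200 status while it is still starting, so the loop keeps retrying until it sees 200.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses, e.g. 503 while starting
        return err.code
    except (urllib.error.URLError, OSError):  # connection refused, DNS failure, timeout
        return 0


def wait_for_ready(
    url: str = "http://localhost:3100/ready",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll until the endpoint returns 200 (True) or the timeout expires (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage (blocks up to timeout_s if Loki is not running):
# if not wait_for_ready():
#     raise SystemExit("Loki did not become ready")
```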

If err:

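Check Loki logs: docker compose logs loki
Verify that storage directories exist and are writable
Test config syntax (mount your file; otherwise the image's bundled default config is checked): docker run --rm -v "$PWD/loki-config.yml:/etc/loki/local-config.yaml" grafana/loki:2.9.0 -config.file=/etc/loki/local-config.yaml -verify-config
Ensure retention settings don't exceed disk capacity
For S3, verify IAM permissions and bucket access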
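The disk-capacity check above is simple arithmetic. Stored size per day is roughly the raw volume divided by the compression ratio, and you multiply that by the number of retained days. The helper below is a back-of-the-envelope sketch. The compression ratio and index overhead are inputs you must measure for your own logs; they are not values taken from Loki.

```python
def estimate_storage_gb(
    raw_gb_per_day: float,
    retention_days: int,
    compression_ratio: float,
    index_overhead: float = 0.0,
) -> float:
    """Rough storage needed to keep retention_days of logs.

    compression_ratio: raw size / stored size, measured on your own data.
    index_overhead: extra fraction for index data (0.05 = 5%), also assumed.
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    stored_gb_per_day = raw_gb_per_day / compression_ratio
    return stored_gb_per_day * retention_days * (1 + index_overhead)


print(estimate_storage_gb(50, 30, compression_ratio=10))  # 150.0
```

For example, 50 GB/day of raw logs kept for 30 days at an assumed 10:1 compression ratio comes to about 150 GB before index overhead.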

Step 3: Configure Promtail for Log Shipping

Set up Promtail to scrape logs and forward them to Loki with label extraction.

Promtail config (promtail-config.yml):

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml
# ... (see EXAMPLES.md for complete configuration)

Key Promtail concepts:

Scrape configs: define log sources and how to discover them
Pipeline stages: transform and label log lines before sending them to Loki
Relabel configs: assign labels dynamically from target metadata
Positions file: tracks read offsets so logs are not re-processed after a restart
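The positions-file concept can be illustrated with a tiny tailer that remembers a byte offset per file. This is a simplified sketch of the idea, not Promtail's implementation. Offsets live in memory instead of being persisted to `positions.yaml`, and rotation handling is reduced to resetting the offset when a file shrinks.

```python
import os


class PositionTracker:
    """Remember how many bytes of each file have already been shipped.

    Promtail persists these offsets to its positions file; this sketch
    keeps them in memory only.
    """

    def __init__(self) -> None:
        self.offsets = {}  # path -> byte offset of the next unread byte

    def read_new_lines(self, path: str) -> list:
        offset = self.offsets.get(path, 0)
        if os.path.getsize(path) < offset:
            offset = 0  # file shrank (truncated or replaced): start from the beginning
        with open(path, "rb") as fh:
            fh.seek(offset)
            data = fh.read()
        # Consume only complete lines; a trailing partial line is re-read next time.
        end = data.rfind(b"\n") + 1
        self.offsets[path] = offset + end
        return data[:end].decode("utf-8", errors="replace").splitlines()
```

Calling `read_new_lines` repeatedly returns only lines appended since the previous call. That is the property that keeps Promtail from re-sending logs after a restart.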

→ Promtail scrapes the configured log files, labels are applied correctly, and logs are visible in Loki via LogQL queries.

If err:

Check Promtail logs: docker compose logs promtail
Verify file paths are accessible: docker compose exec promtail ls /var/log
Test regex patterns independently against sample log lines
Monitor Promtail metrics (port 9080 must be published to the host): curl http://localhost:9080/metrics | grep promtail
Check the positions file for progress: docker compose exec promtail cat /tmp/positions.yaml
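You can test a regex against sample lines in a few lines of Python before putting it in a Promtail `regex` stage. The log format and the `LINE_RE` pattern below are hypothetical. Promtail uses Go's RE2 syntax. Python's `re` accepts the same `(?P<name>...)` named groups, but RE2 does not support backreferences or lookarounds, so keep test patterns within the shared subset.

```python
from __future__ import annotations

import re

# Hypothetical line format: "<timestamp> <LEVEL> [<service>] <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) \[(?P<service>[^\]]+)\] (?P<message>.*)$"
)

SAMPLES = [
    "2024-01-15T10:32:01Z ERROR [auth] login failed user_id=42",
    "2024-01-15T10:32:02Z INFO [api] request served",
    "garbage line that should not match",
]


def check_pattern(pattern: re.Pattern, lines: list[str]) -> list[dict | None]:
    """Return the named groups for each line, or None where the pattern fails."""
    results = []
    for line in lines:
        match = pattern.match(line)
        results.append(match.groupdict() if match else None)
    return results


for line, groups in zip(SAMPLES, check_pattern(LINE_RE, SAMPLES)):
    print("OK  " if groups else "FAIL", line, groups or "")
```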

Step 4: Query Logs with LogQL

Learn LogQL syntax for filtering and aggregating logs.

Basic queries:

# All logs from a job
{job="app"}

# Logs with specific label values
{job="app", level="error"}

# Regex filter on log line content
{job="app"} |~ "authentication failed"

# Case-insensitive regex
{job="app"} |~ "(?i)error"

# Line filter (doesn't parse, just includes/excludes)
{job="app"} |= "user"  # Contains "user"
{job="app"} != "debug" # Doesn't contain "debug"
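Here is a small Python emulation of the four line-filter operators over a list of strings, to make their semantics concrete. It is an illustration only. Loki evaluates regexes with RE2 rather than Python's `re`, but in both cases a line-filter regex matches anywhere in the line; it is not anchored. Chained filters such as `{job="app"} |= "user" != "debug"` are applied left to right.

```python
import re


def line_filter(lines, op, arg):
    """Emulate one LogQL line filter on a list of log lines.

    |= contains   != does not contain   |~ regex matches   !~ regex does not match
    Regexes are unanchored (searched anywhere in the line).
    """
    if op == "|=":
        return [line for line in lines if arg in line]
    if op == "!=":
        return [line for line in lines if arg not in line]
    if op in ("|~", "!~"):
        rx = re.compile(arg)
        want = op == "|~"
        return [line for line in lines if bool(rx.search(line)) == want]
    raise ValueError(f"unknown line filter operator: {op}")


lines = ["user login ok", "debug user cache hit", "ERROR db down", "Error: retrying"]
# Equivalent of {job="app"} |= "user" != "debug"
print(line_filter(line_filter(lines, "|=", "user"), "!=", "debug"))  # ['user login ok']
print(line_filter(lines, "|~", "(?i)error"))  # ['ERROR db down', 'Error: retrying']
```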

Parsing and filtering:

# JSON parsing
{job="app"} | json | level="error"

# Regex parsing with named groups
{job="app"} | regexp "user_id=(?P<user_id>\\d+)" | user_id="12345"

# Logfmt parsing (key=value format)
{job="app"} | logfmt | level="error", service="auth"

# Pattern parsing
{job="nginx"} | pattern `<ip> - <user> [<timestamp>] "<method> <path> <protocol>" <status> <size>` | status >= 500
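The `| logfmt | level="error", service="auth"` query first turns `key=value` pairs into labels. It then keeps the lines where every comma-separated condition holds; the comma acts as `and`. The sketch below mimics that with a simplified parser. It handles bare and double-quoted values but is not a complete logfmt implementation.

```python
import re

# key=value pairs; a value is either "double-quoted" (with \" escapes) or a bare token.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_logfmt(line: str) -> dict:
    """Simplified logfmt parser, roughly what `| logfmt` extracts."""
    fields = {}
    for key, value in _PAIR.findall(line):
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1].replace('\\"', '"')  # unquote and unescape
        fields[key] = value
    return fields


def label_filter(fields: dict, **conditions: str) -> bool:
    """Emulate `level="error", service="auth"`: every condition must match."""
    return all(fields.get(key) == value for key, value in conditions.items())


fields = parse_logfmt('level=error service=auth msg="login failed" user_id=42')
print(fields)
print(label_filter(fields, level="error", service="auth"))  # True
```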

Aggregations (metrics from logs):

# Count log lines per level
sum by (level) (count_over_time({job="app"}[5m]))

# Rate of error logs
rate({job="app", level="error"}[5m])

# Bytes processed per service
sum by (service) (bytes_over_time({job="app"}[1h]))

# Average request duration from logs
avg_over_time({job="app"} | json | unwrap duration [5m])

# Top 10 error messages (message must be extracted first, e.g. with | json)
topk(10, sum by (message) (count_over_time({job="app", level="error"} | json [1h])))
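Metric queries such as `sum by (level) (count_over_time(...[5m]))` work in two steps. They count matching entries inside a trailing window, then add up the counts per label value. `rate()` divides that count by the window length in seconds. The Python sketch below reproduces this for a single evaluation time over an in-memory list of `(timestamp, labels)` pairs. Window boundaries are treated as inclusive here for simplicity, which may differ from Loki's exact handling.

```python
from collections import Counter


def count_over_time_by(entries, label, end, range_s):
    """Count entries with end - range_s <= ts <= end, grouped by one label value.

    entries: iterable of (unix_timestamp_seconds, labels_dict) pairs.
    Entries missing the label are grouped under "".
    """
    counts = Counter()
    for ts, labels in entries:
        if end - range_s <= ts <= end:
            counts[labels.get(label, "")] += 1
    return dict(counts)


def rate(count, range_s):
    """Per-second rate over the window, as LogQL rate() reports for log lines."""
    return count / range_s


entries = [
    (100, {"level": "error"}),
    (150, {"level": "info"}),
    (200, {"level": "error"}),
    (400, {"level": "error"}),
]
by_level = count_over_time_by(entries, "level", end=400, range_s=300)
print(by_level)                      # {'error': 3, 'info': 1}
print(rate(by_level["error"], 300))  # 0.01
```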

Filter by extracted fields:

# Find specific trace in logs
{job="app"} | json | trace_id="abc123def456"

# HTTP 5xx errors from nginx
{job="nginx"} | pattern `<_> "<_> <_> <_>" <status> <_>` | status >= 500

# Failed authentication attempts
{job="app"} | json | message=~"authentication failed" | user_id != ""

Use these patterns in Grafana Explore queries or dashboard panels.

→ Queries return the expected log lines, filters work correctly, and aggregations produce metrics from logs.

If err:

Use Grafana Explore to debug queries interactively
Check label names: curl http://localhost:3100/loki/api/v1/labels
Verify label values: curl http://localhost:3100/loki/api/v1/label/<label_name>/values
Simplify the query: start with a basic label selector and add filters incrementally
Check the time range: logs might not exist in the selected window
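The curl checks above hit Loki's HTTP API directly, and queries go through the same API. Below is a sketch that builds a request URL for `/loki/api/v1/query_range` with its `query`, `start`, `end`, `limit`, and `direction` parameters; timestamps are Unix nanoseconds. The base URL and the default limit are assumptions. Fetching the URL (for example with `urllib.request.urlopen`) returns JSON whose `data.result` holds the matching streams.

```python
import time
from urllib.parse import urlencode


def query_range_url(
    base: str,
    logql: str,
    start_ns: int,
    end_ns: int,
    limit: int = 100,
    direction: str = "backward",
) -> str:
    """Build a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = {
        "query": logql,
        "start": str(start_ns),  # Unix epoch in nanoseconds
        "end": str(end_ns),
        "limit": str(limit),
        "direction": direction,  # "backward" returns newest entries first
    }
    return f"{base.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


now_ns = time.time_ns()
one_hour_ns = 3600 * 10**9
print(query_range_url("http://localhost:3100", '{job="app"} |= "error"', now_ns - one_hour_ns, now_ns))
```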

Step 5: Integrate Logs with Metrics and Traces

Correlate logs with Prometheus metrics and distributed traces for unified observability.

Add trace IDs to logs (application instrumentation):

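# Python with OpenTelemetry
import logging

from opentelemetry import trace

logger = logging.getLogger(__name__)


def handle_request():
    span = trace.get_current_span()
    span_ctx = span.get_span_context()

    # Outside an active span the context is invalid (trace_id == 0); omit the field then.
    extra = {"trace_id": format(span_ctx.trace_id, "032x")} if span_ctx.is_valid else {}

    # extra= only sets an attribute on the LogRecord; the handler's formatter
    # must output it (e.g. a JSON formatter) for Promtail/Loki to see it.
    logger.info("Processing request", extra=extra)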
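Passing `extra={"trace_id": ...}` only attaches an attribute to the `LogRecord`; the default formatter does not print it. One way to get the ID into the shipped line is a JSON formatter, sketched below with the standard library only. The field names (`ts`, `level`, `logger`, `message`, `trace_id`) are assumptions, chosen so that `| json` in LogQL or a Promtail `json` stage can pick them up.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, including trace_id when present."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Local time without a zone offset (a simplification).
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keys passed via extra= become attributes on the LogRecord.
        trace_id = getattr(record, "trace_id", None)
        if trace_id:
            payload["trace_id"] = trace_id
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)

logging.getLogger("app").info(
    "Processing request", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}
)
```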

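// Go with OpenTelemetry
import (
    "context"

    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
)

// logger is a package-level zap logger; zap.Must panics if construction fails.
var logger = zap.Must(zap.NewProduction())

func handleRequest(ctx context.Context) {
    sc := trace.SpanFromContext(ctx).SpanContext()
    if !sc.HasTraceID() {
        // No active span: log without a trace_id field.
        logger.Info("Processing request")
        return
    }

    logger.Info("Processing request",
        zap.String("trace_id", sc.TraceID().String()),
    )
}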

Configure Grafana data links from metrics to logs:

In the Prometheus panel's field config:

{
  "fieldConfig": {
    "defaults": {
      "links": [
        {
          "title": "View Logs",
          "url": "/explore?left={\"datasource\":\"Loki\",\"queries\":[{\"refId\":\"A\",\"expr\":\"{job=\\\"app\\\",instance=\\\"${__field.labels.instance}\\\"} |= `${__field.labels.trace_id}`\"}],\"range\":{\"from\":\"${__from}\",\"to\":\"${__to}\"}}",
          "targetBlank": false
        }
      ]
    }
  }
}
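Hand-escaping JSON inside a URL, as in the data link above, is error-prone. When a link needs concrete values (for example, from an alerting or reporting script), building it programmatically avoids escaping mistakes. This sketch assumes the legacy `left=` Explore URL format shown above. Newer Grafana versions use a different `panes` parameter, so check what your version accepts. The sketch is not suited to panel data links that contain template variables such as `${__from}`: percent-encoding them would likely stop Grafana from interpolating them.

```python
import json
from urllib.parse import quote


def explore_url(datasource: str, expr: str, time_from: str, time_to: str) -> str:
    """Build a Grafana Explore link in the legacy `left=` format.

    json.dumps handles quote escaping inside the query, and quote()
    percent-encodes the whole JSON state for use in a URL.
    """
    state = {
        "datasource": datasource,
        "queries": [{"refId": "A", "expr": expr}],
        "range": {"from": time_from, "to": time_to},
    }
    return "/explore?left=" + quote(json.dumps(state, separators=(",", ":")), safe="")


print(explore_url("Loki", '{job="app", instance="web-1"}', "now-1h", "now"))
```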

Configure Grafana data links from logs to traces:

In the Loki datasource provisioning file:

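apiVersion: 1

datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        # datasourceUid must match the uid of your Tempo datasource ("tempo" is assumed here)
        - datasourceUid: tempo
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          # $$ escapes $ in provisioning files; ${__value.raw} is the regex's first capture group
          url: "$${__value.raw}"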
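Grafana applies `matcherRegex` to each log line and uses the first capture group as `${__value.raw}`. A quick way to check the pattern is to run it against real sample lines. The sketch below uses Python's `re`; Grafana evaluates the pattern with its own regex engine, but for a simple pattern like this the result should be the same. It shows that `trace_id=(\w+)` matches logfmt-style output but not JSON output like `{"trace_id": "..."}`. `JSON_TRACE_RE` is a hypothetical alternative for JSON lines.

```python
import re

# matcherRegex from the datasource config above (YAML "\\w" is the regex \w).
LOGFMT_TRACE_RE = re.compile(r"trace_id=(\w+)")
# Hypothetical alternative for JSON log lines such as {"trace_id": "..."}.
JSON_TRACE_RE = re.compile(r'"trace_id"\s*:\s*"(\w+)"')


def extract_trace_id(pattern: re.Pattern, line: str):
    """Return the first capture group (what ${__value.raw} would hold), or None."""
    match = pattern.search(line)
    return match.group(1) if match else None


logfmt_line = "level=info msg=ok trace_id=4bf92f3577b34da6a3ce929d0e0e4736"
json_line = '{"level": "info", "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"}'
print(extract_trace_id(LOGFMT_TRACE_RE, logfmt_line))  # 4bf92f3577b34da6a3ce929d0e0e4736
print(extract_trace_id(LOGFMT_TRACE_RE, json_line))    # None
print(extract_trace_id(JSON_TRACE_RE, json_line))      # 4bf92f3577b34da6a3ce929d0e0e4736
```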

Correlate logs in Grafana Explore:

Query metrics from Prometheus
Click a data point
Select "View Logs" from the context menu
The Loki query is auto-populated with the relevant labels and time range
Click a trace ID in the logs
The Tempo trace view opens with the full distributed trace

→ Clicking a metric opens related logs, trace IDs in logs link to the trace viewer, and metrics, logs, and traces can be navigated from a single pane.

If err:

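Verify that the trace ID format in your logs matches matcherRegex in the derived field (a logfmt-style pattern such as trace_id=(\w+) will not match JSON output)
Check that trace_id is extracted by the Promtail pipeline
Ensure a Tempo datasource is configured in Grafana
Test URL encoding for complex filter expressions
Test data link URLs in an incognito/private browser window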

Step 6: Set Up Log Retention and Compaction

Configure retention policies and compaction to manage storage costs.

Global and per-tenant retention (in the Loki config):

limits_config:
  retention_period: 720h  # Global default: 30 days

  # Per-tenant retention (requires multi-tenancy enabled)
  per_tenant_override_config: /etc/loki/overrides.yaml

# overrides.yaml
overrides:
  production:
    retention_period: 2160h  # 90 days for production
  staging:
    retention_period: 360h   # 15 days for staging
  development:
    retention_period: 168h   # 7 days for dev
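Loki expresses these periods in hours, so the comments above follow from days × 24. A trivial helper keeps the conversion consistent when generating override files. It is an illustration, not a Loki utility.

```python
def retention_period(days: int) -> str:
    """Convert whole days to the hour-based duration string used in the config above."""
    if days <= 0:
        raise ValueError("days must be positive")
    return f"{days * 24}h"


for env, days in {"production": 90, "staging": 15, "development": 7}.items():
    print(env, retention_period(days))  # production 2160h, staging 360h, development 168h
```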

Retention by stream labels (requires the compactor; the per-stream rules themselves are defined under limits_config.retention_stream):

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
# ... (see EXAMPLES.md for complete configuration)

When a stream matches more than one retention_stream rule, the rule with the higher priority value wins; if priorities are equal, the shorter period applies.
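The precedence rule for per-stream retention can be sketched as follows. The sketch assumes Loki's behavior for `retention_stream`: among matching rules, the higher `priority` value wins. A priority tie resolves to the shorter period, and streams that match no rule fall back to the global `retention_period`. Selectors are simplified to exact label matches; Loki accepts full LogQL stream selectors.

```python
from dataclasses import dataclass


@dataclass
class StreamRule:
    """One retention_stream entry: exact-match selector, priority, period in hours."""
    selector: dict
    priority: int
    period_h: int


def retention_for(labels: dict, rules: list, default_h: int) -> int:
    """Pick the retention period (hours) for a stream with the given labels."""
    best = None
    for rule in rules:
        if all(labels.get(k) == v for k, v in rule.selector.items()):
            if (
                best is None
                or rule.priority > best.priority  # higher priority value wins
                or (rule.priority == best.priority and rule.period_h < best.period_h)
            ):
                best = rule
    return best.period_h if best else default_h  # no match: global retention


rules = [
    StreamRule({"namespace": "dev"}, priority=1, period_h=24),
    StreamRule({"namespace": "dev", "container": "audit"}, priority=2, period_h=720),
]
print(retention_for({"namespace": "dev", "container": "audit"}, rules, default_h=744))  # 720
print(retention_for({"namespace": "dev", "container": "web"}, rules, default_h=744))    # 24
print(retention_for({"namespace": "prod"}, rules, default_h=744))                       # 744
```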

Chunk cache settings (chunk compression is configured separately, via ingester.chunk_encoding):

chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    fifocache:
      max_size_bytes: 1GB
      ttl: 24h
# ... (see EXAMPLES.md for complete configuration)

Monitor retention:

# Check index stats (streams, chunks, entries, bytes) for a stream selector
curl -G http://localhost:3100/loki/api/v1/index/stats --data-urlencode 'query={job="app"}'

# Check compactor metrics
curl http://localhost:3100/metrics | grep loki_compactor

# Check retention metrics (marker and sweeper series)
curl http://localhost:3100/metrics | grep loki_boltdb_shipper_retention
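You can automate the `grep` checks above by parsing the Prometheus text exposition format that `/metrics` returns. This is a simplified sketch. It skips `# HELP`/`# TYPE` lines, assumes label values contain no spaces, and ignores optional timestamps. A full parser, such as `text_string_to_metric_families` in the `prometheus_client` package, handles more edge cases.

```python
def metrics_with_prefix(exposition: str, prefix: str) -> dict:
    """Return {series: value} for samples whose metric name starts with prefix.

    Simplified Prometheus text-format parser: skips comment lines,
    assumes label values contain no spaces, and ignores timestamps.
    """
    samples = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        series, _, rest = line.partition(" ")
        if series.startswith(prefix) and rest:
            samples[series] = float(rest.split()[0])
    return samples


# Usage against a running Loki:
# import urllib.request
# text = urllib.request.urlopen("http://localhost:3100/metrics").read().decode()
# print(metrics_with_prefix(text, "loki_compactor"))
```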

→ Old logs are deleted automatically per the retention policy, storage usage stabilizes, and compaction reduces index size.

If err:

Enable the compactor in the Loki config if retention is not working
Check compactor logs: docker compose logs loki | grep compactor
Verify compactor.retention_enabled: true (retention_deletes_enabled is a legacy table_manager setting and does not control compactor retention)
Monitor disk usage: docker compose exec loki du -sh /loki
For S3, check that bucket lifecycle policies don't conflict with Loki retention

Check

Traps

High-cardinality labels: unbounded label values (user IDs, request IDs) cause index explosion. Use fixed labels (level, service, env) and put variable values in the log line.
Missing log parsing: raw logs without label extraction limit query capabilities. Parse structured logs (JSON, logfmt), or use regex for unstructured ones.
Incorrect time parsing: mismatched timestamp formats cause logs to arrive out of order or be rejected. Test timestamp parsing with sample logs.
Retention not working: the compactor must be running with compactor.retention_enabled: true for retention to delete old data.
Ingestion rate limits: the defaults (ingestion_rate_mb: 4, ingestion_burst_size_mb: 6 in limits_config) may be too low for high-volume systems. Raise them as needed.
Query timeouts: broad queries over long time ranges can time out. Use more specific label selectors and shorter time windows.
Log duplication: multiple Promtail instances scraping the same files create duplicates. Make sure each file is scraped by exactly one instance (for example, one Promtail per node reading only that node's logs). Positions files are per instance and cannot be shared.
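A quick way to spot the high-cardinality trap is to count distinct values per label across your streams. The label sets could come from Loki's `/loki/api/v1/series` endpoint; in this sketch they are plain dictionaries. The threshold of 100 distinct values is an arbitrary assumption to tune for your environment.

```python
from collections import defaultdict


def label_cardinality(streams: list) -> dict:
    """Count distinct values for each label across a list of stream label sets."""
    values = defaultdict(set)
    for labels in streams:
        for key, value in labels.items():
            values[key].add(value)
    return {key: len(seen) for key, seen in values.items()}


def risky_labels(streams: list, max_values: int = 100) -> list:
    """Labels with more distinct values than max_values (an assumed threshold)."""
    return sorted(
        key for key, count in label_cardinality(streams).items() if count > max_values
    )


# Hypothetical streams where user_id was (wrongly) promoted to a label.
streams = [
    {"service": "api", "level": level, "user_id": str(i)}
    for i, level in enumerate(["info", "error"] * 100)
]
print(label_cardinality(streams))  # {'service': 1, 'level': 2, 'user_id': 200}
print(risky_labels(streams))       # ['user_id']
```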

Related Skills

correlate-observability-signals - Unified debugging across metrics, logs, traces using trace IDs
build-grafana-dashboards - Visualize log-derived metrics + create log panels in dashboards
setup-prometheus-monitoring - Metrics provide context for when to query logs during incidents
instrument-distributed-tracing - Add trace IDs to logs for correlation with distributed traces

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman-ultra/skills/configure-log-aggregation

Tags: agents, agentskills, ai-assisted-development, claude-code, skills, teams

FAQ

Frequently asked questions

What is the configure-log-aggregation skill?

configure-log-aggregation is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform configure-log-aggregation-related tasks without extra prompting.

How do I install configure-log-aggregation?

Use the install commands on this page: add configure-log-aggregation to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does configure-log-aggregation belong to?

configure-log-aggregation is in the Design category, tagged ai and design.

Is configure-log-aggregation free to use?

Yes. configure-log-aggregation is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Related skills

executing-plans

Design

Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. The skill loads and critically reviews the plan, then executes tasks in small batches (3 tasks by default), reporting progress between batches for architect review. This ensures systematic implementation with built-in quality checkpoints.


requesting-code-review

Design

This skill dispatches a code-reviewer subagent to analyze code changes against the requirements before proceeding. Use it after completing tasks, implementing major features, or before merging into the main branch. The review helps catch problems early by comparing the current implementation with the original plan.


connect-mcp-server

Design

This skill provides a comprehensive guide for developers to connect MCP servers to Claude Code over HTTP, stdio, or SSE transports. It covers installation, configuration, authentication, and security for integrating external services such as GitHub, Notion, and custom APIs. Use it when setting up MCP integrations, configuring external tools, or working with Claude's Model Context Protocol.


web-cli-teleport

Design

This skill helps developers choose between the Claude Code web and CLI interfaces by analyzing the task, then enables seamless teleporting of sessions between these environments. It optimizes the workflow by managing session state and context when switching between web, CLI, or mobile. Use it for complex projects that need different tools at different stages.
