Name: configure-log-aggregation
Author: pjt222

Updated 1 month ago

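This skill sets up centralized log collection using Loki/Promtail or ELK, handling log parsing, label extraction, and retention policies. It is designed to consolidate logs from multiple services into a searchable system and correlate them with metrics and traces. Use it when replacing local log files with centralized storage or when troubleshooting incidents that require cross-service analysis.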

Configure Log Aggregation

Implement centralized log collection, parsing, and querying with Loki/Promtail or the ELK stack for operational visibility.

When to Use

Consolidating logs from multiple services or hosts into a searchable system
Replacing local log files with centralized, queryable log storage
Correlating logs with metrics and traces for full observability
Implementing structured logging with label extraction from unstructured logs
Setting retention policies for log data based on storage and compliance needs
Troubleshooting production incidents requiring log analysis across services

Inputs

Required: Log sources (application logs, system logs, container logs)
Required: Log format patterns (JSON, plaintext, syslog, etc.)
Optional: Label extraction rules for structured querying
Optional: Retention and compression policies
Optional: Existing log shipper configuration (Fluentd, Filebeat, Promtail)

Steps

See Extended Examples for complete configuration files and templates.

Step 1: Choose Log Aggregation Stack

Select between Loki (Prometheus-style) or ELK (Elasticsearch-based) based on requirements.

Loki advantages:

Lightweight, designed for Kubernetes and cloud-native environments
Label-based indexing (like Prometheus) for low storage overhead
Native integration with Grafana for unified dashboards
Horizontal scalability with object storage (S3, GCS)
Lower resource consumption compared to Elasticsearch

ELK advantages:

Full-text search across all log content (not just labels)
Rich query DSL and aggregations
Mature ecosystem with beats, logstash plugins
Better for compliance/audit logs requiring deep historical search

For this guide, focus on Loki + Promtail (recommended for most modern setups).

Decision criteria:

Use Loki if:
- You want label-based queries similar to Prometheus
- Storage costs are a concern (Loki indexes only labels)
- You already use Grafana for metrics
- Kubernetes/container-native deployment

Use ELK if:
- You need full-text search across all log content
- You have complex log parsing and enrichment requirements
- You require advanced analytics and aggregations
- Legacy systems with existing Logstash pipelines

Got: Clear choice made based on requirements, team downloads appropriate installation artifacts.

If fail:

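Benchmark storage requirements with a sample of your own logs: Loki indexes only labels, so it typically needs far less storage than Elasticsearch (roughly 10x less is a rule of thumb, not a guarantee)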
Evaluate query patterns: full-text search needs vs label filtering
Consider operational overhead: ELK requires more tuning and resources
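The decision criteria above can be written down as a small scoring helper. This is only an illustrative sketch: the `Requirements` fields and the `choose_stack` function are hypothetical names introduced here, and the tie-break toward Loki simply mirrors this guide's default recommendation.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # "Use Loki if" criteria from the list above
    wants_label_queries: bool = False
    storage_cost_sensitive: bool = False
    uses_grafana: bool = False
    runs_on_kubernetes: bool = False
    # "Use ELK if" criteria from the list above
    needs_full_text_search: bool = False
    needs_complex_enrichment: bool = False
    needs_advanced_analytics: bool = False
    has_logstash_pipelines: bool = False


def choose_stack(req: Requirements) -> str:
    """Return "loki" or "elk" depending on which criteria list matches more."""
    loki_score = sum([req.wants_label_queries, req.storage_cost_sensitive,
                      req.uses_grafana, req.runs_on_kubernetes])
    elk_score = sum([req.needs_full_text_search, req.needs_complex_enrichment,
                     req.needs_advanced_analytics, req.has_logstash_pipelines])
    # Ties go to Loki, matching the guide's default (an assumption of this sketch).
    return "elk" if elk_score > loki_score else "loki"
```

A real decision would weight the criteria rather than count them; the sketch only makes the two lists explicit.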

Step 2: Deploy Loki

Install and configure Loki with an appropriate storage backend.

Docker Compose deployment (docker-compose.yml):

version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    restart: unless-stopped
    depends_on:
      - loki

volumes:
  loki-data:

Loki configuration (loki-config.yml):

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

# ... (see EXAMPLES.md for complete configuration)

For production with S3 storage:

storage_config:
  aws:
    s3: s3://us-east-1/my-loki-bucket
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: s3

Got: Loki starts successfully, health check passes at http://localhost:3100/ready, logs stored according to retention policy.

If fail:

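Check Loki logs: docker compose logs loki
Verify storage directories exist and are writable
Test config syntax (mount your config so the check runs against it, not the image default): docker run --rm -v "$(pwd)/loki-config.yml:/etc/loki/local-config.yaml" grafana/loki:2.9.0 -config.file=/etc/loki/local-config.yaml -verify-config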
Ensure retention settings don't exceed disk capacity
For S3: verify IAM permissions and bucket access
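The readiness check at http://localhost:3100/ready can be scripted so a deployment waits for Loki before continuing. The following is a minimal sketch using only the standard library; the function names, retry count, and delay are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_ready(url: str = "http://localhost:3100/ready") -> bool:
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status such as 503.
        return False


def wait_until_ready(check: Callable[[], bool] = fetch_ready,
                     attempts: int = 30, delay_s: float = 1.0) -> bool:
    """Poll `check` until it succeeds or `attempts` is exhausted."""
    for _ in range(attempts):
        if check():
            return True
        time.sleep(delay_s)
    return False
```

Calling `wait_until_ready()` with the defaults polls the local Loki instance for up to about 30 seconds. The `check` parameter is injectable so the loop can be exercised without a running server.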

Step 3: Configure Promtail for Log Shipping

Set up Promtail to scrape logs and forward them to Loki with label extraction.

Promtail configuration (promtail-config.yml):

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml
# ... (see EXAMPLES.md for complete configuration)

Key Promtail concepts:

Scrape configs: Define log sources and how to discover them
Pipeline stages: Transform and label logs before sending to Loki
Relabel configs: Dynamic labeling based on metadata
Positions file: Tracks read offsets to avoid re-processing logs

Got: Promtail scrapes configured log files, labels applied correctly, logs visible in Loki via LogQL queries.

If fail:

Check Promtail logs: docker compose logs promtail
Verify file paths are accessible: docker compose exec promtail ls /var/log
Test regex patterns independently with sample log lines
Monitor Promtail metrics: curl http://localhost:9080/metrics | grep promtail (publish port 9080 in docker-compose.yml first; the compose file above does not expose it)
Check positions file for progress: docker compose exec promtail cat /tmp/positions.yaml
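One way to test regex patterns against sample log lines before putting them in a Promtail pipeline is a short script like the one below. The pattern and the sample line are hypothetical. Promtail uses Go's RE2 syntax, so the sketch sticks to constructs valid in both engines (named groups with `(?P<name>...)`, no lookarounds or backreferences).

```python
import re

# Hypothetical pattern for lines shaped like "<timestamp> <LEVEL> <message>".
PATTERN = re.compile(r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def extract_fields(line: str) -> dict:
    """Return the named groups for a matching line, or an empty dict."""
    match = PATTERN.match(line)
    return match.groupdict() if match else {}


sample = "2024-01-15T10:30:00Z ERROR authentication failed user_id=12345"
fields = extract_fields(sample)
# Only low-cardinality fields such as `level` are good label candidates.
print(fields["level"], "|", fields["message"])
```

A line that does not match returns an empty dict, which makes it easy to spot samples the pattern would silently skip.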

Step 4: Query Logs with LogQL

Learn LogQL syntax for filtering and aggregating logs.

Basic queries:

# All logs from a job
{job="app"}

# Logs with specific label values
{job="app", level="error"}

# Regex filter on log line content
{job="app"} |~ "authentication failed"

# Case-insensitive regex
{job="app"} |~ "(?i)error"

# Line filter (doesn't parse, just includes/excludes)
{job="app"} |= "user"  # Contains "user"
{job="app"} != "debug" # Doesn't contain "debug"

Parsing and filtering:

# JSON parsing
{job="app"} | json | level="error"

# Regex parsing with named groups
{job="app"} | regexp "user_id=(?P<user_id>\\d+)" | user_id="12345"

# Logfmt parsing (key=value format)
{job="app"} | logfmt | level="error", service="auth"

# Pattern parsing
{job="nginx"} | pattern `<ip> - <user> [<timestamp>] "<method> <path> <protocol>" <status> <size>` | status >= 500

Aggregations (metrics from logs):

# Count log lines per level
sum by (level) (count_over_time({job="app"}[5m]))

# Rate of error logs
rate({job="app", level="error"}[5m])

# Bytes processed per service
sum by (service) (bytes_over_time({job="app"}[1h]))

# Average request duration from logs
avg_over_time({job="app"} | json | unwrap duration [5m])

# Top 10 error messages
topk(10, sum by (message) (count_over_time({job="app", level="error"} | json [1h])))

Filtering by extracted fields:

# Find specific trace in logs
{job="app"} | json | trace_id="abc123def456"

# HTTP 5xx errors from nginx
{job="nginx"} | pattern `<_> "<_> <_> <_>" <status> <_>` | status >= 500

# Failed authentication attempts
{job="app"} | json | message=~".*authentication failed.*" | user_id != ""

Create Grafana Explore queries or dashboard panels using these patterns.

Got: Queries return expected log lines, filtering works correctly, aggregations produce metrics from logs.

If fail:

Use Grafana Explore to debug queries interactively
Check label names: curl http://localhost:3100/loki/api/v1/labels
Verify label values: curl http://localhost:3100/loki/api/v1/label/{label_name}/values
Simplify query: start with basic label selector, add filters incrementally
Check time range: logs might not exist in selected window
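Queries can also be run outside Grafana through Loki's HTTP API, which helps when checking whether a time range or selector is the problem. The sketch below builds a request URL for the `/loki/api/v1/query_range` endpoint with nanosecond timestamps. The helper names are assumptions, and the base URL matches the local deployment from Step 2.

```python
import time
from typing import Optional, Tuple
from urllib.parse import urlencode


def last_minutes_window(minutes: int, now_ns: Optional[int] = None) -> Tuple[int, int]:
    """Return (start_ns, end_ns) covering the last `minutes` minutes."""
    end_ns = time.time_ns() if now_ns is None else now_ns
    return end_ns - minutes * 60 * 1_000_000_000, end_ns


def build_query_range_url(base_url: str, logql: str,
                          start_ns: int, end_ns: int, limit: int = 100) -> str:
    """Build a query_range URL; the LogQL expression is percent-encoded."""
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


start, end = last_minutes_window(15)
print(build_query_range_url("http://localhost:3100", '{job="app"} |= "error"', start, end))
```

The printed URL can be passed to curl. An empty result for a wide window usually points at the label selector rather than the filters.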

Step 5: Integrate Logs with Metrics and Traces

Correlate logs with Prometheus metrics and distributed traces for unified observability.

Add trace IDs to logs (application instrumentation):

# Python with OpenTelemetry
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def handle_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, "032x")}
    )

// Go with OpenTelemetry
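import (
    "context"

    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
)

// logger is assumed to be created at startup, for example with zap.NewProduction().
var logger *zap.Logger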

func handleRequest(ctx context.Context) {
    span := trace.SpanFromContext(ctx)
    traceID := span.SpanContext().TraceID().String()

    logger.Info("Processing request",
        zap.String("trace_id", traceID),
    )
}
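Fields passed through `extra=` in Python's `logging` module only reach the output if the formatter emits them. The sketch below shows one way to do that with a JSON formatter, so that a query such as `| json | trace_id="..."` has something to parse. The class and function names are hypothetical, and the trace ID is a placeholder rather than one taken from a real span.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including trace_id when present."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Attributes passed via `extra=` are set directly on the record.
        trace_id = getattr(record, "trace_id", None)
        if trace_id is not None:
            payload["trace_id"] = trace_id
        return json.dumps(payload)


def make_logger(name: str, stream=None) -> logging.Logger:
    """Build a logger that writes JSON lines to `stream` (stderr if None)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buf = io.StringIO()
log = make_logger("demo", buf)
log.info("Processing request", extra={"trace_id": "0" * 31 + "1"})
record = json.loads(buf.getvalue())
```

With JSON output the trace ID appears as `"trace_id": "..."`, so any regex used to pick it out of the log line needs to match that form rather than `trace_id=...`.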

Configure Grafana data links from metrics to logs:

In Prometheus panel field config:

{
  "fieldConfig": {
    "defaults": {
      "links": [
        {
          "title": "View Logs",
          "url": "/explore?left={\"datasource\":\"Loki\",\"queries\":[{\"refId\":\"A\",\"expr\":\"{job=\\\"app\\\",instance=\\\"${__field.labels.instance}\\\"} |= `${__field.labels.trace_id}`\"}],\"range\":{\"from\":\"${__from}\",\"to\":\"${__to}\"}}",
          "targetBlank": false
        }
      ]
    }
  }
}

Configure Grafana data links from logs to traces:

In Loki datasource config:

datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceUid: tempo  # UID of the Tempo datasource in Grafana (assumed value; provisioning expects a UID, not a name)
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"

Correlate logs in Grafana Explore:

Query metrics in Prometheus
Click on data point
Select "View Logs" from context menu
Loki query auto-populated with relevant labels and time range
Click trace ID in logs
Tempo trace view opens with full distributed trace

Got: Clicking metrics opens related logs, trace IDs in logs link to trace viewer, single pane for metrics/logs/traces navigation.

If fail:

Verify trace ID format matches regex in derived fields
Check that trace_id appears in the log line itself (derived fields match on line content; avoid promoting trace_id to a label because of cardinality)
Ensure Tempo datasource configured in Grafana
Test URL encoding for complex filter expressions
Validate data link URLs in incognito/private browser window
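Hand-escaping the JSON inside a data link URL is error-prone, so it can help to generate the encoded value and compare it against what Grafana receives. This sketch mirrors the `left=` structure used in the data link above. The function name is hypothetical, and the Explore URL format may differ between Grafana versions.

```python
import json
from urllib.parse import quote, unquote


def explore_url(datasource: str, expr: str,
                time_from: str = "now-1h", time_to: str = "now") -> str:
    """Build an Explore link whose `left` parameter is percent-encoded JSON."""
    left = {
        "datasource": datasource,
        "queries": [{"refId": "A", "expr": expr}],
        "range": {"from": time_from, "to": time_to},
    }
    encoded = quote(json.dumps(left, separators=(",", ":")), safe="")
    return "/explore?left=" + encoded


url = explore_url("Loki", '{job="app"} |= "timeout"')
# Round-trip: decoding must give back the exact LogQL expression.
decoded = json.loads(unquote(url.split("left=", 1)[1]))
print(decoded["queries"][0]["expr"])
```

If the round-trip result differs from the intended expression, the escaping in the data link is the likely culprit.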

Step 6: Set Up Log Retention and Compaction

Configure retention policies and compaction to manage storage costs.

Global and per-tenant retention (in Loki config):

limits_config:
  retention_period: 720h  # Global default: 30 days

  # Per-tenant retention (requires multi-tenancy enabled)
  per_tenant_override_config: /etc/loki/overrides.yaml

# overrides.yaml
overrides:
  production:
    retention_period: 2160h  # 90 days for production
  staging:
    retention_period: 360h   # 15 days for staging
  development:
    retention_period: 168h   # 7 days for dev
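The retention periods above are expressed in hours, which is easy to get wrong when thinking in days. A small conversion helper, sketched here with hypothetical names, reproduces the values used in the configuration.

```python
def retention_hours(days: int) -> str:
    """Convert a retention period in days to an hour-based duration string."""
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return f"{days * 24}h"


# Day counts taken from the comments in the configuration above.
TIERS = {"production": 90, "staging": 15, "development": 7}
overrides = {tenant: retention_hours(days) for tenant, days in TIERS.items()}
print(retention_hours(30), overrides)
```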

Retention by stream labels (requires compactor):

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
# ... (see EXAMPLES.md for complete configuration)

When several retention_stream rules match the same stream, priority determines which one applies: the rule with the highest priority value wins.

Compression settings:

chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    fifocache:
      max_size_bytes: 1GB
      ttl: 24h
# ... (see EXAMPLES.md for complete configuration)

Monitor retention:

# Check chunk stats
curl http://localhost:3100/loki/api/v1/status/chunks | jq

# Check compactor metrics
curl http://localhost:3100/metrics | grep loki_compactor

# Verify deleted chunks
curl http://localhost:3100/metrics | grep loki_boltdb_shipper_retention_deleted

Got: Old logs automatically deleted per retention policy, storage usage stabilizes, compaction reduces index size.

If fail:

Enable compactor in Loki config if retention not working
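Check compactor logs: docker compose logs loki 2>&1 | grep compactor
Verify retention_enabled: true is set under compactor (retention_deletes_enabled is a legacy table_manager setting and does not affect compactor-based retention)
Monitor disk usage: docker compose exec loki du -sh /loki/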
For S3: check bucket lifecycle policies don't conflict with Loki retention

Checks

Pitfalls

High cardinality labels: Using unbounded label values (user IDs, request IDs) causes index explosion. Use fixed labels (level, service, env) and keep variable values in the log line.
Missing log parsing: Sending raw logs without label extraction limits query capabilities. Always parse structured logs (JSON, logfmt) or use regex for unstructured logs.
Incorrect time parsing: Mismatched timestamp formats cause logs to be out of order or rejected. Test timestamp parsing with sample logs.
Retention not working: Compactor must be enabled for retention to delete old data. Check retention_enabled: true under compactor; retention_deletes_enabled belongs to the legacy table manager and has no effect on compactor-based retention.
Ingestion rate limits: Default per-tenant limits (4 MB/s with a 6 MB burst in Loki 2.9) may be too low for high-volume systems. Adjust ingestion_rate_mb and ingestion_burst_size_mb.
Query timeouts: Broad queries over long time ranges can time out. Use more specific label selectors and shorter time windows.
Log duplication: Multiple Promtail instances scraping the same logs create duplicates. Use unique labels or positions file coordination.
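The high-cardinality pitfall is easier to see with numbers. Each distinct combination of label values is a separate stream, so the worst-case stream count is the product of the distinct values per label. The counts below are made-up illustrations, not measurements.

```python
from math import prod


def estimated_streams(label_cardinalities: dict) -> int:
    """Upper bound on stream count: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets: fixed labels only, then the same with user_id added.
bounded = estimated_streams({"level": 4, "service": 20, "env": 3})
unbounded = estimated_streams({"level": 4, "service": 20, "env": 3, "user_id": 100_000})
print(bounded, unbounded)
```

Adding a single unbounded label multiplies the stream count by its number of distinct values, which is why such values belong in the log line.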

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman/skills/configure-log-aggregation
