SKILL·6DDC4C

configure-log-aggregation

Name: configure-log-aggregation
Author: pjt222

pjt222

업데이트됨 1 month ago

9 조회

기타ai

정보

이 스킬은 여러 서비스에 걸친 로그 수집, 파싱 및 쿼리를 위해 Loki/Promtail 또는 ELK 스택을 사용한 중앙 집중식 로그 집계를 설정합니다. 라벨 추출, 보존 정책, 그리고 메트릭 및 트레이스와의 상관관계를 통한 구조화된 로깅을 가능하게 합니다. 로컬 로그 파일을 검색 가능한 중앙 집중식 저장소로 대체해야 하거나, 서비스 간 로그 분석이 필요한 프로덕션 문제를 해결할 때 사용하세요.

빠른 설치

Claude Code

문서

name: configure-log-aggregation description: > 使用 Loki 和 Promtail（或 ELK 栈）设置集中式日志聚合，包括日志解析、标签提取、保留策略及与指标的集成关联。适用于将多服务日志整合到可搜索系统、用集中式可查询存储替代本地日志文件、将日志与指标和追踪关联、实现带标签提取的结构化日志记录，或排查需要跨服务日志分析的生产事故。 locale: zh-CN source_locale: en source_commit: 6f65f316 translator: claude-opus-4-6 translation_date: 2026-03-16 license: MIT allowed-tools: Read Write Edit Bash Grep Glob metadata: author: Philipp Thoss version: "1.0" domain: observability complexity: intermediate language: multi tags: loki, promtail, logging, elk, log-aggregation

Configure Log Aggregation

使用 Loki/Promtail 或 ELK 栈实现集中式日志收集、解析和查询，提升运营可见性。

适用场景

将多服务或多主机的日志整合到可搜索系统中
用集中式可查询日志存储替代本地日志文件
将日志与指标和追踪关联以实现全面可观测性
通过从非结构化日志中提取标签实现结构化日志记录
根据存储和合规需求设置日志数据的保留策略
排查需要跨服务日志分析的生产事故

输入

必填：日志来源（应用日志、系统日志、容器日志）
必填：日志格式模式（JSON、纯文本、syslog 等）
可选：用于结构化查询的标签提取规则
可选：保留和压缩策略
可选：现有日志传输配置（Fluentd、Filebeat、Promtail）

步骤

完整配置文件和模板请参阅 Extended Examples。

第 1 步：选择日志聚合栈

根据需求在 Loki（Prometheus 风格）和 ELK（基于 Elasticsearch）之间做出选择。

Loki 优势：

轻量级，专为 Kubernetes 和云原生环境设计
基于标签的索引（类似 Prometheus），存储开销低
与 Grafana 原生集成，实现统一仪表板
通过对象存储（S3、GCS）水平扩展
相比 Elasticsearch 资源消耗更低

ELK 优势：

跨所有日志内容的全文搜索（不仅限于标签）
丰富的查询 DSL 和聚合功能
成熟生态，含 beats 和 logstash 插件
更适合需要深度历史搜索的合规/审计日志

本指南重点介绍 Loki + Promtail（适合大多数现代设置的推荐方案）。

决策标准：

Use Loki if:
- You want label-based queries similar to Prometheus
- Storage costs are a concern (Loki indexes only labels)
- You already use Grafana for metrics
- Kubernetes/container-native deployment

Use ELK if:
- You need full-text search across all log content
- You have complex log parsing and enrichment requirements
- You require advanced analytics and aggregations
- Legacy systems with existing Logstash pipelines

预期结果： 根据需求做出明确选择，团队下载相应的安装包。

失败处理：

对比存储需求基准：相同日志量下 Loki 约为 Elasticsearch 的 1/10
评估查询模式：全文搜索需求 vs 标签过滤
考虑运维开销：ELK 需要更多调优和资源

第 2 步：部署 Loki

安装并配置带合适存储后端的 Loki。

Docker Compose 部署（docker-compose.yml）：

version: '3.8'

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - ./promtail-config.yml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
    command: -config.file=/etc/promtail/config.yml
    restart: unless-stopped
    depends_on:
      - loki

volumes:
  loki-data:

Loki 配置（loki-config.yml）：

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

# ... (see EXAMPLES.md for complete configuration)

生产环境使用 S3 存储：

storage_config:
  aws:
    s3: s3://us-east-1/my-loki-bucket
    s3forcepathstyle: true
  boltdb_shipper:
    active_index_directory: /loki/index
    cache_location: /loki/cache
    shared_store: s3

预期结果： Loki 成功启动，http://localhost:3100/ready 健康检查通过，日志按保留策略存储。

失败处理：

检查 Loki 日志：docker logs loki
验证存储目录存在且可写
测试配置语法：docker run grafana/loki:2.9.0 -config.file=/etc/loki/local-config.yaml -verify-config
确保保留设置不超出磁盘容量
对于 S3：验证 IAM 权限和存储桶访问

第 3 步：配置 Promtail 进行日志传输

设置 Promtail 抓取日志并将带标签提取的日志转发到 Loki。

Promtail 配置（promtail-config.yml）：

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml
# ... (see EXAMPLES.md for complete configuration)

Promtail 关键概念：

抓取配置：定义日志来源及其发现方式
管道阶段：在发送到 Loki 前转换和标记日志
重新标记配置：基于元数据的动态标记
位置文件：追踪读取偏移量，避免重复处理日志

预期结果： Promtail 抓取配置的日志文件，标签正确应用，日志可通过 LogQL 查询在 Loki 中看到。

失败处理：

检查 Promtail 日志：docker logs promtail
验证文件路径可访问：docker exec promtail ls /var/log
使用示例日志行独立测试正则表达式
监控 Promtail 指标：curl http://localhost:9080/metrics | grep promtail
检查位置文件进度：cat /tmp/positions.yaml

第 4 步：使用 LogQL 查询日志

学习 LogQL 语法用于过滤和聚合日志。

基础查询：

# All logs from a job
{job="app"}

# Logs with specific label values
{job="app", level="error"}

# Regex filter on log line content
{job="app"} |~ "authentication failed"

# Case-insensitive regex
{job="app"} |~ "(?i)error"

# Line filter (doesn't parse, just includes/excludes)
{job="app"} |= "user"  # Contains "user"
{job="app"} != "debug" # Doesn't contain "debug"

解析和过滤：

# JSON parsing
{job="app"} | json | level="error"

# Regex parsing with named groups
{job="app"} | regexp "user_id=(?P<user_id>\\d+)" | user_id="12345"

# Logfmt parsing (key=value format)
{job="app"} | logfmt | level="error", service="auth"

# Pattern parsing
{job="nginx"} | pattern `<ip> - <user> [<timestamp>] "<method> <path> <protocol>" <status> <size>` | status >= 500

聚合（从日志生成指标）：

# Count log lines per level
sum by (level) (count_over_time({job="app"}[5m]))

# Rate of error logs
rate({job="app", level="error"}[5m])

# Bytes processed per service
sum by (service) (bytes_over_time({job="app"}[1h]))

# Average request duration from logs
avg_over_time({job="app"} | json | unwrap duration [5m])

# Top 10 error messages
topk(10, sum by (message) (count_over_time({level="error"} [1h])))

按提取字段过滤：

# Find specific trace in logs
{job="app"} | json | trace_id="abc123def456"

# HTTP 5xx errors from nginx
{job="nginx"} | pattern `<_> "<_> <_> <_>" <status> <_>` | status >= 500

# Failed authentication attempts
{job="app"} | json | message=~"authentication failed" | user_id != ""

使用这些模式创建 Grafana Explore 查询或仪表板面板。

预期结果： 查询返回预期日志行，过滤正确有效，聚合从日志生成指标。

失败处理：

使用 Grafana Explore 交互式调试查询
检查标签名称：curl http://localhost:3100/loki/api/v1/labels
验证标签值：curl http://localhost:3100/loki/api/v1/label/{label_name}/values
简化查询：从基础标签选择器开始，逐步添加过滤
检查时间范围：日志可能不在选定窗口内

第 5 步：将日志与指标和追踪集成

将日志与 Prometheus 指标和分布式追踪关联，实现统一可观测性。

向日志添加追踪 ID（应用程序插桩）：

# Python with OpenTelemetry
import logging
from opentelemetry import trace

logger = logging.getLogger(__name__)

def handle_request():
    span = trace.get_current_span()
    trace_id = span.get_span_context().trace_id

    logger.info(
        "Processing request",
        extra={"trace_id": format(trace_id, "032x")}
    )

// Go with OpenTelemetry
import (
    "go.opentelemetry.io/otel/trace"
    "go.uber.org/zap"
)

func handleRequest(ctx context.Context) {
    span := trace.SpanFromContext(ctx)
    traceID := span.SpanContext().TraceID().String()

    logger.Info("Processing request",
        zap.String("trace_id", traceID),
    )
}

配置 Grafana 数据链接从指标到日志：

在 Prometheus 面板字段配置中：

{
  "fieldConfig": {
    "defaults": {
      "links": [
        {
          "title": "View Logs",
          "url": "/explore?left={\"datasource\":\"Loki\",\"queries\":[{\"refId\":\"A\",\"expr\":\"{job=\\\"app\\\",instance=\\\"${__field.labels.instance}\\\"} |= `${__field.labels.trace_id}`\"}],\"range\":{\"from\":\"${__from}\",\"to\":\"${__to}\"}}",
          "targetBlank": false
        }
      ]
    }
  }
}

配置 Grafana 数据链接从日志到追踪：

在 Loki 数据源配置中：

datasources:
  - name: Loki
    type: loki
    url: http://loki:3100
    jsonData:
      derivedFields:
        - datasourceName: Tempo
          matcherRegex: "trace_id=(\\w+)"
          name: TraceID
          url: "$${__value.raw}"

在 Grafana Explore 中关联日志：

在 Prometheus 中查询指标
点击数据点
从上下文菜单选择"View Logs"
Loki 查询自动填充相关标签和时间范围
点击日志中的追踪 ID
Tempo 追踪视图打开并显示完整分布式追踪

预期结果： 点击指标打开相关日志，日志中的追踪 ID 链接到追踪查看器，实现指标/日志/追踪的单一导航面板。

失败处理：

验证追踪 ID 格式与派生字段中的正则表达式匹配
检查 Promtail 管道提取的 trace_id 标签
确保 Grafana 中配置了 Tempo 数据源
测试复杂过滤表达式的 URL 编码
在隐私/无痕浏览器窗口中验证数据链接 URL

第 6 步：设置日志保留和压缩

配置保留策略和压缩以管理存储成本。

按流保留（在 Loki 配置中）：

limits_config:
  retention_period: 720h  # Global default: 30 days

  # Per-tenant retention (requires multi-tenancy enabled)
  per_tenant_override_config: /etc/loki/overrides.yaml

# overrides.yaml
overrides:
  production:
    retention_period: 2160h  # 90 days for production
  staging:
    retention_period: 360h   # 15 days for staging
  development:
    retention_period: 168h   # 7 days for dev

按流标签保留（需要压缩器）：

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
# ... (see EXAMPLES.md for complete configuration)

优先级决定多个规则匹配时适用哪条（数字越小优先级越高）。

压缩设置：

chunk_store_config:
  chunk_cache_config:
    enable_fifocache: true
    fifocache:
      max_size_bytes: 1GB
      ttl: 24h
# ... (see EXAMPLES.md for complete configuration)

监控保留情况：

# Check chunk stats
curl http://localhost:3100/loki/api/v1/status/chunks | jq

# Check compactor metrics
curl http://localhost:3100/metrics | grep loki_compactor

# Verify deleted chunks
curl http://localhost:3100/metrics | grep loki_boltdb_shipper_retention_deleted

预期结果： 旧日志按保留策略自动删除，存储使用量趋于稳定，压缩减小索引大小。

失败处理：

如果保留不生效，在 Loki 配置中启用压缩器
检查压缩器日志：docker logs loki | grep compactor
验证 retention_enabled: true 和 retention_deletes_enabled: true
监控磁盘使用：du -sh /loki/
对于 S3：检查存储桶生命周期策略不与 Loki 保留冲突

验证清单

常见问题

高基数标签：使用无限制的标签值（用户 ID、请求 ID）会导致索引爆炸。使用固定标签（level、service、env），将变量放在日志行中。
缺少日志解析：不带标签提取的原始日志发送限制了查询能力。始终解析结构化日志（JSON、logfmt）或对非结构化日志使用正则表达式。
时间戳解析错误：时间戳格式不匹配导致日志乱序或被拒绝。使用示例日志测试时间戳解析。
保留不生效：压缩器必须启用才能删除旧数据。检查 retention_enabled: true 和 retention_deletes_enabled: true。
摄入速率限制：默认限制（10MB/s）对高容量系统可能过低。调整 ingestion_rate_mb 和 ingestion_burst_size_mb。
查询超时：跨长时间范围的宽泛查询可能超时。使用更具体的标签选择器和更短的时间窗口。
日志重复：多个 Promtail 实例抓取相同日志会产生重复。使用唯一标签或位置文件协调。

GitHub 저장소

pjt222/agent-almanac

경로: i18n/zh-CN/skills/configure-log-aggregation

agentsagentskillsai-assisted-developmentclaude-codeskillsteams

FAQ

Frequently asked questions

What is the configure-log-aggregation skill?

configure-log-aggregation is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform configure-log-aggregation-related tasks without extra prompting.

How do I install configure-log-aggregation?

Use the install commands on this page: add configure-log-aggregation to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does configure-log-aggregation belong to?

configure-log-aggregation is in the Other category, tagged ai.

Is configure-log-aggregation free to use?

Yes. configure-log-aggregation is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

연관 스킬

llamaguard

기타

LlamaGuard는 폭력 및 혐오 발언 등 6가지 안전 범주에서 LLM 입력과 출력을 조정하기 위한 Meta의 70-80억 파라미터 모델입니다. 94-95% 정확도를 제공하며 vLLM, Hugging Face 또는 Amazon SageMaker를 사용해 배포할 수 있습니다. 이 기술을 사용하여 AI 애플리케이션에 콘텐츠 필터링 및 안전 가드레일을 손쉽게 통합하세요.

스킬 보기

cost-optimization

기타

이 Claude Skill은 리소스 적정화, 태깅 전략, 지출 분석을 통해 개발자들이 클라우드 비용을 최적화할 수 있도록 지원합니다. AWS, Azure, GCP에서 클라우드 비용을 절감하고 비용 거버넌스를 구현하기 위한 프레임워크를 제공합니다. 인프라 비용을 분석하거나, 리소스를 적정화하거나, 예산 제약을 충족해야 할 때 사용하세요.

스킬 보기

sports-betting-analyzer

기타

이 Claude Skill은 스프레드, 오버/언더, 프로프 베트를 포함한 스포츠 베팅 시장을 분석합니다. 역사적 추이와 상황별 통계를 검토하여 가치 베트를 발견하고, 교육적 목적으로 실행 가능한 권장 사항이 담긴 구조화된 마크다운 결과를 제공합니다. 개발자는 이 기능을 스포츠 베팅 분석 도구에 활용할 수 있으며, 단순히 엔터테인먼트/교육 목적으로만 설계되었음을 유의해야 합니다.

스킬 보기

quantizing-models-bitsandbytes

기타

이 스킬은 bitsandbytes를 사용하여 LLM을 8비트 또는 4비트 정밀도로 양자화하며, 최소한의 정확도 손실로 50-75%의 메모리 감소를 달성합니다. 제한된 GPU 메모리에서 더 큰 모델을 실행하거나 추론을 가속화하는 데 이상적이며, INT8, NF4, FP4와 같은 형식을 지원합니다. 이 스킬은 HuggingFace Transformers와 통합되어 QLoRA 학습 및 8비트 옵티마이저를 가능하게 합니다.

스킬 보기

configure-log-aggregation

정보

빠른 설치

Claude Code

문서

Configure Log Aggregation

适用场景

输入

步骤

第 1 步：选择日志聚合栈

第 2 步：部署 Loki

第 3 步：配置 Promtail 进行日志传输

第 4 步：使用 LogQL 查询日志

第 5 步：将日志与指标和追踪集成

第 6 步：设置日志保留和压缩

验证清单

常见问题

相关技能

GitHub 저장소

Frequently asked questions

What is the configure-log-aggregation skill?

How do I install configure-log-aggregation?

What category does configure-log-aggregation belong to?

Is configure-log-aggregation free to use?

연관 스킬