SKILL·35CFDF

instrument-distributed-tracing

Name: instrument-distributed-tracing
Author: pjt222

pjt222

업데이트됨 1 month ago

9 조회

기타general

정보

이 스킬은 개발자들이 분산 추적을 위해 OpenTelemetry로 애플리케이션을 계측하도록 도와주며, 지연 시간 문제를 디버깅하고 마이크로서비스 간 요청 흐름을 이해할 수 있게 합니다. 자동 및 수동 계측, 컨텍스트 전파, Jaeger나 Tempo 같은 백엔드와의 통합을 모두 지원합니다. 종단 간 지연 시간 측정, 추적과 로그/메트릭의 연관 분석, 레거시 추적 시스템에서의 마이그레이션에 활용하세요.

빠른 설치

Claude Code

문서

name: instrument-distributed-tracing description: > 自動・手動インストルメンテーション、コンテキスト伝播、サンプリング戦略、 JaegerまたはTempoとの統合を含む、分散トレーシングのためにOpenTelemetryでアプリケーションをインストルメントする。分散システムのレイテンシ問題のデバッグ、マイクロサービス間のリクエストフローの把握、根本原因分析のためのトレースとログ・メトリクスの相関、エンドツーエンドのレイテンシ測定、またはレガシーのトレーシングシステムからOpenTelemetryへの移行時に使用する。 locale: ja source_locale: en source_commit: 6f65f316 translator: claude-opus-4-6 translation_date: 2026-03-16 license: MIT allowed-tools: Read Write Edit Bash Grep Glob metadata: author: Philipp Thoss version: "1.0" domain: observability complexity: advanced language: multi tags: opentelemetry, tracing, jaeger, tempo, instrumentation

分散トレーシングのインストルメント

OpenTelemetryの分散トレーシングを実装して、マイクロサービス間のリクエストを追跡し、パフォーマンスのボトルネックを特定する。

使用タイミング

複数のサービスを持つ分散システムのレイテンシ問題をデバッグする場合
マイクロサービス間のリクエストフローと依存関係を把握する場合
トランザクション内の遅いデータベースクエリや外部API呼び出しを特定する場合
根本原因分析のためにトレースをログとメトリクスと相関させる場合
ユーザーリクエストからレスポンスまでのエンドツーエンドのレイテンシを測定する場合
レガシーのトレーシングシステム（Zipkin、Jaeger）からOpenTelemetryに移行する場合
詳細なレイテンシパーセンタイル追跡によるSLOコンプライアンスを確立する場合

入力

必須: インストルメントするサービスのリスト（言語とフレームワーク）
必須: トレーシングバックエンドの選択（Jaeger、Tempo、Zipkin、またはベンダーSaaS）
任意: 既存のインストルメンテーションライブラリ（OpenTracing、Zipkin）
任意: サンプリング戦略の要件（パーセンテージ、レート制限）
任意: ビジネス固有のメタデータ用のカスタムスパン属性

手順

完全な設定ファイルとテンプレートは拡張例を参照。

ステップ1: トレーシングバックエンドのセットアップ

JaegerまたはGrafana Tempoをデプロイしてトレースを受信・保存する。

オプションA: Jaegerオールインワン（開発/テスト）：

# docker-compose.yml
version: '3.8'
services:
  jaeger:
    image: jaegertracing/all-in-one:1.51
    ports:
      - "5775:5775/udp"   # Zipkin compact thrift
      - "6831:6831/udp"   # Jaeger compact thrift
      - "6832:6832/udp"   # Jaeger binary thrift
      - "5778:5778"       # Serve configs
      - "16686:16686"     # Jaeger UI
      - "14268:14268"     # Jaeger HTTP thrift
      - "14250:14250"     # Jaeger GRPC
      - "9411:9411"       # Zipkin compatible endpoint
    environment:
      - COLLECTOR_ZIPKIN_HOST_PORT=:9411
      - COLLECTOR_OTLP_ENABLED=true
    restart: unless-stopped

オプションB: Grafana Tempo（本番、スケーラブル）：

# docker-compose.yml
version: '3.8'
services:
  tempo:
    image: grafana/tempo:2.3.0
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - tempo-data:/tmp/tempo
    ports:
      - "3200:3200"   # Tempo HTTP
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
      - "9411:9411"   # Zipkin
    restart: unless-stopped

volumes:
  tempo-data:

Tempo設定（tempo.yaml）：

server:
  http_listen_port: 3200

distributor:
  receivers:
    jaeger:
# ... (完全な設定はEXAMPLES.mdを参照)

S3ストレージを使用した本番環境：

storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces
      endpoint: s3.amazonaws.com
      region: us-east-1
    wal:
      path: /tmp/tempo/wal
    pool:
      max_workers: 100
      queue_depth: 10000

期待結果： トレーシングバックエンドがアクセス可能で、OTLPを介してトレースを受信する準備ができており、JaegerのUIまたはGrafanaが最初は「トレースなし」を表示する。

失敗時：

ポートが使用中でないことを確認する: netstat -tulpn | grep -E '(4317|16686|3200)'
コンテナのログを確認する: docker logs jaegerまたはdocker logs tempo
OTLPエンドポイントをテストする: curl http://localhost:4318/v1/traces -v
Tempoの場合: tempo -config.file=/etc/tempo.yaml -verify-configで設定構文を検証する

ステップ2: アプリケーションのインストルメント（自動インストルメンテーション）

コード変更を最小限に抑えるために、一般的なフレームワーク向けのOpenTelemetry自動インストルメンテーションを使用する。

PythonとFlask：

pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install

# app.py
from flask import Flask
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
# ... (完全な設定はEXAMPLES.mdを参照)

GoとGinフレームワーク：

go get go.opentelemetry.io/otel
go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc
go get go.opentelemetry.io/otel/sdk/trace
go get go.opentelemetry.io/contrib/instrumentation/github.com/gin-gonic/gin/otelgin

package main

import (
    "context"
    "github.com/gin-gonic/gin"
    "go.opentelemetry.io/otel"
# ... (完全な設定はEXAMPLES.mdを参照)

Node.jsとExpress：

npm install @opentelemetry/api \
            @opentelemetry/sdk-node \
            @opentelemetry/auto-instrumentations-node \
            @opentelemetry/exporter-trace-otlp-grpc

// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
# ... (完全な設定はEXAMPLES.mdを参照)

期待結果： インストルメントされたサービスからのトレースがJaegerのUIまたはGrafanaに表示され、HTTPリクエストが自動的にスパンを作成する。

失敗時：

エクスポーターのエンドポイントがアプリケーションから到達可能であることを確認する
環境変数を確認する: OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4317
デバッグログを有効にする: OTEL_LOG_LEVEL=debug（Python）、OTEL_LOG_LEVEL=DEBUG（Node.js）
単純なスパンでテストする：手動でスパンを作成してエクスポートパイプラインを確認する
OpenTelemetryパッケージ間のバージョン競合を確認する

ステップ3: 手動インストルメンテーションの追加

ビジネスロジック、データベースクエリ、外部呼び出し用のカスタムスパンを作成する。

Pythonの手動スパン：

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    # Create a span for the entire operation
# ... (完全な設定はEXAMPLES.mdを参照)

Goの手動スパン：

import (
    "context"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/trace"
# ... (完全な設定はEXAMPLES.mdを参照)

スパン属性のベストプラクティス：

セマンティック規約を使用する: http.method、http.status_code、db.system、db.statement
ビジネスコンテキストを追加する: user.id、order.id、product.category
リソース識別子を含める: instance.id、region、availability_zone
エラーを記録する: span.RecordError(err)とspan.SetStatus(codes.Error, message)
重要なマイルストーンにイベントを追加する: span.AddEvent("cache_miss")

期待結果： カスタムスパンがトレースビューに表示され、親子関係が正しく、属性がスパン詳細に表示され、エラーが強調表示される。

失敗時：

コンテキスト伝播を確認する：親スパンのコンテキストが子に渡されること
スパン名が説明的で命名規則に従っていることを確認する
スパンが終了していることを確認する（GoでのGo defer span.End()、Pythonでのwithブロック）
属性タイプを確認する：文字列、int、bool、floatのみ
セマンティック規約を検証する：該当する場合は標準的な属性名を使用する

ステップ4: コンテキスト伝播の実装

サービス境界と非同期操作を越えてトレースコンテキストが流れることを確認する。

HTTP headers伝播（W3Cトレースコンテキスト）：

# Client side (Python with requests)
import requests
from opentelemetry import trace
from opentelemetry.propagate import inject

tracer = trace.get_tracer(__name__)
# ... (完全な設定はEXAMPLES.mdを参照)

// Server side (Go with Gin)
import (
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

# ... (完全な設定はEXAMPLES.mdを参照)

メッセージキューの伝播（Kafka）：

# Producer
from opentelemetry.propagate import inject
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers=['kafka:9092'])

# ... (完全な設定はEXAMPLES.mdを参照)

# Consumer
from opentelemetry.propagate import extract

def process_message(msg):
    # Extract trace context from Kafka headers
    headers = {k: v.decode('utf-8') for k, v in msg.headers}
    ctx = extract(headers)

    # Continue the trace
    with tracer.start_as_current_span("process_order_event", context=ctx):
        order_id = json.loads(msg.value)['order_id']
        handle_order(order_id)

非同期操作（Python asyncio）：

import asyncio
from opentelemetry import trace, context

async def async_operation():
    # Capture current context
    token = context.attach(context.get_current())
    try:
        with tracer.start_as_current_span("async_database_query"):
            await asyncio.sleep(0.1)  # Simulated async work
            return "result"
    finally:
        context.detach(token)

期待結果： トレースが複数のサービスにまたがり、サービス境界を越えてトレースIDが一貫し、親子関係が保持される。

失敗時：

W3Cトレースコンテキストプロパゲーターが設定されていることを確認する: otel.propagation.set_global_textmap(TraceContextTextMapPropagator())
HTTPリクエストにヘッダーが渡されていることを確認する
Kafkaの場合: ブローカーバージョンがヘッダーをサポートしていることを確認する（v0.11以上）
ヘッダーインスペクションでデバッグする：traceparentヘッダーの値をログに記録する
トレースの可視化を使用して切断されたトレースリンクを特定する

ステップ5: サンプリング戦略の設定

可視性を維持しながらトレース量とコストを削減するためにサンプリングを実装する。

サンプリング戦略：

from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import (
    ParentBased,
    TraceIdRatioBased,
    StaticSampler,
    Decision
# ... (完全な設定はEXAMPLES.mdを参照)

Tempoを使用したテールベースサンプリング：

tempo.yamlでの設定：

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
      storage:
        path: /tmp/tempo/generator/wal
        remote_write:
          - url: http://prometheus:9090/api/v1/write
            send_exemplars: true

    # Tail sampling (requires tempo-query)
    ingestion_rate_limit_bytes: 5000000
    ingestion_burst_size_bytes: 10000000

動的サンプリングにはGrafana TempoのTraceQLを使用する：

# Sample traces with errors
{ status = error }

# Sample slow traces (>1s)
{ duration > 1s }

# Sample specific services
{ resource.service.name = "checkout-service" }

期待結果： トレース量がターゲットのパーセンテージに削減され、エラートレースが常にサンプリングされ、サンプリング決定がトレースメタデータに表示される。

失敗時：

サンプラーがトレーサープロバイダーの初期化前に適用されていることを確認する
エクスポートされたスパンのサンプリング決定属性を確認する
テールサンプリングの場合: 十分なバッファリングを確認する（ingestion_burst_size_bytes）
ドロップされたトレースを監視する: otel_traces_dropped_totalメトリクス
サンプリングレートを検証するために合成的な高ボリュームトラフィックでテストする

ステップ6: トレースをメトリクスとログに相関させる

統合された可観測性のためにトレースをメトリクスとログにリンクする。

ログにトレースIDを追加する（Python）：

import logging
from opentelemetry import trace

# Custom log formatter with trace context
class TraceFormatter(logging.Formatter):
    def format(self, record):
# ... (完全な設定はEXAMPLES.mdを参照)

トレースからメトリクスを生成する（Tempo）：

# tempo.yaml
metrics_generator:
  registry:
    external_labels:
      cluster: production
  storage:
# ... (完全な設定はEXAMPLES.mdを参照)

これにより以下のPrometheusメトリクスが生成される：

traces_service_graph_request_total - サービス間のリクエスト数
traces_span_metrics_duration_seconds - スパン時間のヒストグラム
traces_spanmetrics_calls_total - スパンの呼び出し数

メトリクスからトレースをクエリする（Grafana）：

GrafanaのPrometheusデータソース設定にエグゼンプラーサポートを追加する：

datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    jsonData:
      exemplarTraceIdDestinations:
        - name: trace_id
          datasourceName: Tempo

Grafanaダッシュボードでエグゼンプラーをenableにする：

{
  "fieldConfig": {
    "defaults": {
      "custom": {
        "showExemplars": true
      }
    }
  }
}

期待結果： メトリクスエグゼンプラーをクリックするとトレースが開き、ログにトレースIDが表示され、トレースがログにリンクし、シグナル間での統合されたデバッグが実現する。

失敗時：

Prometheusでエグゼンプラーサポートが有効であることを確認する（v2.26以上が必要）
トレースIDフォーマットが一致していることを確認する（32文字の16進数）
Tempo設定でメトリクスジェネレーターが有効であることを確認する
Tempoからリモートライトエンドポイントにアクセス可能であることを確認する
エグゼンプラークエリをテストする: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) and on() exemplar

バリデーション

よくある落とし穴

コンテキストが伝播されない: ダウンストリームの呼び出しにcontextを渡し忘れるとトレースが切断される。常にコンテキストを明示的に渡す。
スパンが終了しない: defer span.End()（Go）またはwithブロック（Python）の欠如はスパンが開いたままになりメモリリークを引き起こす。
過剰なインストルメンテーション: すべての関数にスパンを作成するとトレースが膨大になる。サービス境界、データベース呼び出し、外部APIに焦点を当てる。
エラーの記録漏れ: span.RecordError()を呼び出さないと貴重なデバッグ情報が失われる。常にスパンにエラーを記録する。
高カーディナリティ属性: 無制限の値（ユーザーID、リクエストボディ）をスパン属性として使用するとストレージの問題が発生する。サンプリングまたは集計ラベルを使用する。
不正なスパンの種類: 誤ったスパンの種類（CLIENT vs SERVER vs INTERNAL）を使用するとサービスグラフの生成に影響する。セマンティック規約に従う。
コンテキスト前のサンプリング: サンプリング決定は親トレースコンテキストを尊重しなければならない。アップストリームのサンプリングを尊重するためにParentBasedサンプラーを使用する。

GitHub 저장소

pjt222/agent-almanac

경로: i18n/ja/skills/instrument-distributed-tracing

agentsagentskillsai-assisted-developmentclaude-codeskillsteams

FAQ

Frequently asked questions

What is the instrument-distributed-tracing skill?

instrument-distributed-tracing is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform instrument-distributed-tracing-related tasks without extra prompting.

How do I install instrument-distributed-tracing?

Use the install commands on this page: add instrument-distributed-tracing to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does instrument-distributed-tracing belong to?

instrument-distributed-tracing is in the Other category, tagged general.

Is instrument-distributed-tracing free to use?

Yes. instrument-distributed-tracing is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

연관 스킬

llamaguard

기타

LlamaGuard는 폭력 및 혐오 발언 등 6가지 안전 범주에서 LLM 입력과 출력을 조정하기 위한 Meta의 70-80억 파라미터 모델입니다. 94-95% 정확도를 제공하며 vLLM, Hugging Face 또는 Amazon SageMaker를 사용해 배포할 수 있습니다. 이 기술을 사용하여 AI 애플리케이션에 콘텐츠 필터링 및 안전 가드레일을 손쉽게 통합하세요.

스킬 보기

cost-optimization

기타

이 Claude Skill은 리소스 적정화, 태깅 전략, 지출 분석을 통해 개발자들이 클라우드 비용을 최적화할 수 있도록 지원합니다. AWS, Azure, GCP에서 클라우드 비용을 절감하고 비용 거버넌스를 구현하기 위한 프레임워크를 제공합니다. 인프라 비용을 분석하거나, 리소스를 적정화하거나, 예산 제약을 충족해야 할 때 사용하세요.

스킬 보기

sports-betting-analyzer

기타

이 Claude Skill은 스프레드, 오버/언더, 프로프 베트를 포함한 스포츠 베팅 시장을 분석합니다. 역사적 추이와 상황별 통계를 검토하여 가치 베트를 발견하고, 교육적 목적으로 실행 가능한 권장 사항이 담긴 구조화된 마크다운 결과를 제공합니다. 개발자는 이 기능을 스포츠 베팅 분석 도구에 활용할 수 있으며, 단순히 엔터테인먼트/교육 목적으로만 설계되었음을 유의해야 합니다.

스킬 보기

quantizing-models-bitsandbytes

기타

이 스킬은 bitsandbytes를 사용하여 LLM을 8비트 또는 4비트 정밀도로 양자화하며, 최소한의 정확도 손실로 50-75%의 메모리 감소를 달성합니다. 제한된 GPU 메모리에서 더 큰 모델을 실행하거나 추론을 가속화하는 데 이상적이며, INT8, NF4, FP4와 같은 형식을 지원합니다. 이 스킬은 HuggingFace Transformers와 통합되어 QLoRA 학습 및 8비트 옵티마이저를 가능하게 합니다.

스킬 보기

instrument-distributed-tracing

정보

빠른 설치

Claude Code

문서

分散トレーシングのインストルメント

使用タイミング

入力

手順

ステップ1: トレーシングバックエンドのセットアップ

ステップ2: アプリケーションのインストルメント（自動インストルメンテーション）

ステップ3: 手動インストルメンテーションの追加

ステップ4: コンテキスト伝播の実装

ステップ5: サンプリング戦略の設定

ステップ6: トレースをメトリクスとログに相関させる

バリデーション

よくある落とし穴

関連スキル

GitHub 저장소

Frequently asked questions

What is the instrument-distributed-tracing skill?

How do I install instrument-distributed-tracing?

What category does instrument-distributed-tracing belong to?

Is instrument-distributed-tracing free to use?

연관 스킬