
observability-monitoring

greyhaven-ai
Testing · ai · data

About

This skill implements observability and monitoring for Cloudflare Workers using built-in tools. It provides logging through wrangler tail, metrics via Workers Analytics, and health checks for endpoint monitoring. Use it when setting up monitoring, configuring alerts, or debugging production issues in your Workers applications.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/greyhaven-ai/claude-code-config
Git Clone (Alternative)
git clone https://github.com/greyhaven-ai/claude-code-config.git ~/.claude/skills/observability-monitoring

Copy and paste one of these commands into Claude Code to install this skill

Documentation

Grey Haven Observability and Monitoring

Implement comprehensive monitoring for Grey Haven applications using Cloudflare Workers built-in observability tools.

Observability Stack

Grey Haven Monitoring Architecture

  • Logging: Cloudflare Workers logs + wrangler tail
  • Metrics: Cloudflare Workers Analytics dashboard
  • Custom Events: Cloudflare Analytics Engine
  • Health Checks: Cloudflare Health Checks for endpoint availability
  • Error Tracking: Console errors visible in Cloudflare dashboard

Cloudflare Workers Logging

Console Logging in Workers

// app/utils/logger.ts
export interface LogEvent {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  context?: Record<string, unknown>;
  userId?: string;
  tenantId?: string;
  requestId?: string;
  duration?: number;
}

export function log(event: LogEvent) {
  const logData = {
    timestamp: new Date().toISOString(),
    level: event.level,
    message: event.message,
    // NOTE: process.env in Workers requires the nodejs_compat flag (or read ENVIRONMENT from an env binding)
    environment: process.env.ENVIRONMENT,
    user_id: event.userId,
    tenant_id: event.tenantId,
    request_id: event.requestId,
    duration_ms: event.duration,
    ...event.context,
  };

  // Structured console logging (visible in Cloudflare dashboard)
  console[event.level](JSON.stringify(logData));
}

// Convenience methods
export const logger = {
  debug: (message: string, context?: Record<string, unknown>) =>
    log({ level: "debug", message, context }),
  info: (message: string, context?: Record<string, unknown>) =>
    log({ level: "info", message, context }),
  warn: (message: string, context?: Record<string, unknown>) =>
    log({ level: "warn", message, context }),
  error: (message: string, context?: Record<string, unknown>) =>
    log({ level: "error", message, context }),
};
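
For reference, a quick usage sketch (values are illustrative) showing the single JSON line each call emits; fields left undefined are dropped by JSON.stringify:

// Example usage (illustrative values)
import { logger } from "~/utils/logger";

logger.info("Checkout started", { cart_id: "cart_123", item_count: 3 });
// emits roughly:
// {"timestamp":"2025-01-01T00:00:00.000Z","level":"info","message":"Checkout started","environment":"production","cart_id":"cart_123","item_count":3}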

Logging Middleware

// app/middleware/logging.ts
import { logger } from "~/utils/logger";
import { v4 as uuidv4 } from "uuid";

export async function loggingMiddleware(
  request: Request,
  next: () => Promise<Response>
) {
  const requestId = uuidv4(); // crypto.randomUUID() also works in Workers without the uuid dependency
  const startTime = Date.now();

  try {
    const response = await next();
    const duration = Date.now() - startTime;

    logger.info("Request completed", {
      request_id: requestId,
      method: request.method,
      url: request.url,
      status: response.status,
      duration_ms: duration,
    });

    return response;
  } catch (error) {
    const duration = Date.now() - startTime;
    const err = error instanceof Error ? error : new Error(String(error));

    logger.error("Request failed", {
      request_id: requestId,
      method: request.method,
      url: request.url,
      error: err.message,
      stack: err.stack,
      duration_ms: duration,
    });

    throw error;
  }
}
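
How this middleware is registered depends on your framework. As a minimal sketch, here is how it could wrap a plain module Worker's fetch handler (handleRequest is a placeholder for your application's router):

// Illustrative wiring for a plain module Worker (framework integration will differ)
import { loggingMiddleware } from "~/middleware/logging";

async function handleRequest(request: Request): Promise<Response> {
  // placeholder for your application's router
  return new Response("ok");
}

export default {
  async fetch(request: Request): Promise<Response> {
    return loggingMiddleware(request, () => handleRequest(request));
  },
};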

Cloudflare Workers Analytics

Workers Analytics Dashboard

Access metrics at: https://dash.cloudflare.com → Workers → Analytics

Key Metrics:

  • Request rate (requests/second)
  • CPU time (milliseconds)
  • Error rate (%)
  • Success rate (%)
  • Response time (P50, P95, P99)
  • Invocations per day
  • GB-seconds (compute usage)

Wrangler Tail (Real-time Logs)

# Stream production logs
npx wrangler tail --config wrangler.production.toml

# Filter by status code
npx wrangler tail --status error --config wrangler.production.toml

# Filter by method
npx wrangler tail --method POST --config wrangler.production.toml

# Filter by IP address
npx wrangler tail --ip 1.2.3.4 --config wrangler.production.toml

# Output to file
npx wrangler tail --config wrangler.production.toml > logs.txt

Accessing Logs in Cloudflare Dashboard

  1. Go to https://dash.cloudflare.com
  2. Navigate to Workers & Pages
  3. Select your Worker
  4. Click "Logs" tab
  5. View real-time logs with filtering

Log Features:

  • Real-time streaming
  • Filter by status code
  • Filter by request method
  • Search log content
  • Export logs (JSON)

Analytics Engine (Custom Events)

Setup Analytics Engine

wrangler.toml:

[[analytics_engine_datasets]]
binding = "ANALYTICS"
dataset = "my_analytics"  # must match the dataset name used when querying
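
The snippets below also assume a matching binding on the Worker's Env type. A minimal sketch, assuming the AnalyticsEngineDataset type from @cloudflare/workers-types and the optional bindings used later in this document:

// app/types/env.d.ts (sketch; adjust names to match your wrangler.toml)
interface Env {
  ANALYTICS: AnalyticsEngineDataset;    // Analytics Engine binding declared above
  REDIS?: { ping(): Promise<string> };  // optional Upstash Redis client used by the health check
  ENVIRONMENT?: string;
}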

Track Custom Events

// app/utils/analytics.ts
export async function trackEvent(
  env: Env,
  eventName: string,
  data: {
    user_id?: string;
    tenant_id?: string;
    duration_ms?: number;
    [key: string]: string | number | undefined;
  }
) {
  try {
    // Analytics Engine accepts at most one index per data point, so user_id and
    // tenant_id go into blobs and tenant_id doubles as the sampling index.
    // writeDataPoint is synchronous and non-blocking.
    env.ANALYTICS.writeDataPoint({
      blobs: [eventName, data.user_id || "", data.tenant_id || ""],
      doubles: [data.duration_ms || 0],
      indexes: [data.tenant_id || ""],
    });
  } catch (error) {
    console.error("Failed to track event:", error);
  }
}

// Usage in server function
export const loginUser = createServerFn({ method: "POST" }).handler(
  async ({ data, context }) => {
    const startTime = Date.now();
    const user = await authenticateUser(data);
    const duration = Date.now() - startTime;

    // Track login event
    await trackEvent(context.env, "user_login", {
      user_id: user.id,
      tenant_id: user.tenantId,
      duration_ms: duration,
    });

    return user;
  }
);

Query Analytics Data

Use Cloudflare GraphQL API:

query GetLoginStats($accountId: string!) {
  viewer {
    accounts(filter: { accountTag: $accountId }) {
      workersAnalyticsEngineDataset(dataset: "my_analytics") {
        query(
          filter: {
            blob1: "user_login"
            datetime_gt: "2025-01-01T00:00:00Z"
          }
        ) {
          count
          dimensions {
            blob1  # event name
            blob2  # user_id
            blob3  # tenant_id
            index1 # tenant_id (sampling index)
          }
        }
      }
    }
  }
}
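
To run a query like this programmatically, you can POST it to Cloudflare's GraphQL endpoint. A minimal sketch, assuming you supply your own API token and account ID and verify field names against the current schema:

// Illustrative GraphQL client call
async function runGraphQL(
  apiToken: string,
  query: string,
  variables: Record<string, unknown>
): Promise<unknown> {
  const res = await fetch("https://api.cloudflare.com/client/v4/graphql", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables }),
  });
  if (!res.ok) throw new Error(`GraphQL request failed: ${res.status}`);
  return res.json();
}

// Usage: runGraphQL(token, getLoginStatsQuery, { accountId: "<your account tag>" })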

Health Checks

Health Check Endpoint

// app/routes/api/health.ts
import { createServerFn } from "@tanstack/start";
import { db } from "~/lib/server/db";

export const GET = createServerFn({ method: "GET" }).handler(async ({ context }) => {
  const startTime = Date.now();
  const checks: Record<string, string> = {};

  // Check database
  let dbHealthy = false;
  try {
    await db.execute("SELECT 1");
    dbHealthy = true;
    checks.database = "ok";
  } catch (error) {
    console.error("Database health check failed:", error);
    checks.database = "failed";
  }

  // Check Redis (if using Upstash)
  let redisHealthy = false;
  if (context.env.REDIS) {
    try {
      await context.env.REDIS.ping();
      redisHealthy = true;
      checks.redis = "ok";
    } catch (error) {
      console.error("Redis health check failed:", error);
      checks.redis = "failed";
    }
  }

  const duration = Date.now() - startTime;
  const healthy = dbHealthy && (!context.env.REDIS || redisHealthy);

  return new Response(
    JSON.stringify({
      status: healthy ? "healthy" : "unhealthy",
      checks,
      duration_ms: duration,
      timestamp: new Date().toISOString(),
      environment: process.env.ENVIRONMENT,
    }),
    {
      status: healthy ? 200 : 503,
      headers: { "Content-Type": "application/json" },
    }
  );
});
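
A quick way to exercise the endpoint from a script or CI step (the URL is illustrative; run with Node 18+ or any runtime with fetch):

// Illustrative smoke test for the health endpoint
const res = await fetch("https://app.example.com/api/health");
const body = (await res.json()) as { status: string; checks: Record<string, string> };
console.log(res.status, body.status, body.checks);
if (res.status !== 200) process.exit(1);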

Cloudflare Health Checks

Configure in Cloudflare dashboard:

  1. Go to Traffic → Health Checks
  2. Create health check for /api/health
  3. Configure:
    • Interval: 60 seconds
    • Timeout: 5 seconds
    • Retries: 2
    • Expected status: 200
  4. Set up notifications (email/webhook); see the webhook receiver sketch after this list
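
For the webhook option, the notification can point at a small Worker route. A minimal sketch; the route path, secret binding name, and the cf-webhook-auth header check are assumptions to verify against your notification configuration:

// app/routes/api/alerts.ts (hypothetical route) - receives Cloudflare notification webhooks
export async function handleAlertWebhook(
  request: Request,
  env: { ALERT_WEBHOOK_SECRET: string }
): Promise<Response> {
  // Reject calls that do not carry the shared secret configured on the webhook destination
  if (request.headers.get("cf-webhook-auth") !== env.ALERT_WEBHOOK_SECRET) {
    return new Response("unauthorized", { status: 401 });
  }
  const payload = await request.text(); // payload shape is not assumed; log it as-is
  console.warn(JSON.stringify({ level: "warn", message: "Cloudflare alert received", payload }));
  return new Response("ok");
}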

Error Tracking

Structured Error Logging

// app/utils/error-handler.ts
import { logger } from "~/utils/logger";
import { trackEvent } from "~/utils/analytics";

export function handleError(error: Error, context?: Record<string, unknown>) {
  // Log error with full context
  logger.error(error.message, {
    error_name: error.name,
    stack: error.stack,
    ...context,
  });

  // Also log to Analytics Engine for tracking
  // (fire-and-forget; wrap in ctx.waitUntil() if it must complete after the response is sent)
  if (context?.env) {
    void trackEvent(context.env as Env, "error_occurred", {
      error_name: error.name,
      error_message: error.message,
    });
  }
}

// Usage in server function
export const updateUser = createServerFn({ method: "POST" }).handler(
  async ({ data, context }) => {
    try {
      return await userService.update(data);
    } catch (error) {
      handleError(error instanceof Error ? error : new Error(String(error)), {
        user_id: context.user?.id,
        tenant_id: context.tenant?.id,
        env: context.env,
      });
      throw error;
    }
  }
);

Viewing Errors in Cloudflare

  1. Workers Dashboard: View errors in real-time
  2. Wrangler Tail: npx wrangler tail --status error
  3. Analytics: Check error rate metrics
  4. Health Checks: Monitor endpoint failures

Supporting Documentation

All supporting files are kept under 500 lines, per Anthropic best practices.

When to Apply This Skill

Use this skill when:

  • Setting up monitoring for new Cloudflare Workers projects
  • Implementing structured logging with console
  • Debugging production issues with wrangler tail
  • Setting up health checks
  • Implementing custom metrics tracking with Analytics Engine
  • Configuring Cloudflare alerts

Template Reference

These patterns are from Grey Haven's production monitoring:

  • Cloudflare Workers Analytics: Request and performance metrics
  • Wrangler tail: Real-time log streaming
  • Console logging: Structured JSON logs
  • Analytics Engine: Custom event tracking

Critical Reminders

  1. Structured logging: Use JSON.stringify for console logs
  2. Request IDs: Track requests with UUIDs for debugging
  3. Error context: Include tenant_id, user_id in all error logs
  4. Health checks: Monitor database and external service connections
  5. Wrangler tail: Use filters to narrow down logs (--status, --method)
  6. Performance: Track duration_ms for all operations
  7. Environment: Log environment in all messages for filtering
  8. Analytics Engine: Use for custom metrics and event tracking
  9. Dashboard access: Logs available in Cloudflare Workers dashboard
  10. Real-time debugging: Use wrangler tail for live production debugging

GitHub Repository

greyhaven-ai/claude-code-config
Path: grey-haven-plugins/observability/skills/observability-monitoring

Related Skills

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

View skill

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

View skill

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

View skill

llamaguard

Other

LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.

View skill