
observability-monitoring

greyhaven-ai
Testing · ai · data

About

This skill implements observability and monitoring for Cloudflare Workers using built-in tools. It provides logging through wrangler tail, metrics via Workers Analytics, and health checks for endpoint monitoring. Use it when setting up monitoring, configuring alerts, or debugging production issues in your Workers applications.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/greyhaven-ai/claude-code-config
Git Clone (Alternative)
git clone https://github.com/greyhaven-ai/claude-code-config.git ~/.claude/skills/observability-monitoring

Copy and paste one of these commands into Claude Code to install this skill

Documentation

Grey Haven Observability and Monitoring

Implement comprehensive monitoring for Grey Haven applications using Cloudflare Workers built-in observability tools.

Observability Stack

Grey Haven Monitoring Architecture

  • Logging: Cloudflare Workers logs + wrangler tail
  • Metrics: Cloudflare Workers Analytics dashboard
  • Custom Events: Cloudflare Analytics Engine
  • Health Checks: Cloudflare Health Checks for endpoint availability
  • Error Tracking: Console errors visible in Cloudflare dashboard

Cloudflare Workers Logging

Console Logging in Workers

// app/utils/logger.ts
export interface LogEvent {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  context?: Record<string, unknown>;
  userId?: string;
  tenantId?: string;
  requestId?: string;
  duration?: number;
}

export function log(event: LogEvent) {
  const logData = {
    timestamp: new Date().toISOString(),
    level: event.level,
    message: event.message,
    // NOTE: process.env in Workers requires the nodejs_compat flag (or read ENVIRONMENT from an env binding)
    environment: process.env.ENVIRONMENT,
    user_id: event.userId,
    tenant_id: event.tenantId,
    request_id: event.requestId,
    duration_ms: event.duration,
    ...event.context,
  };

  // Structured console logging (visible in Cloudflare dashboard)
  console[event.level](JSON.stringify(logData));
}

// Convenience methods
export const logger = {
  debug: (message: string, context?: Record<string, unknown>) =>
    log({ level: "debug", message, context }),
  info: (message: string, context?: Record<string, unknown>) =>
    log({ level: "info", message, context }),
  warn: (message: string, context?: Record<string, unknown>) =>
    log({ level: "warn", message, context }),
  error: (message: string, context?: Record<string, unknown>) =>
    log({ level: "error", message, context }),
};
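
For reference, a quick usage sketch (values are illustrative) showing the single JSON line each call emits; fields left undefined are dropped by JSON.stringify:

// Example usage (illustrative values)
import { logger } from "~/utils/logger";

logger.info("Checkout started", { cart_id: "cart_123", item_count: 3 });
// emits roughly:
// {"timestamp":"2025-01-01T00:00:00.000Z","level":"info","message":"Checkout started","environment":"production","cart_id":"cart_123","item_count":3}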

Logging Middleware

// app/middleware/logging.ts
import { logger } from "~/utils/logger";
import { v4 as uuidv4 } from "uuid";

export async function loggingMiddleware(
  request: Request,
  next: () => Promise<Response>
) {
  const requestId = uuidv4(); // crypto.randomUUID() also works in Workers without the uuid dependency
  const startTime = Date.now();

  try {
    const response = await next();
    const duration = Date.now() - startTime;

    logger.info("Request completed", {
      request_id: requestId,
      method: request.method,
      url: request.url,
      status: response.status,
      duration_ms: duration,
    });

    return response;
  } catch (error) {
    const duration = Date.now() - startTime;
    const err = error instanceof Error ? error : new Error(String(error));

    logger.error("Request failed", {
      request_id: requestId,
      method: request.method,
      url: request.url,
      error: err.message,
      stack: err.stack,
      duration_ms: duration,
    });

    throw error;
  }
}
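
How this middleware is registered depends on your framework. As a minimal sketch, here is how it could wrap a plain module Worker's fetch handler (handleRequest is a placeholder for your application's router):

// Illustrative wiring for a plain module Worker (framework integration will differ)
import { loggingMiddleware } from "~/middleware/logging";

async function handleRequest(request: Request): Promise<Response> {
  // placeholder for your application's router
  return new Response("ok");
}

export default {
  async fetch(request: Request): Promise<Response> {
    return loggingMiddleware(request, () => handleRequest(request));
  },
};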

Cloudflare Workers Analytics

Workers Analytics Dashboard

Access metrics at: https://dash.cloudflare.com → Workers → Analytics

Key Metrics:

  • Request rate (requests/second)
  • CPU time (milliseconds)
  • Error rate (%)
  • Success rate (%)
  • Response time (P50, P95, P99)
  • Invocations per day
  • GB-seconds (compute usage)

Wrangler Tail (Real-time Logs)

# Stream production logs
npx wrangler tail --config wrangler.production.toml

# Filter by status code
npx wrangler tail --status error --config wrangler.production.toml

# Filter by method
npx wrangler tail --method POST --config wrangler.production.toml

# Filter by IP address
npx wrangler tail --ip 1.2.3.4 --config wrangler.production.toml

# Output to file
npx wrangler tail --config wrangler.production.toml > logs.txt

Accessing Logs in Cloudflare Dashboard

  1. Go to https://dash.cloudflare.com
  2. Navigate to Workers & Pages
  3. Select your Worker
  4. Click "Logs" tab
  5. View real-time logs with filtering

Log Features:

  • Real-time streaming
  • Filter by status code
  • Filter by request method
  • Search log content
  • Export logs (JSON)

Analytics Engine (Custom Events)

Setup Analytics Engine

wrangler.toml:

[[analytics_engine_datasets]]
binding = "ANALYTICS"
dataset = "my_analytics"  # must match the dataset name used when querying
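
The snippets below also assume a matching binding on the Worker's Env type. A minimal sketch, assuming the AnalyticsEngineDataset type from @cloudflare/workers-types and the optional bindings used later in this document:

// app/types/env.d.ts (sketch; adjust names to match your wrangler.toml)
interface Env {
  ANALYTICS: AnalyticsEngineDataset;    // Analytics Engine binding declared above
  REDIS?: { ping(): Promise<string> };  // optional Upstash Redis client used by the health check
  ENVIRONMENT?: string;
}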

Track Custom Events

// app/utils/analytics.ts
export async function trackEvent(
  env: Env,
  eventName: string,
  data: {
    user_id?: string;
    tenant_id?: string;
    duration_ms?: number;
    [key: string]: string | number | undefined;
  }
) {
  try {
    // Analytics Engine accepts at most one index per data point, so user_id and
    // tenant_id go into blobs and tenant_id doubles as the sampling index.
    // writeDataPoint is synchronous and non-blocking.
    env.ANALYTICS.writeDataPoint({
      blobs: [eventName, data.user_id || "", data.tenant_id || ""],
      doubles: [data.duration_ms || 0],
      indexes: [data.tenant_id || ""],
    });
  } catch (error) {
    console.error("Failed to track event:", error);
  }
}

// Usage in server function
export const loginUser = createServerFn({ method: "POST" }).handler(
  async ({ data, context }) => {
    const startTime = Date.now();
    const user = await authenticateUser(data);
    const duration = Date.now() - startTime;

    // Track login event
    await trackEvent(context.env, "user_login", {
      user_id: user.id,
      tenant_id: user.tenantId,
      duration_ms: duration,
    });

    return user;
  }
);

Query Analytics Data

Use Cloudflare GraphQL API:

query GetLoginStats($accountId: string!) {
  viewer {
    accounts(filter: { accountTag: $accountId }) {
      workersAnalyticsEngineDataset(dataset: "my_analytics") {
        query(
          filter: {
            blob1: "user_login"
            datetime_gt: "2025-01-01T00:00:00Z"
          }
        ) {
          count
          dimensions {
            blob1  # event name
            blob2  # user_id
            blob3  # tenant_id
            index1 # tenant_id (sampling index)
          }
        }
      }
    }
  }
}
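
To run a query like this programmatically, you can POST it to Cloudflare's GraphQL endpoint. A minimal sketch, assuming you supply your own API token and account ID and verify field names against the current schema:

// Illustrative GraphQL client call
async function runGraphQL(
  apiToken: string,
  query: string,
  variables: Record<string, unknown>
): Promise<unknown> {
  const res = await fetch("https://api.cloudflare.com/client/v4/graphql", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ query, variables }),
  });
  if (!res.ok) throw new Error(`GraphQL request failed: ${res.status}`);
  return res.json();
}

// Usage: runGraphQL(token, getLoginStatsQuery, { accountId: "<your account tag>" })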

Health Checks

Health Check Endpoint

// app/routes/api/health.ts
import { createServerFn } from "@tanstack/start";
import { db } from "~/lib/server/db";

export const GET = createServerFn({ method: "GET" }).handler(async ({ context }) => {
  const startTime = Date.now();
  const checks: Record<string, string> = {};

  // Check database
  let dbHealthy = false;
  try {
    await db.execute("SELECT 1");
    dbHealthy = true;
    checks.database = "ok";
  } catch (error) {
    console.error("Database health check failed:", error);
    checks.database = "failed";
  }

  // Check Redis (if using Upstash)
  let redisHealthy = false;
  if (context.env.REDIS) {
    try {
      await context.env.REDIS.ping();
      redisHealthy = true;
      checks.redis = "ok";
    } catch (error) {
      console.error("Redis health check failed:", error);
      checks.redis = "failed";
    }
  }

  const duration = Date.now() - startTime;
  const healthy = dbHealthy && (!context.env.REDIS || redisHealthy);

  return new Response(
    JSON.stringify({
      status: healthy ? "healthy" : "unhealthy",
      checks,
      duration_ms: duration,
      timestamp: new Date().toISOString(),
      environment: process.env.ENVIRONMENT,
    }),
    {
      status: healthy ? 200 : 503,
      headers: { "Content-Type": "application/json" },
    }
  );
});
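
A quick way to exercise the endpoint from a script or CI step (the URL is illustrative; run with Node 18+ or any runtime with fetch):

// Illustrative smoke test for the health endpoint
const res = await fetch("https://app.example.com/api/health");
const body = (await res.json()) as { status: string; checks: Record<string, string> };
console.log(res.status, body.status, body.checks);
if (res.status !== 200) process.exit(1);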

Cloudflare Health Checks

Configure in Cloudflare dashboard:

  1. Go to Traffic → Health Checks
  2. Create health check for /api/health
  3. Configure:
    • Interval: 60 seconds
    • Timeout: 5 seconds
    • Retries: 2
    • Expected status: 200
  4. Set up notifications (email/webhook); see the webhook receiver sketch after this list
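
For the webhook option, the notification can point at a small Worker route. A minimal sketch; the route path, secret binding name, and the cf-webhook-auth header check are assumptions to verify against your notification configuration:

// app/routes/api/alerts.ts (hypothetical route) - receives Cloudflare notification webhooks
export async function handleAlertWebhook(
  request: Request,
  env: { ALERT_WEBHOOK_SECRET: string }
): Promise<Response> {
  // Reject calls that do not carry the shared secret configured on the webhook destination
  if (request.headers.get("cf-webhook-auth") !== env.ALERT_WEBHOOK_SECRET) {
    return new Response("unauthorized", { status: 401 });
  }
  const payload = await request.text(); // payload shape is not assumed; log it as-is
  console.warn(JSON.stringify({ level: "warn", message: "Cloudflare alert received", payload }));
  return new Response("ok");
}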

Error Tracking

Structured Error Logging

// app/utils/error-handler.ts
import { logger } from "~/utils/logger";
import { trackEvent } from "~/utils/analytics";

export function handleError(error: Error, context?: Record<string, unknown>) {
  // Log error with full context
  logger.error(error.message, {
    error_name: error.name,
    stack: error.stack,
    ...context,
  });

  // Also log to Analytics Engine for tracking
  // (fire-and-forget; wrap in ctx.waitUntil() if it must complete after the response is sent)
  if (context?.env) {
    void trackEvent(context.env as Env, "error_occurred", {
      error_name: error.name,
      error_message: error.message,
    });
  }
}

// Usage in server function
export const updateUser = createServerFn({ method: "POST" }).handler(
  async ({ data, context }) => {
    try {
      return await userService.update(data);
    } catch (error) {
      handleError(error instanceof Error ? error : new Error(String(error)), {
        user_id: context.user?.id,
        tenant_id: context.tenant?.id,
        env: context.env,
      });
      throw error;
    }
  }
);

Viewing Errors in Cloudflare

  1. Workers Dashboard: View errors in real-time
  2. Wrangler Tail: npx wrangler tail --status error
  3. Analytics: Check error rate metrics
  4. Health Checks: Monitor endpoint failures

Supporting Documentation

All supporting files are kept under 500 lines, per Anthropic best practices.

When to Apply This Skill

Use this skill when:

  • Setting up monitoring for new Cloudflare Workers projects
  • Implementing structured logging with console
  • Debugging production issues with wrangler tail
  • Setting up health checks
  • Implementing custom metrics tracking with Analytics Engine
  • Configuring Cloudflare alerts

Template Reference

These patterns are from Grey Haven's production monitoring:

  • Cloudflare Workers Analytics: Request and performance metrics
  • Wrangler tail: Real-time log streaming
  • Console logging: Structured JSON logs
  • Analytics Engine: Custom event tracking

Critical Reminders

  1. Structured logging: Use JSON.stringify for console logs
  2. Request IDs: Track requests with UUIDs for debugging
  3. Error context: Include tenant_id, user_id in all error logs
  4. Health checks: Monitor database and external service connections
  5. Wrangler tail: Use filters to narrow down logs (--status, --method)
  6. Performance: Track duration_ms for all operations
  7. Environment: Log environment in all messages for filtering
  8. Analytics Engine: Use for custom metrics and event tracking
  9. Dashboard access: Logs available in Cloudflare Workers dashboard
  10. Real-time debugging: Use wrangler tail for live production debugging

GitHub Repository

greyhaven-ai/claude-code-config
Path: grey-haven-plugins/observability/skills/observability-monitoring

Related Skills

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

View skill

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

View skill

content-collections

Meta

This skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.

View skill

llamaguard

Other

LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.

View skill