error-tracking

sgcarstrends

Updated Today

22 views

Testinggeneral

About

This skill helps developers track and debug production errors using CloudWatch Logs and structured logging. It's designed for investigating errors, monitoring application health, and improving observability. Key capabilities include analyzing error patterns, setting up alerting, and troubleshooting user-reported issues.

Documentation

Error Tracking Skill

This skill helps you track and debug errors in production using CloudWatch Logs and structured logging.

When to Use This Skill

Investigating production errors
Monitoring application health
Debugging intermittent issues
Analyzing error patterns
Setting up alerting
Improving observability
Troubleshooting user-reported issues

Logging Infrastructure

CloudWatch Logs

AWS Lambda functions automatically log to CloudWatch:

CloudWatch Log Groups:
├── /aws/lambda/sgcarstrends-api-prod
├── /aws/lambda/sgcarstrends-web-prod
└── /aws/lambda/sgcarstrends-workflows-prod

Structured Logging

Logger Setup

// packages/utils/src/logger.ts
import pino from "pino";

export const logger = pino({
  level: process.env.LOG_LEVEL || "info",
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
  base: {
    env: process.env.NODE_ENV,
    service: process.env.SERVICE_NAME,
  },
});

// Export typed logger methods
export const log = {
  info: (message: string, data?: Record<string, unknown>) => {
    logger.info(data, message);
  },
  error: (message: string, error: Error, data?: Record<string, unknown>) => {
    logger.error(
      {
        ...data,
        error: {
          message: error.message,
          stack: error.stack,
          name: error.name,
        },
      },
      message
    );
  },
  warn: (message: string, data?: Record<string, unknown>) => {
    logger.warn(data, message);
  },
  debug: (message: string, data?: Record<string, unknown>) => {
    logger.debug(data, message);
  },
};

Usage in Code

// apps/api/src/routes/cars.ts
import { log } from "@sgcarstrends/utils/logger";

export const getCars = async (c: Context) => {
  try {
    log.info("Fetching cars", {
      month: c.req.query("month"),
      userId: c.get("userId"),
    });

    const cars = await db.query.cars.findMany();

    log.info("Cars fetched successfully", {
      count: cars.length,
    });

    return c.json(cars);
  } catch (error) {
    log.error("Failed to fetch cars", error as Error, {
      month: c.req.query("month"),
    });

    return c.json({ error: "Failed to fetch cars" }, 500);
  }
};

Viewing Logs

AWS CLI

# View recent logs
aws logs tail /aws/lambda/sgcarstrends-api-prod --follow

# Filter by error level
aws logs tail /aws/lambda/sgcarstrends-api-prod \
  --filter-pattern "ERROR"

# View logs from specific time range
aws logs filter-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --start-time $(($(date +%s) - 3600))000 \
  --end-time $(date +%s)000 \
  --filter-pattern "ERROR"

# Search for specific message
aws logs filter-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --filter-pattern "Failed to fetch cars"

SST Console

# Open SST console
cd apps/api
sst dev

# View logs in browser
# Navigate to Functions → sgcarstrends-api-prod → Logs

Error Patterns

Common Error Logging

// Database errors
try {
  const result = await db.query.cars.findMany();
} catch (error) {
  log.error("Database query failed", error as Error, {
    query: "cars.findMany",
    retryable: true,
  });
  throw error;
}

// External API errors
try {
  const response = await fetch(url);
  if (!response.ok) {
    log.error("External API error", new Error("API request failed"), {
      url,
      status: response.status,
      statusText: response.statusText,
    });
  }
} catch (error) {
  log.error("External API request failed", error as Error, {
    url,
  });
}

// Validation errors
const result = schema.safeParse(data);
if (!result.success) {
  log.warn("Validation failed", {
    errors: result.error.issues,
    data,
  });
  return c.json({ error: "Invalid request" }, 400);
}

// Authentication errors
if (!user) {
  log.warn("Unauthorized access attempt", {
    path: c.req.path,
    ip: c.req.header("x-forwarded-for"),
  });
  return c.json({ error: "Unauthorized" }, 401);
}

CloudWatch Insights

Query Logs

-- Find all errors in last hour
fields @timestamp, @message, level, error.message
| filter level = "error"
| sort @timestamp desc
| limit 100

-- Count errors by type
fields error.name
| filter level = "error"
| stats count() by error.name
| sort count() desc

-- Find slow requests
fields @timestamp, @message, duration
| filter level = "info" and @message like /Request completed/
| filter duration > 1000
| sort duration desc

-- Track error rate over time
fields @timestamp
| filter level = "error"
| stats count() as ErrorCount by bin(5m)

-- Find errors for specific user
fields @timestamp, @message, userId, error.message
| filter level = "error" and userId = "user123"
| sort @timestamp desc

Common Queries

-- Database connection errors
fields @timestamp, @message, error.message
| filter error.message like /connection/
| sort @timestamp desc

-- Memory errors
fields @timestamp, @message, error.message
| filter error.message like /memory/ or error.message like /heap/
| sort @timestamp desc

-- Timeout errors
fields @timestamp, @message, error.message
| filter error.message like /timeout/ or error.message like /timed out/
| sort @timestamp desc

-- Rate limit errors
fields @timestamp, @message, error.message
| filter error.message like /rate limit/ or error.message like /too many requests/
| sort @timestamp desc

Error Monitoring

CloudWatch Alarms

// infra/monitoring.ts
import { Alarm } from "sst/constructs";

export function Monitoring({ stack }: StackContext) {
  // Error rate alarm
  new Alarm(stack, "HighErrorRate", {
    sns: {
      topicArn: process.env.SNS_TOPIC_ARN,
    },
    alarm: (props) => ({
      alarmName: "sgcarstrends-high-error-rate",
      evaluationPeriods: 2,
      threshold: 10,
      comparisonOperator: "GreaterThanThreshold",
      metric: new Metric({
        namespace: "AWS/Lambda",
        metricName: "Errors",
        dimensions: {
          FunctionName: props.functionName,
        },
        statistic: "Sum",
        period: Duration.minutes(5),
      }),
    }),
  });

  // High latency alarm
  new Alarm(stack, "HighLatency", {
    sns: {
      topicArn: process.env.SNS_TOPIC_ARN,
    },
    alarm: (props) => ({
      alarmName: "sgcarstrends-high-latency",
      evaluationPeriods: 3,
      threshold: 1000,  // 1 second
      comparisonOperator: "GreaterThanThreshold",
      metric: new Metric({
        namespace: "AWS/Lambda",
        metricName: "Duration",
        dimensions: {
          FunctionName: props.functionName,
        },
        statistic: "Average",
        period: Duration.minutes(5),
      }),
    }),
  });
}

Error Aggregation

Group Similar Errors

// packages/utils/src/error-tracker.ts
interface ErrorGroup {
  fingerprint: string;
  message: string;
  count: number;
  lastSeen: Date;
  firstSeen: Date;
}

export class ErrorTracker {
  private errors: Map<string, ErrorGroup> = new Map();

  track(error: Error, context?: Record<string, unknown>) {
    const fingerprint = this.getFingerprint(error);

    const existing = this.errors.get(fingerprint);
    if (existing) {
      existing.count++;
      existing.lastSeen = new Date();
    } else {
      this.errors.set(fingerprint, {
        fingerprint,
        message: error.message,
        count: 1,
        lastSeen: new Date(),
        firstSeen: new Date(),
      });
    }

    // Log error
    log.error("Error tracked", error, {
      ...context,
      fingerprint,
      count: this.errors.get(fingerprint)?.count,
    });
  }

  private getFingerprint(error: Error): string {
    // Create fingerprint from error type and message
    const parts = [
      error.name,
      error.message.replace(/\d+/g, "N"),  // Replace numbers
      error.stack?.split("\n")[1],  // First stack frame
    ];
    return parts.filter(Boolean).join("|");
  }

  getTopErrors(limit = 10): ErrorGroup[] {
    return Array.from(this.errors.values())
      .sort((a, b) => b.count - a.count)
      .slice(0, limit);
  }
}

Best Practices

1. Log Context

// ❌ No context
log.error("Error occurred", error);

// ✅ With context
log.error("Failed to process payment", error, {
  userId: user.id,
  amount: payment.amount,
  currency: payment.currency,
  paymentId: payment.id,
});

2. Use Structured Logs

// ❌ String concatenation
console.log(`User ${userId} performed action ${action}`);

// ✅ Structured logging
log.info("User action", {
  userId,
  action,
  timestamp: new Date().toISOString(),
});

3. Don't Log Sensitive Data

// ❌ Logging sensitive data
log.info("User logged in", {
  email: user.email,
  password: user.password,  // NEVER log passwords!
  creditCard: user.creditCard,
});

// ✅ Safe logging
log.info("User logged in", {
  userId: user.id,
  email: user.email.replace(/(?<=.{2}).(?=.*@)/g, "*"),  // Mask email
});

4. Set Appropriate Log Levels

// Production
log.debug("Database query", { query });  // Not logged in prod
log.info("Request completed", { duration });  // Logged
log.warn("Cache miss", { key });  // Logged
log.error("Database error", error);  // Logged

// Development
// All levels logged

Debugging Production Issues

Step-by-Step Process

# 1. Identify the issue
# Check CloudWatch Logs for errors
aws logs tail /aws/lambda/sgcarstrends-api-prod --filter-pattern "ERROR"

# 2. Find error pattern
# Search for similar errors
aws logs filter-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --filter-pattern "Failed to fetch cars"

# 3. Check error context
# View logs with context
aws logs get-log-events \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --log-stream-name 2024/01/15/[$LATEST]abc123 \
  --start-from-head

# 4. Analyze error frequency
# Use CloudWatch Insights
# Query: Count errors by type

# 5. Reproduce locally
# Use error context to reproduce

# 6. Fix and deploy
# Create fix, test, deploy

# 7. Verify fix
# Monitor logs after deployment
aws logs tail /aws/lambda/sgcarstrends-api-prod --follow

Troubleshooting

Logs Not Appearing

# Issue: Logs not showing in CloudWatch
# Solution: Check Lambda execution role permissions

# Ensure Lambda has CloudWatch Logs permissions:
# - logs:CreateLogGroup
# - logs:CreateLogStream
# - logs:PutLogEvents

Too Many Logs

# Issue: Too much logging causing high costs
# Solution: Adjust log level and retention

# Set log level in production
LOG_LEVEL=info

# Reduce retention period
aws logs put-retention-policy \
  --log-group-name /aws/lambda/sgcarstrends-api-prod \
  --retention-in-days 7

Cannot Find Specific Error

# Issue: Can't find error in logs
# Solution: Improve search with CloudWatch Insights

# Use more specific filters
fields @timestamp, @message
| filter @message like /specific pattern/
| sort @timestamp desc

References

AWS CloudWatch Logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/
CloudWatch Insights: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html
Pino Logger: https://getpino.io
Related files:
- packages/utils/src/logger.ts - Logger configuration
- Root CLAUDE.md - Logging guidelines

Best Practices Summary

Structured Logging: Use structured logs with context
Appropriate Levels: Use correct log levels (debug, info, warn, error)
Don't Log Secrets: Never log sensitive data
Add Context: Include relevant context for debugging
Monitor Errors: Set up CloudWatch Alarms
Aggregate Errors: Group similar errors together
Log Retention: Set appropriate retention periods
Use Insights: Leverage CloudWatch Insights for analysis

Quick Install

/plugin add https://github.com/sgcarstrends/sgcarstrends/tree/main/error-tracking

Copy and paste this command in Claude Code to install this skill

GitHub 仓库

sgcarstrends/sgcarstrends

Path: .claude/skills/error-tracking

apiaws-lambdabackendhonojob-schedulerneon-postgres

Related Skills

subagent-driven-development

Development

This skill executes implementation plans by dispatching a fresh subagent for each independent task, with code review between tasks. It enables fast iteration while maintaining quality gates through this review process. Use it when working on mostly independent tasks within the same session to ensure continuous progress with built-in quality checks.

View skill

algorithmic-art

executing-plans

Design

Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads and critically reviews the plan, then executes tasks in small batches (default 3 tasks) while reporting progress between each batch for architect review. This ensures systematic implementation with built-in quality control checkpoints.

View skill

cost-optimization

Other

This Claude Skill helps developers optimize cloud costs through resource rightsizing, tagging strategies, and spending analysis. It provides a framework for reducing cloud expenses and implementing cost governance across AWS, Azure, and GCP. Use it when you need to analyze infrastructure costs, right-size resources, or meet budget constraints.

View skill