error-tracking
About
This skill helps developers track and debug production errors using CloudWatch Logs and structured logging. It's designed for investigating errors, monitoring application health, and improving observability. Key capabilities include analyzing error patterns, setting up alerting, and troubleshooting user-reported issues.
Documentation
Error Tracking Skill
When to Use This Skill
- Investigating production errors
- Monitoring application health
- Debugging intermittent issues
- Analyzing error patterns
- Setting up alerting
- Improving observability
- Troubleshooting user-reported issues
Logging Infrastructure
CloudWatch Logs
AWS Lambda functions automatically log to CloudWatch:
CloudWatch Log Groups:
├── /aws/lambda/sgcarstrends-api-prod
├── /aws/lambda/sgcarstrends-web-prod
└── /aws/lambda/sgcarstrends-workflows-prod
Structured Logging
Logger Setup
// packages/utils/src/logger.ts
import pino from "pino";
export const logger = pino({
level: process.env.LOG_LEVEL || "info",
formatters: {
level: (label) => ({ level: label }),
},
timestamp: pino.stdTimeFunctions.isoTime,
base: {
env: process.env.NODE_ENV,
service: process.env.SERVICE_NAME,
},
});
// Export typed logger methods
export const log = {
info: (message: string, data?: Record<string, unknown>) => {
logger.info(data, message);
},
error: (message: string, error: Error, data?: Record<string, unknown>) => {
logger.error(
{
...data,
error: {
message: error.message,
stack: error.stack,
name: error.name,
},
},
message
);
},
warn: (message: string, data?: Record<string, unknown>) => {
logger.warn(data, message);
},
debug: (message: string, data?: Record<string, unknown>) => {
logger.debug(data, message);
},
};
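The `log.error` wrapper copies `message`, `stack`, and `name` into a plain object because an `Error` does not expose them as enumerable properties, so plain JSON serialization drops them. A minimal sketch of that behavior using only built-ins; `serializeError` is a hypothetical helper name, not part of pino or this project:

```typescript
// Hypothetical helper mirroring what log.error does before handing data to the logger.
interface SerializedError {
  name: string;
  message: string;
  stack?: string;
}

function serializeError(error: Error): SerializedError {
  return { name: error.name, message: error.message, stack: error.stack };
}

const failure = new TypeError("month must be YYYY-MM");

// Serializing the Error directly loses its details: they are not enumerable.
const naive = JSON.stringify({ error: failure });

// Copying the fields first keeps them in the structured log line.
const structured = JSON.stringify({ level: "error", error: serializeError(failure) });

console.log(naive); // {"error":{}}
console.log(JSON.parse(structured).error.message); // month must be YYYY-MM
```

This is why the wrapper takes the `Error` as a separate argument instead of letting callers pass it inside `data`.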
Usage in Code
// apps/api/src/routes/cars.ts
import { log } from "@sgcarstrends/utils/logger";
export const getCars = async (c: Context) => {
try {
log.info("Fetching cars", {
month: c.req.query("month"),
userId: c.get("userId"),
});
const cars = await db.query.cars.findMany();
log.info("Cars fetched successfully", {
count: cars.length,
});
return c.json(cars);
} catch (error) {
log.error("Failed to fetch cars", error as Error, {
month: c.req.query("month"),
});
return c.json({ error: "Failed to fetch cars" }, 500);
}
};
Viewing Logs
AWS CLI
# View recent logs
aws logs tail /aws/lambda/sgcarstrends-api-prod --follow
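# Filter by error level (filter patterns are case-sensitive, so match the
# JSON "level" field written by the logger rather than the text "ERROR")
aws logs tail /aws/lambda/sgcarstrends-api-prod \
--filter-pattern '{ $.level = "error" }'
# View error logs from the last hour (times are epoch milliseconds)
aws logs filter-log-events \
--log-group-name /aws/lambda/sgcarstrends-api-prod \
--start-time $(($(date +%s) - 3600))000 \
--end-time $(date +%s)000 \
--filter-pattern '{ $.level = "error" }'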
# Search for specific message
aws logs filter-log-events \
--log-group-name /aws/lambda/sgcarstrends-api-prod \
--filter-pattern "Failed to fetch cars"
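`filter-log-events` takes `--start-time` and `--end-time` as milliseconds since the Unix epoch, which is why the shell example appends `000` to the output of `date +%s`. The same window can be computed in TypeScript, for instance when a script assembles the command; `lastMinutes` and `LogWindow` are hypothetical names used only for illustration:

```typescript
// Hypothetical helper: a time window ending at `now`, in epoch milliseconds.
interface LogWindow {
  startTime: number;
  endTime: number;
}

function lastMinutes(minutes: number, now: number = Date.now()): LogWindow {
  return { startTime: now - minutes * 60 * 1000, endTime: now };
}

// A fixed "now" keeps the example reproducible.
const lastHour = lastMinutes(60, Date.parse("2024-01-15T12:00:00Z"));

console.log(lastHour.endTime - lastHour.startTime); // 3600000
```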
SST Console
# Open SST console
cd apps/api
sst dev
# View logs in browser
# Navigate to Functions → sgcarstrends-api-prod → Logs
Error Patterns
Common Error Logging
// Database errors
try {
const result = await db.query.cars.findMany();
} catch (error) {
log.error("Database query failed", error as Error, {
query: "cars.findMany",
retryable: true,
});
throw error;
}
// External API errors
try {
const response = await fetch(url);
if (!response.ok) {
log.error("External API error", new Error("API request failed"), {
url,
status: response.status,
statusText: response.statusText,
});
}
} catch (error) {
log.error("External API request failed", error as Error, {
url,
});
}
// Validation errors
const result = schema.safeParse(data);
if (!result.success) {
log.warn("Validation failed", {
errors: result.error.issues,
data,
});
return c.json({ error: "Invalid request" }, 400);
}
// Authentication errors
if (!user) {
log.warn("Unauthorized access attempt", {
path: c.req.path,
ip: c.req.header("x-forwarded-for"),
});
return c.json({ error: "Unauthorized" }, 401);
}
CloudWatch Insights
Query Logs
# Find recent errors (select the time range, e.g. the last hour, in the query editor)
fields @timestamp, @message, level, error.message
| filter level = "error"
| sort @timestamp desc
| limit 100
# Count errors by type
fields error.name
| filter level = "error"
| stats count() as errorCount by error.name
| sort errorCount desc
# Find slow requests
fields @timestamp, @message, duration
| filter level = "info" and @message like /Request completed/
| filter duration > 1000
| sort duration desc
# Track error rate over time
fields @timestamp
| filter level = "error"
| stats count() as ErrorCount by bin(5m)
# Find errors for specific user
fields @timestamp, @message, userId, error.message
| filter level = "error" and userId = "user123"
| sort @timestamp desc
Common Queries
# Database connection errors
fields @timestamp, @message, error.message
| filter error.message like /connection/
| sort @timestamp desc
# Memory errors
fields @timestamp, @message, error.message
| filter error.message like /memory/ or error.message like /heap/
| sort @timestamp desc
# Timeout errors
fields @timestamp, @message, error.message
| filter error.message like /timeout/ or error.message like /timed out/
| sort @timestamp desc
# Rate limit errors
fields @timestamp, @message, error.message
| filter error.message like /rate limit/ or error.message like /too many requests/
| sort @timestamp desc
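These queries work because each log event is a JSON object whose fields (`level`, `error.name`, and so on) CloudWatch can address directly. The same idea can be tried locally on exported log lines. The sketch below counts error-level entries by `error.name`, roughly what the "Count errors by type" query does; `countErrorsByName` and the sample lines are made up for illustration:

```typescript
// Count error-level JSON log lines by error.name. Non-JSON lines are skipped.
function countErrorsByName(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // e.g. Lambda START/END/REPORT lines
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const entry = parsed as { level?: unknown; error?: { name?: unknown } };
    if (entry.level !== "error") continue;
    const name = typeof entry.error?.name === "string" ? entry.error.name : "Unknown";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return counts;
}

// Made-up sample lines in the shape the logger above would emit.
const sampleLines = [
  "START RequestId: abc Version: $LATEST",
  '{"level":"error","msg":"Database query failed","error":{"name":"TimeoutError"}}',
  '{"level":"info","msg":"Cars fetched successfully","count":12}',
  '{"level":"error","msg":"External API request failed","error":{"name":"TypeError"}}',
  '{"level":"error","msg":"Database query failed","error":{"name":"TimeoutError"}}',
];

const errorCounts = Array.from(countErrorsByName(sampleLines).entries())
  .sort((x, y) => y[1] - x[1]);

console.log(errorCounts); // [ [ 'TimeoutError', 2 ], [ 'TypeError', 1 ] ]
```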
Error Monitoring
CloudWatch Alarms
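// infra/monitoring.ts
// Uses the AWS CDK CloudWatch constructs; "sst/constructs" does not export an Alarm construct.
import { Duration } from "aws-cdk-lib";
import { Alarm, ComparisonOperator, Metric } from "aws-cdk-lib/aws-cloudwatch";
import { SnsAction } from "aws-cdk-lib/aws-cloudwatch-actions";
import { Topic } from "aws-cdk-lib/aws-sns";
import type { StackContext } from "sst/constructs";
export function Monitoring({ stack }: StackContext) {
  // Function name assumed from the /aws/lambda/sgcarstrends-api-prod log group
  const functionName = "sgcarstrends-api-prod";
  const topicArn = process.env.SNS_TOPIC_ARN;
  if (!topicArn) throw new Error("SNS_TOPIC_ARN is not set");
  const alertAction = new SnsAction(Topic.fromTopicArn(stack, "AlertTopic", topicArn));
  // Builds a 5-minute AWS/Lambda metric for the function
  const lambdaMetric = (metricName: string, statistic: string) =>
    new Metric({
      namespace: "AWS/Lambda",
      metricName,
      dimensionsMap: { FunctionName: functionName },
      statistic,
      period: Duration.minutes(5),
    });
  // Error rate alarm: more than 10 errors in each of 2 consecutive periods
  new Alarm(stack, "HighErrorRate", {
    alarmName: "sgcarstrends-high-error-rate",
    evaluationPeriods: 2,
    threshold: 10,
    comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
    metric: lambdaMetric("Errors", "Sum"),
  }).addAlarmAction(alertAction);
  // High latency alarm: average duration above 1 second for 3 consecutive periods
  new Alarm(stack, "HighLatency", {
    alarmName: "sgcarstrends-high-latency",
    evaluationPeriods: 3,
    threshold: 1000, // 1 second (Duration is reported in milliseconds)
    comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
    metric: lambdaMetric("Duration", "Average"),
  }).addAlarmAction(alertAction);
}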
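With `evaluationPeriods: 2` and a 5-minute period, the error alarm fires only when the `Errors` sum exceeds 10 in two consecutive periods, so a single spike does not trigger a notification. A simplified sketch of that rule; it ignores missing-data handling and the `datapointsToAlarm` setting, and `breaches` is a hypothetical function name:

```typescript
// Returns true when each of the last `evaluationPeriods` datapoints exceeds the threshold.
function breaches(datapoints: number[], threshold: number, evaluationPeriods: number): boolean {
  if (evaluationPeriods < 1 || datapoints.length < evaluationPeriods) return false;
  const recent = datapoints.slice(-evaluationPeriods);
  return recent.every((value) => value > threshold);
}

// Errors per 5-minute period, oldest first.
console.log(breaches([2, 25, 3], 10, 2)); // false: one spike, then recovery
console.log(breaches([2, 12, 31], 10, 2)); // true: two breaching periods in a row
```

Raising `evaluationPeriods` trades slower detection for fewer false alarms, which is why the latency alarm uses 3.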
Error Aggregation
Group Similar Errors
// packages/utils/src/error-tracker.ts
import { log } from "./logger";
interface ErrorGroup {
fingerprint: string;
message: string;
count: number;
lastSeen: Date;
firstSeen: Date;
}
export class ErrorTracker {
private errors: Map<string, ErrorGroup> = new Map();
track(error: Error, context?: Record<string, unknown>) {
const fingerprint = this.getFingerprint(error);
const existing = this.errors.get(fingerprint);
if (existing) {
existing.count++;
existing.lastSeen = new Date();
} else {
this.errors.set(fingerprint, {
fingerprint,
message: error.message,
count: 1,
lastSeen: new Date(),
firstSeen: new Date(),
});
}
// Log error
log.error("Error tracked", error, {
...context,
fingerprint,
count: this.errors.get(fingerprint)?.count,
});
}
private getFingerprint(error: Error): string {
// Create fingerprint from error type and message
const parts = [
error.name,
error.message.replace(/\d+/g, "N"), // Replace numbers
error.stack?.split("\n")[1], // First stack frame
];
return parts.filter(Boolean).join("|");
}
getTopErrors(limit = 10): ErrorGroup[] {
return Array.from(this.errors.values())
.sort((a, b) => b.count - a.count)
.slice(0, limit);
}
}
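The fingerprint replaces every run of digits with `N`, so errors that differ only in an ID or count fall into the same group. A standalone sketch of that normalization; it leaves out the stack-frame component, because stack frames depend on where the error was created, and `fingerprint` here is a simplified stand-in for the private `getFingerprint` method:

```typescript
// Simplified fingerprint: error name plus the message with digit runs normalized.
function fingerprint(name: string, message: string): string {
  return [name, message.replace(/\d+/g, "N")].join("|");
}

const first = fingerprint("Error", "User 123 not found");
const second = fingerprint("Error", "User 4567 not found");
const other = fingerprint("TypeError", "User 123 not found");

console.log(first); // Error|User N not found
console.log(first === second); // true: same group despite different IDs
console.log(first === other); // false: different error type
```

Note that `ErrorTracker` keeps its counts in memory, so in Lambda they are per execution environment and reset on cold start; the logged `fingerprint` field is what allows grouping across invocations in CloudWatch.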
Best Practices
1. Log Context
// ❌ No context
log.error("Error occurred", error);
// ✅ With context
log.error("Failed to process payment", error, {
userId: user.id,
amount: payment.amount,
currency: payment.currency,
paymentId: payment.id,
});
2. Use Structured Logs
// ❌ String concatenation
console.log(`User ${userId} performed action ${action}`);
// ✅ Structured logging
log.info("User action", {
userId,
action,
timestamp: new Date().toISOString(),
});
3. Don't Log Sensitive Data
// ❌ Logging sensitive data
log.info("User logged in", {
email: user.email,
password: user.password, // NEVER log passwords!
creditCard: user.creditCard,
});
// ✅ Safe logging
log.info("User logged in", {
userId: user.id,
email: user.email.replace(/(?<=.{2}).(?=.*@)/g, "*"), // Mask email
});
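Relying on each call site to leave out secrets is fragile, so one option is to pass log data through a redaction step first. A minimal sketch, assuming a fixed deny-list of top-level keys; `redact` and `SENSITIVE_KEYS` are hypothetical names, and nested objects are not traversed:

```typescript
// Hypothetical deny-list; extend it to match the fields your application handles.
const SENSITIVE_KEYS = new Set<string>(["password", "creditCard", "token"]);

// Returns a shallow copy with sensitive top-level values replaced.
function redact(data: Record<string, unknown>): Record<string, unknown> {
  const safe: Record<string, unknown> = {};
  for (const key of Object.keys(data)) {
    safe[key] = SENSITIVE_KEYS.has(key) ? "[REDACTED]" : data[key];
  }
  return safe;
}

const loginEvent = { userId: "u_1", password: "hunter2", creditCard: "4111111111111111" };
const safeEvent = redact(loginEvent);

console.log(safeEvent);
// { userId: 'u_1', password: '[REDACTED]', creditCard: '[REDACTED]' }
```

pino also provides a `redact` option that applies path-based redaction inside the logger itself, which may be preferable to a hand-written helper.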
4. Set Appropriate Log Levels
// Production
log.debug("Database query", { query }); // Not logged in prod
log.info("Request completed", { duration }); // Logged
log.warn("Cache miss", { key }); // Logged
log.error("Database error", error); // Logged
// Development
// All levels logged
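Whether a call is written depends on comparing its level with the configured `LOG_LEVEL`: pino assigns each level a number and emits only entries at or above the configured one. A small sketch of that comparison using pino's default level values; `isLevelEnabled` is a hypothetical helper, not a pino API:

```typescript
// pino's default numeric level values.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 } as const;
type Level = keyof typeof LEVELS;

// True when an entry at `level` would be written with the logger set to `configured`.
function isLevelEnabled(level: Level, configured: Level): boolean {
  return LEVELS[level] >= LEVELS[configured];
}

console.log(isLevelEnabled("debug", "info")); // false: dropped in production
console.log(isLevelEnabled("warn", "info")); // true
console.log(isLevelEnabled("debug", "debug")); // true: development setting
```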
Debugging Production Issues
Step-by-Step Process
# 1. Identify the issue
# Check CloudWatch Logs for errors
aws logs tail /aws/lambda/sgcarstrends-api-prod --filter-pattern '{ $.level = "error" }'
# 2. Find error pattern
# Search for similar errors
aws logs filter-log-events \
--log-group-name /aws/lambda/sgcarstrends-api-prod \
--filter-pattern "Failed to fetch cars"
# 3. Check error context
# View logs with context
aws logs get-log-events \
--log-group-name /aws/lambda/sgcarstrends-api-prod \
--log-stream-name '2024/01/15/[$LATEST]abc123' \
--start-from-head
# 4. Analyze error frequency
# Use CloudWatch Insights
# Query: Count errors by type
# 5. Reproduce locally
# Use error context to reproduce
# 6. Fix and deploy
# Create fix, test, deploy
# 7. Verify fix
# Monitor logs after deployment
aws logs tail /aws/lambda/sgcarstrends-api-prod --follow
Troubleshooting
Logs Not Appearing
# Issue: Logs not showing in CloudWatch
# Solution: Check Lambda execution role permissions
# Ensure Lambda has CloudWatch Logs permissions:
# - logs:CreateLogGroup
# - logs:CreateLogStream
# - logs:PutLogEvents
Too Many Logs
# Issue: Too much logging causing high costs
# Solution: Adjust log level and retention
# Set log level in production
LOG_LEVEL=info
# Reduce retention period
aws logs put-retention-policy \
--log-group-name /aws/lambda/sgcarstrends-api-prod \
--retention-in-days 7
Cannot Find Specific Error
# Issue: Can't find error in logs
# Solution: Improve search with CloudWatch Insights
# Use more specific filters
fields @timestamp, @message
| filter @message like /specific pattern/
| sort @timestamp desc
References
- AWS CloudWatch Logs: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/
- CloudWatch Insights: https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html
- Pino Logger: https://getpino.io
- Related files:
  - packages/utils/src/logger.ts - Logger configuration
  - Root CLAUDE.md - Logging guidelines
Best Practices Summary
- Structured Logging: Use structured logs with context
- Appropriate Levels: Use correct log levels (debug, info, warn, error)
- Don't Log Secrets: Never log sensitive data
- Add Context: Include relevant context for debugging
- Monitor Errors: Set up CloudWatch Alarms
- Aggregate Errors: Group similar errors together
- Log Retention: Set appropriate retention periods
- Use Insights: Leverage CloudWatch Insights for analysis
Quick Install
Copy and paste this command in Claude Code to install this skill:
/plugin add https://github.com/sgcarstrends/sgcarstrends/tree/main/error-tracking
