when-profiling-performance-use-performance-profiler
About
This performance profiling skill helps developers identify and optimize application bottlenecks across CPU, memory, I/O, and network dimensions. It provides comprehensive analysis through baseline measurement, bottleneck detection, and root cause investigation phases. Use this skill when you need systematic performance optimization with tools like perf, Instruments, or clinic.js integration.
Quick Install
Claude Code
Copy and paste this command into Claude Code to install the skill:

```
/plugin add https://github.com/DNYoussef/ai-chrome-extension
```

Or clone it directly into your skills directory:

```
git clone https://github.com/DNYoussef/ai-chrome-extension.git ~/.claude/skills/when-profiling-performance-use-performance-profiler
```
Skill Documentation
Performance Profiler Skill
Overview
When profiling performance, use performance-profiler to measure, analyze, and optimize application performance across CPU, memory, I/O, and network dimensions.
MECE Breakdown
Mutually Exclusive Components:
- Baseline Phase: Establish current performance metrics
- Detection Phase: Identify bottlenecks and hot paths
- Analysis Phase: Root cause analysis and impact assessment
- Optimization Phase: Generate and prioritize recommendations
- Implementation Phase: Apply optimizations with agent assistance
- Validation Phase: Benchmark improvements and verify gains
Collectively Exhaustive Coverage:
- CPU Profiling: Function execution time, hot paths, call graphs
- Memory Profiling: Heap usage, allocations, leaks, garbage collection
- I/O Profiling: File system, database, network latency
- Network Profiling: Request timing, bandwidth, connection pooling
- Concurrency: Thread utilization, lock contention, async operations
- Algorithm Analysis: Time complexity, space complexity
- Cache Analysis: Hit rates, cache misses, invalidation patterns
- Database: Query performance, N+1 problems, index usage
Features
Core Capabilities:
- Multi-dimensional performance profiling (CPU, memory, I/O, network)
- Automated bottleneck detection with prioritization
- Real-time profiling and historical analysis
- Flame graph generation for visual analysis
- Memory leak detection and heap snapshots
- Database query optimization
- Algorithmic complexity analysis
- A/B comparison of before/after optimizations
- Production-safe profiling with minimal overhead
- Integration with APM tools (New Relic, DataDog, etc.)
Profiling Modes:
- Quick Scan: 30-second lightweight profiling
- Standard: 5-minute comprehensive analysis
- Deep: 30-minute detailed investigation
- Continuous: Long-running production monitoring
- Stress Test: Load-based profiling under high traffic
Usage
Slash Command:
/profile [path] [--mode quick|standard|deep] [--target cpu|memory|io|network|all]
Subagent Invocation:
Task("Performance Profiler", "Profile ./app with deep CPU and memory analysis", "performance-analyzer")
MCP Tool:

```javascript
mcp__performance-profiler__analyze({
  project_path: "./app",
  profiling_mode: "standard",
  targets: ["cpu", "memory", "io"],
  generate_optimizations: true
})
```
Architecture
Phase 1: Baseline Measurement
- Establish current performance metrics
- Define performance budgets
- Set up monitoring infrastructure
- Capture baseline snapshots
Phase 2: Bottleneck Detection
- CPU profiling (sampling or instrumentation)
- Memory profiling (heap analysis)
- I/O profiling (syscall tracing)
- Network profiling (packet analysis)
- Database profiling (query logs)
Phase 3: Root Cause Analysis
- Correlate metrics across dimensions
- Identify causal relationships
- Calculate performance impact
- Prioritize issues by severity
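The prioritization step above can be sketched as a simple impact score. The field names mirror the bottleneck report format used by this skill; the severity weights themselves are illustrative assumptions, not part of the skill's specification.

```python
# Sketch: rank bottlenecks by estimated impact (severity weight x time share).
# The weights below are illustrative assumptions.
SEVERITY_WEIGHT = {"low": 1, "medium": 2, "high": 4, "critical": 8}

def prioritize(bottlenecks):
    """Return bottlenecks sorted by descending impact score."""
    def score(b):
        return SEVERITY_WEIGHT[b["severity"]] * b["time_percent"]
    return sorted(bottlenecks, key=score, reverse=True)

bottlenecks = [
    {"type": "io", "severity": "medium", "time_percent": 20.0},
    {"type": "cpu", "severity": "high", "time_percent": 34.5},
    {"type": "memory", "severity": "low", "time_percent": 40.0},
]
ranked = prioritize(bottlenecks)
# The high-severity CPU hot path (score 138) outranks the larger
# low-severity memory span (score 40).
```

A multiplicative score like this keeps a small but critical hot path from being drowned out by large low-severity spans.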
Phase 4: Optimization Generation
- Algorithmic improvements
- Caching strategies
- Parallelization opportunities
- Database query optimization
- Memory optimization
- Network optimization
Phase 5: Implementation
- Generate optimized code with coder agent
- Apply database optimizations
- Configure caching layers
- Implement parallelization
Phase 6: Validation
- Run benchmark suite
- Compare before/after metrics
- Verify no regressions
- Generate performance report
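The before/after comparison in the validation phase can be sketched as a percentile regression check. The 5% tolerance is an assumed default, and the latency samples are simulated.

```python
# Sketch of the validation check: compare before/after latency samples and
# flag a regression if p95 worsens beyond a tolerance (assumed 5%).
import statistics

def p95(samples):
    return statistics.quantiles(samples, n=100)[94]  # 95th percentile cut point

def validate(before_ms, after_ms, tolerance=0.05):
    """True if the optimized run's p95 latency did not regress past tolerance."""
    return p95(after_ms) <= p95(before_ms) * (1 + tolerance)

before = [100 + i % 50 for i in range(200)]  # simulated baseline latencies (ms)
after = [60 + i % 30 for i in range(200)]    # simulated optimized latencies (ms)
improved = validate(before, after)
```

Comparing p95 rather than the mean follows the best practice below of validating tail latency, where regressions usually hide.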
Output Formats
Performance Report:

```json
{
  "project": "my-app",
  "profiling_mode": "standard",
  "duration_seconds": 300,
  "baseline": {
    "requests_per_second": 1247,
    "avg_response_time_ms": 123,
    "p95_response_time_ms": 456,
    "p99_response_time_ms": 789,
    "cpu_usage_percent": 67,
    "memory_usage_mb": 512,
    "error_rate_percent": 0.1
  },
  "bottlenecks": [
    {
      "type": "cpu",
      "severity": "high",
      "function": "processData",
      "time_percent": 34.5,
      "calls": 123456,
      "avg_time_ms": 2.3,
      "recommendation": "Optimize algorithm complexity from O(n²) to O(n log n)"
    }
  ],
  "optimizations": [...],
  "estimated_improvement": {
    "throughput_increase": "3.2x",
    "latency_reduction": "68%",
    "memory_reduction": "45%"
  }
}
```
Flame Graph:
An interactive SVG flame graph showing call stacks with time proportions
Heap Snapshot:
Memory allocation breakdown with retention paths
Optimization Report:
Prioritized list of actionable improvements with code examples
Examples
Example 1: Quick CPU Profiling
/profile ./my-app --mode quick --target cpu
Example 2: Deep Memory Analysis
/profile ./my-app --mode deep --target memory --detect-leaks
Example 3: Full Stack Optimization
/profile ./my-app --mode standard --target all --optimize --benchmark
Example 4: Database Query Optimization
/profile ./my-app --mode standard --target io --database --explain-queries
Integration with Claude-Flow
Coordination Pattern:

```javascript
// Step 1: Initialize profiling swarm
mcp__claude-flow__swarm_init({ topology: "star", maxAgents: 5 })

// Step 2: Spawn specialized agents
[Parallel Execution]:
  Task("CPU Profiler", "Profile CPU usage and identify hot paths in ./app", "performance-analyzer")
  Task("Memory Profiler", "Analyze heap usage and detect memory leaks", "performance-analyzer")
  Task("I/O Profiler", "Profile file system and database operations", "performance-analyzer")
  Task("Network Profiler", "Analyze network requests and identify slow endpoints", "performance-analyzer")
  Task("Optimizer", "Generate optimization recommendations based on profiling data", "optimizer")

// Step 3: Implementation agent applies optimizations
[Sequential Execution]:
  Task("Coder", "Implement recommended optimizations from profiling analysis", "coder")
  Task("Benchmarker", "Run benchmark suite and validate improvements", "performance-benchmarker")
```
Configuration
Default Settings:

```json
{
  "profiling": {
    "sampling_rate_hz": 99,
    "stack_depth": 128,
    "include_native_code": false,
    "track_allocations": true
  },
  "thresholds": {
    "cpu_hot_path_percent": 10,
    "memory_leak_growth_mb": 10,
    "slow_query_ms": 100,
    "slow_request_ms": 1000
  },
  "optimization": {
    "auto_apply": false,
    "require_approval": true,
    "run_tests_before": true,
    "run_benchmarks_after": true
  },
  "output": {
    "flame_graph": true,
    "heap_snapshot": true,
    "call_tree": true,
    "recommendations": true
  }
}
```
Profiling Techniques
CPU Profiling:
- Sampling: Periodic stack sampling (low overhead)
- Instrumentation: Function entry/exit hooks (accurate but higher overhead)
- Tracing: Event-based profiling
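As one concrete instance of the instrumentation approach above, Python's built-in cProfile is a deterministic profiler that hooks function entry and exit. A minimal sketch:

```python
# Minimal deterministic (instrumented) profiling with Python's built-in cProfile.
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()  # per-function call counts and cumulative times
```

Deterministic profiling like this captures exact call counts but adds per-call overhead, which is why sampling profilers are preferred in production.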
Memory Profiling:
- Heap Snapshots: Point-in-time memory state
- Allocation Tracking: Record all allocations
- Leak Detection: Compare snapshots over time
- GC Analysis: Garbage collection patterns
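The snapshot-comparison technique for leak detection can be sketched with Python's built-in tracemalloc: take a snapshot, run the suspect workload, and diff allocations by source line. The leaky cache here is a deliberately simulated leak.

```python
# Leak detection by snapshot comparison using Python's built-in tracemalloc.
import tracemalloc

leaky_cache = []  # simulated leak: grows without bound

def handle_request(payload):
    leaky_cache.append(payload * 100)  # retained forever -> leak

tracemalloc.start()
before = tracemalloc.take_snapshot()
for _ in range(1000):
    handle_request("x")
after = tracemalloc.take_snapshot()
tracemalloc.stop()

top = after.compare_to(before, "lineno")[0]  # biggest allocation growth site
# top.size_diff shows bytes retained; its traceback points at handle_request
```

Diffing snapshots over time, rather than inspecting one snapshot in isolation, is what separates steady-state allocation from genuine growth.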
I/O Profiling:
- Syscall Tracing: Track system calls (strace, dtrace)
- File System: Monitor read/write operations
- Database: Query logging and EXPLAIN ANALYZE
- Network: Packet capture and request timing
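The query-plan inspection mentioned above can be illustrated with SQLite's EXPLAIN QUERY PLAN, where the same query switches from a full table scan to an index search once an index exists:

```python
# Query-plan inspection with SQLite's EXPLAIN QUERY PLAN.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")

def plan(sql):
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(r[-1] for r in rows)  # the detail column of each plan step

query = "SELECT * FROM users WHERE email = 'a@example.com'"
before = plan(query)  # full table scan: plan contains "SCAN"
conn.execute("CREATE INDEX idx_users_email ON users(email)")
after = plan(query)   # index lookup: plan contains "USING INDEX"
```

PostgreSQL's EXPLAIN ANALYZE goes further by executing the query and reporting actual row counts and timings alongside the plan.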
Concurrency Profiling:
- Thread Analysis: CPU utilization per thread
- Lock Contention: Identify blocking operations
- Async Operations: Promise/callback timing
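Lock contention, as listed above, can be measured directly by timing how long each thread blocks before acquiring a lock. This is a minimal sketch with a simulated critical section:

```python
# Sketch of lock-contention measurement: time how long threads wait on a lock.
import threading
import time

lock = threading.Lock()
wait_times = []

def worker():
    t0 = time.perf_counter()
    with lock:
        wait_times.append(time.perf_counter() - t0)  # time spent blocked
        time.sleep(0.01)                             # simulated critical section

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total_wait = sum(wait_times)  # high total wait => contention on this lock
```

Because the four workers serialize on one lock, the later threads accumulate measurable wait time; a per-lock wait total like this is the signal contention profilers report.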
Performance Optimization Strategies
Algorithmic:
- Reduce time complexity (O(n²) → O(n log n))
- Use appropriate data structures
- Eliminate unnecessary work
- Memoization and dynamic programming
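The memoization strategy above is often a one-line change. With Python's functools.lru_cache, the naive exponential Fibonacci becomes linear because each subproblem is computed once:

```python
# Memoization with functools.lru_cache: repeated subproblems computed once.
from functools import lru_cache

calls = 0  # count actual (non-cached) invocations

@lru_cache(maxsize=None)
def fib(n):
    global calls
    calls += 1
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(30)  # 31 distinct calls instead of roughly 2.7 million
```

The same idea generalizes to any pure function whose inputs repeat, such as parsing or configuration lookups on a hot path.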
Caching:
- In-memory caching (Redis, Memcached)
- CDN for static assets
- HTTP caching headers
- Query result caching
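Query result caching, the last item above, usually needs a time-to-live so stale data expires. A minimal sketch, assuming a hypothetical expensive_query standing in for a real database call:

```python
# Sketch of query-result caching with a TTL. expensive_query is a stand-in.
import time

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = time.monotonic()
        hit = self.store.get(key)
        if hit and hit[0] > now:
            return hit[1]                       # cache hit: skip the query
        value = compute()                       # cache miss: run the query
        self.store[key] = (now + self.ttl, value)
        return value

queries_run = 0

def expensive_query():
    global queries_run
    queries_run += 1
    return {"rows": 42}

cache = TTLCache(ttl_seconds=60)
a = cache.get_or_compute("report", expensive_query)
b = cache.get_or_compute("report", expensive_query)  # served from cache
```

The TTL bounds staleness; explicit invalidation on writes is the stricter alternative when correctness matters more than hit rate.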
Parallelization:
- Multi-threading
- Worker pools
- Async I/O
- Batching operations
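Worker pools and batching, from the list above, combine naturally: submitting work in batches to a pool amortizes per-task overhead. The batch size here is an illustrative assumption:

```python
# Worker pool plus batching: per-task overhead is paid per batch, not per item.
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch):
    return [x * x for x in batch]  # stand-in for I/O-bound per-item work

items = list(range(100))
batches = [items[i:i + 25] for i in range(0, len(items), 25)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves batch order, so results stay aligned with inputs
    results = [x for out in pool.map(process_batch, batches) for x in out]
```

For CPU-bound work in Python, a ProcessPoolExecutor with the same interface sidesteps the GIL; threads are the right choice when the work is I/O-bound.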
Database:
- Add missing indexes
- Optimize queries
- Reduce N+1 queries
- Connection pooling
- Read replicas
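The N+1 problem listed above is worth seeing concretely: one query per parent row versus a single JOIN that returns the same data in one round trip. A minimal sketch using SQLite:

```python
# The N+1 problem and its fix, illustrated with an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO books VALUES (1, 1, 'B1'), (2, 1, 'B2'), (3, 2, 'B3');
""")

# N+1: one query for the authors, then one more query per author.
authors = conn.execute("SELECT id, name FROM authors").fetchall()
n_plus_1 = [
    (name, conn.execute(
        "SELECT title FROM books WHERE author_id = ?", (aid,)).fetchall())
    for aid, name in authors
]

# Fix: a single JOIN fetches the same rows in one round trip.
joined = conn.execute("""
    SELECT a.name, b.title
    FROM authors a JOIN books b ON b.author_id = a.id
""").fetchall()
```

With N authors the first approach issues N+1 queries; over a network, those round trips usually dominate the actual query time.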
Memory:
- Object pooling
- Reduce allocations
- Stream processing
- Compression
Network:
- Connection keep-alive
- HTTP/2 or HTTP/3
- Compression
- Request batching
- Rate limiting
Performance Budgets
Frontend:
- Time to First Byte (TTFB): < 200ms
- First Contentful Paint (FCP): < 1.8s
- Largest Contentful Paint (LCP): < 2.5s
- Time to Interactive (TTI): < 3.8s
- Total Blocking Time (TBT): < 200ms
- Cumulative Layout Shift (CLS): < 0.1
Backend:
- API Response Time (p50): < 100ms
- API Response Time (p95): < 500ms
- API Response Time (p99): < 1000ms
- Throughput: > 1000 req/s
- Error Rate: < 0.1%
- CPU Usage: < 70%
- Memory Usage: < 80%
Database:
- Query Time (p50): < 10ms
- Query Time (p95): < 50ms
- Query Time (p99): < 100ms
- Connection Pool Utilization: < 80%
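Checking a latency budget like those above comes down to computing percentiles from a window of samples. A sketch using Python's statistics module, with simulated latencies:

```python
# How p50/p95/p99 budgets are checked: percentiles over latency samples.
import statistics

def percentile(samples, p):
    """p-th percentile cut point (1 <= p <= 99) of the samples."""
    return statistics.quantiles(sorted(samples), n=100)[p - 1]

latencies_ms = list(range(1, 101))  # simulated samples: 1..100 ms
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)

within_budget = p95 < 500 and p99 < 1000  # backend budget check from above
```

Production systems typically compute these over a sliding time window and from histograms rather than raw samples, but the comparison against the budget is the same.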
Best Practices
- Profile production workloads when possible
- Use production-like data volumes
- Profile under realistic load
- Measure multiple times for consistency
- Focus on p95/p99, not just averages
- Optimize bottlenecks in order of impact
- Always benchmark before and after
- Monitor for regressions in CI/CD
- Set up continuous profiling
- Track performance over time
Troubleshooting
Issue: High CPU usage but no obvious hot path
Solution: Check for excessive small function calls, increase sampling rate, or use instrumentation
Issue: Memory grows continuously
Solution: Run heap snapshot comparison to identify leak sources
Issue: Slow database queries
Solution: Use EXPLAIN ANALYZE, check for missing indexes, analyze query plans
Issue: High latency but low CPU
Solution: Profile I/O operations, check for blocking synchronous calls
See Also
- PROCESS.md - Detailed step-by-step profiling workflow
- README.md - Quick start guide
- subagent-performance-profiler.md - Agent implementation details
- slash-command-profile.sh - Command-line interface
- mcp-performance-profiler.json - MCP tool schema