tracking-application-response-times
About
This skill tracks application response times across APIs, databases, and services using a plugin that calculates key percentiles and metrics. Use it to identify performance bottlenecks, monitor SLOs, and receive alerts for degradation. Trigger it with phrases like "track response times" or "optimize latency" when you need to analyze and improve application performance.
Documentation
Overview
This skill empowers Claude to proactively monitor and improve application performance by tracking response times across various layers. It provides detailed metrics and insights to identify and resolve performance bottlenecks.
How It Works
- Initiate Tracking: The user requests response time tracking.
- Configure Monitoring: The plugin automatically begins monitoring API endpoints, database queries, external service calls, frontend rendering, and background jobs.
- Report Metrics: The plugin generates reports including P50, P95, P99 percentiles, average, and maximum response times.
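The reported metrics can be understood as simple aggregates over raw response-time samples. A minimal sketch of how such a summary could be computed (this is an illustrative assumption, not the plugin's actual implementation):

```python
import statistics

def summarize_response_times(samples_ms):
    """Compute the summary metrics the tracker reports:
    P50/P95/P99 percentiles, average, and maximum (milliseconds)."""
    ordered = sorted(samples_ms)

    def percentile(p):
        # Nearest-rank percentile: the sample at the p-th percent position.
        k = max(0, int(round(p / 100 * len(ordered))) - 1)
        return ordered[k]

    return {
        "p50": percentile(50),
        "p95": percentile(95),
        "p99": percentile(99),
        "avg": statistics.fmean(ordered),
        "max": ordered[-1],
    }
```

For example, summarizing 100 samples of 1–100 ms yields a P95 of 95 ms and an average of 50.5 ms. Real monitoring systems often use streaming estimators (e.g. t-digest) instead of sorting all samples, but the reported numbers mean the same thing.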
When to Use This Skill
This skill activates when you need to:
- Identify performance bottlenecks in your application.
- Monitor service level objectives (SLOs) related to response times.
- Receive alerts about performance degradation.
Examples
Example 1: Diagnosing Slow API Endpoint
User request: "Track response times for the user authentication API endpoint."
The skill will:
- Activate the response-time-tracker plugin.
- Monitor the specified API endpoint and report response time metrics, highlighting potential bottlenecks.
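Conceptually, endpoint-level tracking amounts to wrapping the handler in a timer and recording each call's duration. A hypothetical sketch (the decorator name and endpoint label are illustrative, not the plugin's API):

```python
import time
from collections import defaultdict

# Endpoint name -> list of observed durations in milliseconds.
response_times = defaultdict(list)

def track(endpoint):
    """Decorator that records the wall-clock duration of each call."""
    def wrap(handler):
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            finally:
                elapsed_ms = (time.perf_counter() - start) * 1000
                response_times[endpoint].append(elapsed_ms)
        return timed
    return wrap

@track("POST /auth/login")
def authenticate(user):
    time.sleep(0.01)  # stand-in for real authentication work
    return {"user": user, "ok": True}
```

Recording in a `finally` block ensures failed requests are measured too, which matters because errors often have very different latency profiles than successes.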
Example 2: Monitoring Database Query Performance
User request: "Monitor database query performance for the product catalog."
The skill will:
- Activate the response-time-tracker plugin.
- Track the execution time of database queries related to the product catalog and provide performance insights.
Best Practices
- Granularity: Track response times at a granular level (e.g., individual API endpoints, specific database queries) for more precise insights.
- Alerting: Configure alerts for significant deviations from baseline performance to proactively address potential issues.
- Contextualization: Correlate response time data with other metrics (e.g., CPU usage, memory consumption) to identify root causes.
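The alerting practice above reduces to comparing a current metric against a recorded baseline. A minimal sketch, where the 20% tolerance and the function name are illustrative assumptions:

```python
def check_degradation(current_p95_ms, baseline_p95_ms, tolerance=0.2):
    """Return an alert message when the current P95 exceeds the
    baseline by more than the tolerance fraction; otherwise None."""
    limit = baseline_p95_ms * (1 + tolerance)
    if current_p95_ms > limit:
        excess = (current_p95_ms / baseline_p95_ms - 1) * 100
        return f"ALERT: P95 {current_p95_ms:.0f} ms is {excess:.0f}% above baseline"
    return None
```

So a jump from a 200 ms baseline to a 300 ms P95 would trigger an alert, while 210 ms would stay within tolerance. In practice the baseline should be recomputed periodically (e.g. a rolling window) so that gradual, legitimate shifts do not cause alert fatigue.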
Integration
This skill can be integrated with other monitoring and alerting tools to provide a comprehensive view of application performance. It can also be used in conjunction with optimization tools to automatically address identified bottlenecks.
Quick Install
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus/tree/main/response-time-tracker
Copy and paste this command in Claude Code to install this skill.
GitHub Repository
Related Skills
sglang
SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
evaluating-llms-harness
This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.
llamaguard
LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.
langchain
LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
