creating-alerting-rules
About
This skill helps developers create alerting rules for performance monitoring. It is triggered by phrases like "create alerts" or "set up alerting", and it automates defining thresholds, routing, and escalation policies for categories such as latency, error rate, and SLO violations. It is designed for SREs and DevOps teams who want better system observability with less manual effort.
Quick Install
Claude Code
Recommended: /plugin add https://github.com/jeremylongshore/claude-code-plugins-plus
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/creating-alerting-rules
Copy and paste this command into Claude Code to install the skill.
Skill Documentation
Overview
This skill automates the creation of comprehensive alerting rules, reducing the manual effort required for performance monitoring. It guides you through defining alert categories, setting intelligent thresholds, and configuring routing and escalation policies. The skill also helps generate runbooks and establish alert testing procedures.
How It Works
- Identify Alert Category: Determines the type of alert to create (e.g., latency, error rate, resource utilization).
- Define Thresholds: Sets thresholds high enough to avoid alert fatigue but low enough to give timely notification of performance issues.
- Configure Routing and Escalation: Establishes routing policies to direct alerts to the appropriate teams and escalation policies for timely response.
- Generate Runbook: Creates a basic runbook with steps to diagnose and resolve the alerted issue.
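The four steps above can be pictured as fields on a single rule object. The following is a minimal sketch under stated assumptions, not the skill's actual schema: the `AlertRule` class, its field names, the team names, and the numeric thresholds are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    """Hypothetical container for the four steps; not the skill's real schema."""
    name: str
    category: str                      # step 1: e.g. "latency", "error_rate"
    warning_threshold: float           # step 2: thresholds, in the metric's unit
    critical_threshold: float
    routes: dict = field(default_factory=dict)          # step 3: severity -> team
    escalate_after_minutes: int = 15                    # step 3: escalation delay
    runbook_steps: list = field(default_factory=list)   # step 4: diagnosis steps

    def severity(self, value: float):
        """Return "critical", "warning", or None for an observed metric value."""
        if value >= self.critical_threshold:
            return "critical"
        if value >= self.warning_threshold:
            return "warning"
        return None

    def route(self, value: float):
        """Return the team to notify for this value, or None if no alert fires."""
        level = self.severity(value)
        return self.routes.get(level) if level else None


# Illustrative values only; real thresholds depend on the service.
rule = AlertRule(
    name="payment-latency",
    category="latency",
    warning_threshold=300.0,   # milliseconds
    critical_threshold=800.0,
    routes={"warning": "payments-team", "critical": "on-call-sre"},
    runbook_steps=["Check recent deploys", "Inspect downstream dependencies"],
)
print(rule.severity(950.0), rule.route(950.0))
```

Keeping thresholds, routing, and runbook steps on one object is only one possible design; it makes it easy to review a rule as a unit before exporting it to whatever monitoring system is in use.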
When to Use This Skill
This skill activates when you need to:
- Implement performance monitoring for a new service.
- Refine existing alerting rules to reduce false positives.
- Create alerts for specific performance metrics, such as latency or error rate.
Examples
Example 1: Setting up Latency Alerts
User request: "create latency alerts for the payment service"
The skill will:
- Prompt for latency thresholds (e.g., warning and critical).
- Configure alerts to trigger when latency exceeds defined thresholds.
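As a rough illustration of how a latency rule with warning and critical thresholds might be evaluated, here is a small sketch. It assumes the alert is based on the 95th percentile of a window of samples, which the skill's documentation does not specify; the function names and threshold values are hypothetical.

```python
import statistics


def p95(samples_ms):
    """95th percentile of a window of latency samples (needs at least 2 samples)."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100)[94]


def latency_alert(samples_ms, warning_ms, critical_ms):
    """Classify a window of latencies against the prompted thresholds."""
    observed = p95(samples_ms)
    if observed > critical_ms:
        return "critical"
    if observed > warning_ms:
        return "warning"
    return "ok"


# Hypothetical thresholds for the payment service, in milliseconds.
window = [120, 135, 128, 900, 140, 131, 125, 138, 122, 129]
print(latency_alert(window, warning_ms=300, critical_ms=800))
```

Using a percentile rather than the mean is a common choice because a few slow requests can hide behind a healthy average, but the right aggregation depends on the service.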
Example 2: Creating Error Rate Alerts
User request: "set up alerting for error rate increases in the API gateway"
The skill will:
- Request the baseline error rate and acceptable deviation.
- Configure alerts to trigger when the error rate exceeds the defined deviation from the baseline.
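The baseline-plus-deviation check described above could look something like the sketch below. It assumes the deviation is an absolute difference in error rate (for example, 0.01 means one percentage point); the skill may interpret deviation differently, and the function names here are hypothetical.

```python
def error_rate(errors: int, requests: int) -> float:
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def error_rate_alert(errors, requests, baseline, allowed_deviation):
    """True when the current rate exceeds baseline + allowed_deviation."""
    return error_rate(errors, requests) > baseline + allowed_deviation


# Hypothetical API gateway window: 30 errors in 1000 requests,
# with a 1% baseline and a 1-point allowed deviation.
print(error_rate_alert(30, 1000, baseline=0.01, allowed_deviation=0.01))
```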
Best Practices
- Threshold Selection: Use historical data and statistical analysis to determine appropriate thresholds that minimize false positives and negatives.
- Alert Routing: Route alerts to the appropriate teams or individuals based on the alert category and severity.
- Runbook Creation: Generate or link to detailed runbooks that provide clear instructions for diagnosing and resolving the alerted issue.
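To make the first two practices concrete, here is a minimal sketch that derives thresholds from historical data and routes by category and severity. It assumes a simple mean-plus-standard-deviations rule, which is only one of several reasonable statistical approaches; the multipliers, the `ROUTES` table, and the team names are hypothetical.

```python
import statistics


def suggest_thresholds(history, warning_sigmas=2.0, critical_sigmas=3.0):
    """Suggest (warning, critical) thresholds from historical metric values."""
    mean = statistics.mean(history)
    spread = statistics.stdev(history)   # sample standard deviation
    return mean + warning_sigmas * spread, mean + critical_sigmas * spread


# Hypothetical routing table keyed by (category, severity).
ROUTES = {
    ("latency", "warning"): "service-owners",
    ("latency", "critical"): "on-call-sre",
    ("error_rate", "critical"): "on-call-sre",
}


def route(category, severity, default="service-owners"):
    """Pick a destination for an alert, falling back to a default team."""
    return ROUTES.get((category, severity), default)


history_ms = [100, 110, 90, 105, 95]
warning, critical = suggest_thresholds(history_ms)
print(round(warning, 1), round(critical, 1), route("latency", "critical"))
```

A sigma-based rule assumes the metric is roughly symmetric around its mean; for skewed metrics such as latency, percentile-based thresholds may be a better fit.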
Integration
This skill can be integrated with other Claude Code plugins to automate incident response workflows. For example, it can trigger automated remediation actions or create tickets in an issue tracking system.
