
forecast-operational-metrics

pjt222
Updated 2 days ago
Category: Design

About

This skill forecasts infrastructure and application metrics like CPU and memory using Prophet or statsmodels for capacity planning and cost optimization. It enables visualizing predictions in Grafana and setting alerts for projected resource exhaustion. Use it when planning hardware procurement, optimizing cloud spending, or establishing proactive scaling policies based on predicted load.

Quick install

Claude Code

Recommended
Primary method
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative)
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative)
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/forecast-operational-metrics

Copy and paste this command into Claude Code to install the skill.

Skill documentation

Forecast Operational Metrics

Predict future resource usage + system metrics for capacity plan + cost optimization.

See Extended Examples for complete configuration files and templates.

Use When

  • Forecast infra capacity (CPU, memory, disk, net)
  • Plan hardware/cloud procurement next quarter
  • Predict cost trends + optimize cloud spending
  • Setup proactive scaling policies on predicted load
  • Forecast user traffic for event planning
  • Predict DB storage growth for backup planning
  • Estimate API usage for rate limiting config

In

  • Required: Historical time series (3-12mo min)
  • Required: Metric type (CPU, memory, req/sec, costs, etc.)
  • Required: Forecast horizon (days, weeks, months)
  • Optional: Known future events (deployments, campaigns, holidays)
  • Optional: Seasonality (daily, weekly, yearly)
  • Optional: External regressors (marketing spend, signups)

Do

Step 1: Setup + Load Data

Install libs + prep time series.

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install forecasting libraries
pip install prophet statsmodels pandas numpy
pip install plotly matplotlib seaborn
pip install prometheus-api-client influxdb-client
pip install grafana-api requests schedule

Load + prep w/ MetricsLoader:

# forecasting/data_loader.py (abbreviated)
import pandas as pd
from datetime import datetime, timedelta

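class MetricsLoader:
    def __init__(self, prometheus_url: str):
        # Base URL of the Prometheus server, e.g. "http://prometheus:9090"
        self.prometheus_url = prometheus_url

    def load_from_prometheus(self, query: str, lookback_days: int = 90, step: str = "1h"):
        """Load historical metrics from Prometheus."""
        # ... implementation (see EXAMPLES.md for complete code)

    def resample_and_aggregate(self, df: pd.DataFrame, freq: str = "1h"):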
        """Resample time series to regular intervals."""
        # ... implementation (see EXAMPLES.md)

# Example usage
loader = MetricsLoader(prometheus_url="http://prometheus:9090")
df = loader.load_from_prometheus(
    query='avg(rate(container_cpu_usage_seconds_total[5m]))',
    lookback_days=90,
)
df_daily = loader.resample_and_aggregate(df, freq="1D")

See EXAMPLES.md Step 1 for complete MetricsLoader.

→ Time series loaded regular intervals, missing filled, ready forecast.

If err: gaps → forward-fill or interpolate, ensure lookback ≥90 days, verify tz consistency, check outliers (>5 sigma) skewing forecasts.
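The gap and outlier checks above can be sketched without any dependencies. This is a minimal illustration only: `forward_fill` and `flag_outliers` are hypothetical helper names, not part of MetricsLoader, and the 5-sigma cutoff just mirrors the threshold mentioned above.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def forward_fill(samples, start, end, step):
    """Return (timestamp, value) pairs on a regular grid from start to end.

    samples maps timestamps to values. A missing grid point reuses the last
    seen value; grid points before the first sample are skipped.
    """
    filled = []
    last = None
    ts = start
    while ts <= end:
        if ts in samples:
            last = samples[ts]
        if last is not None:
            filled.append((ts, last))
        ts += step
    return filled


def flag_outliers(values, n_sigma=5.0):
    """Return indexes of values more than n_sigma standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # constant series: nothing can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) > n_sigma * sigma]


# Example: an hourly series with one missing hour
t0 = datetime(2024, 1, 1)
raw = {t0: 1.0, t0 + timedelta(hours=2): 3.0}
for ts, value in forward_fill(raw, t0, t0 + timedelta(hours=3), timedelta(hours=1)):
    print(ts.isoformat(), value)
```

Note that a single extreme point inflates the standard deviation itself, so short series may never reach 5 sigma; a median-based rule may suit those better.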

Step 2: Prophet Forecasting

FB Prophet for auto seasonality detection + forecasting.

# forecasting/prophet_forecaster.py (abbreviated)
import pandas as pd
from prophet import Prophet

class ProphetForecaster:
    def __init__(self, growth: str = "linear", seasonality_mode: str = "multiplicative"):
        self.growth = growth
        self.prophet_params = {
            "growth": growth,
            "seasonality_mode": seasonality_mode,
            # ... additional parameters (see EXAMPLES.md)
        }

    def fit(self, df: pd.DataFrame, regressors=None, holidays=None):
        """Train Prophet model on historical data."""
        # ... implementation (see EXAMPLES.md)

    def forecast(self, periods: int, freq: str = "D"):
        """Generate forecast for future periods."""
        # ... implementation (see EXAMPLES.md)

# Example usage
forecaster = ProphetForecaster(growth="linear", seasonality_mode="multiplicative")
forecaster.fit(df_daily)
forecast = forecaster.forecast(periods=30, freq="D")
forecaster.plot_forecast(forecast, save_path="results/cpu_forecast.png")

See EXAMPLES.md Step 2 for complete ProphetForecaster.

→ Forecast 30+ days w/ CI, seasonal patterns in components plot, cross-validation MAPE < 15%.

If err: unrealistic → try diff growth (linear vs logistic), seasonality missing → adjust seasonality_mode, poor accuracy (MAPE >15%) → more data or external regressors, check data quality.
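Since MAPE is an error measure, lower is better. A minimal sketch of the calculation follows, assuming held-out actuals and predictions as plain lists. `mape` and `meets_target` are hypothetical names; Prophet's own `prophet.diagnostics.cross_validation` and `performance_metrics` are the usual route on a fitted model.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Lower is better."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Assumption: zero actuals are skipped, since they would divide by zero.
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def meets_target(actual, predicted, target_pct=15.0):
    """True when the forecast error is below the target used in this skill."""
    return mape(actual, predicted) < target_pct


# Example with made-up numbers
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(f"MAPE: {mape(actual, predicted):.1f}%")
print("Within target:", meets_target(actual, predicted))
```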

Step 3: ARIMA/SARIMAX (Alternative)

Statsmodels for traditional time series.

# forecasting/arima_forecaster.py (abbreviated)
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

class ARIMAForecaster:
    def __init__(self, order: tuple = (1, 1, 1), seasonal_order: tuple = (1, 1, 1, 7)):
        self.order = order
        self.seasonal_order = seasonal_order

    def fit(self, df: pd.DataFrame, exog=None):
        """Train SARIMAX model."""
        series = df.set_index("timestamp")["value"]
        self.model = SARIMAX(series, exog=exog, order=self.order, seasonal_order=self.seasonal_order)
        self.fitted_model = self.model.fit(disp=False)
        # ... implementation (see EXAMPLES.md)

    def forecast(self, steps: int, exog_future=None):
        """Generate forecast for future periods."""
        # ... implementation (see EXAMPLES.md)

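# Auto-select parameters. auto_arima is the helper from EXAMPLES.md; this call
# assumes it returns an (order, seasonal_order) pair.
df_hourly = loader.resample_and_aggregate(df, freq="1h")
series = df_hourly.set_index("timestamp")["value"]
best_order, best_seasonal = auto_arima(series, seasonal=True)
forecaster = ARIMAForecaster(order=best_order, seasonal_order=best_seasonal)
forecaster.fit(df_hourly)
forecast = forecaster.forecast(steps=168)  # 168 hourly steps = 7 days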

See EXAMPLES.md Step 3 for complete ARIMAForecaster + auto_arima.

→ ARIMA fit optimal params, forecast w/ CI, diagnostic plots show white noise residuals.

If err: no convergence → simplify params (reduce p, q, P, Q), wrong trend → check differencing (d, D), residuals not white noise → add more AR/MA, ensure series length >2x seasonal period.
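A rough way to check residuals for white noise is to compare their sample autocorrelations against the approximate 95% bounds of ±1.96/√n. The sketch below is a pure-Python illustration with hypothetical function names. It is not a Ljung-Box test; statsmodels provides `acorr_ljungbox` for a formal check.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mu = sum(residuals) / n
    denom = sum((x - mu) ** 2 for x in residuals)
    if denom == 0:
        raise ValueError("residuals are constant")
    num = sum((residuals[t] - mu) * (residuals[t - lag] - mu) for t in range(lag, n))
    return num / denom


def significant_lags(residuals, max_lag=20):
    """Return the lags whose autocorrelation falls outside ±1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(len(residuals))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(residuals, k)) > bound]


# Example: residuals that still contain a strong alternating pattern
leftover_pattern = [1.0, -1.0] * 50
print(significant_lags(leftover_pattern, max_lag=5))
```

With 20 lags at the 5% level, roughly one lag outside the bounds is expected by chance, so an occasional flag is not alarming; many flagged lags suggest the model is missing AR/MA or seasonal terms.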

Step 4: Capacity Thresholds + Alerts

Analyze forecast → predict exhaustion.

# forecasting/capacity_planning.py (abbreviated)
from datetime import datetime

import pandas as pd

class CapacityPlanner:
    def __init__(self, capacity_limit: float, warning_threshold: float = 0.8):
        self.capacity_limit = capacity_limit
        self.warning_threshold = warning_threshold

    def find_exhaustion_date(self, forecast: pd.DataFrame):
        """Find when forecast exceeds capacity limit."""
        exceeded = forecast[forecast["yhat"] >= self.capacity_limit]
        # ... implementation (see EXAMPLES.md)

    def generate_capacity_report(self, forecast: pd.DataFrame):
        """Generate comprehensive capacity planning report."""
        # ... implementation (see EXAMPLES.md)

# Example usage
planner = CapacityPlanner(capacity_limit=1000, warning_threshold=0.8)
report = planner.generate_capacity_report(forecast)
print(f"Warning Date: {report['warning_date']}")
print(f"Exhaustion Date: {report['exhaustion_date']}")
recommendation = planner.recommend_scaling_action(report)

See EXAMPLES.md Step 4 for complete CapacityPlanner.

→ Report shows when limits reached, recommendations w/ urgency levels, growth rates.

If err: unrealistic exhaustion date → verify capacity_limit correct, growth too high → check outliers, non-linear growth models for mature systems.
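The threshold logic can be sketched on plain rows shaped like Prophet output (`ds`, `yhat`, `yhat_upper`). This is an illustrative stand-in, not the CapacityPlanner from EXAMPLES.md: `first_crossing` and `capacity_report` are hypothetical names, and the numbers are made up.

```python
from datetime import date, timedelta


def first_crossing(forecast, level, column="yhat"):
    """Return the first date whose forecast value reaches the level, else None."""
    for row in forecast:
        if row[column] >= level:
            return row["ds"]
    return None


def capacity_report(forecast, capacity_limit, warning_threshold=0.8, column="yhat"):
    """Dates when the forecast reaches the warning level and the hard limit."""
    return {
        "warning_date": first_crossing(forecast, capacity_limit * warning_threshold, column),
        "exhaustion_date": first_crossing(forecast, capacity_limit, column),
    }


# Made-up linear growth: 700 units today, +10 per day, upper bound 50 above
start = date(2024, 1, 1)
forecast = [
    {"ds": start + timedelta(days=i), "yhat": 700.0 + 10.0 * i, "yhat_upper": 750.0 + 10.0 * i}
    for i in range(40)
]

print(capacity_report(forecast, capacity_limit=1000))
# Planning on the upper bound gives earlier, more conservative dates
print(capacity_report(forecast, capacity_limit=1000, column="yhat_upper"))
```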

Step 5: Grafana Visualization

Push forecast data → Grafana real-time monitoring.

# forecasting/grafana_integration.py (abbreviated)
from datetime import datetime

import pandas as pd
import requests

class GrafanaForecaster:
    def __init__(self, grafana_url: str, api_key: str, dashboard_uid: str = None):
        self.grafana_url = grafana_url.rstrip("/")
        self.api_key = api_key
        self.dashboard_uid = dashboard_uid

    def create_annotation(self, text: str, tags: list, time: datetime = None):
        """Create annotation in Grafana for forecast events."""
        # ... implementation (see EXAMPLES.md)

    def create_capacity_alert_annotation(self, capacity_report: dict):
        """Create Grafana annotation for capacity warnings."""
        # ... implementation (see EXAMPLES.md)

# Export to CSV for Grafana datasource
def export_forecast_to_csv(forecast: pd.DataFrame, output_path: str):
    """Export forecast in format compatible with Grafana CSV datasource."""
    # ... implementation (see EXAMPLES.md)

# Example usage
grafana = GrafanaForecaster(
    grafana_url="http://grafana:3000",
    api_key="YOUR_API_KEY",
    dashboard_uid="your-dashboard-uid",
)
grafana.create_capacity_alert_annotation(report)
export_forecast_to_csv(forecast, "grafana/forecasts/cpu_forecast.csv")

See EXAMPLES.md Step 5 for complete GrafanaForecaster.

→ Annotations in dashboards, capacity warnings visible as vertical markers, forecast accessible via CSV datasource.

If err: verify API key perms, check dashboard UID correct, ensure timestamps ms for annotations, test API w/ curl before integrating.
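The millisecond-timestamp requirement is easy to get wrong, so here is a minimal sketch that builds (but does not send) an annotation request using only the standard library. The endpoint path and the `time`, `tags`, `text`, and `dashboardUID` fields reflect my understanding of Grafana's annotations HTTP API and should be checked against the installed version (older releases use `dashboardId`). `build_annotation_request` is a hypothetical name.

```python
import json
from datetime import datetime, timezone
from urllib.request import Request


def build_annotation_request(grafana_url, api_key, text, tags, when, dashboard_uid=None):
    """Build a POST request for a Grafana annotation without sending it.

    `when` should be timezone-aware; a naive datetime is read as local time.
    """
    payload = {
        "time": int(when.timestamp() * 1000),  # epoch milliseconds, not seconds
        "tags": list(tags),
        "text": text,
    }
    if dashboard_uid:
        payload["dashboardUID"] = dashboard_uid  # assumption: recent Grafana versions
    return Request(
        url=grafana_url.rstrip("/") + "/api/annotations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_annotation_request(
    grafana_url="http://grafana:3000/",
    api_key="YOUR_API_KEY",
    text="Predicted capacity exhaustion",
    tags=["forecast", "capacity"],
    when=datetime(2024, 1, 1, tzinfo=timezone.utc),
    dashboard_uid="your-dashboard-uid",
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it; inspecting the payload first is a cheap way to confirm the timestamp unit.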

Step 6: Automate Generation

Scheduled jobs → forecasts regularly.

# forecasting/scheduler.py (abbreviated)
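import logging
import time

import schedule

# Classes and helpers from the modules shown in Steps 1-5
from forecasting.data_loader import MetricsLoader
from forecasting.prophet_forecaster import ProphetForecaster
from forecasting.capacity_planning import CapacityPlanner
from forecasting.grafana_integration import export_forecast_to_csv

logger = logging.getLogger(__name__)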

def generate_daily_forecast():
    """Generate forecast for all monitored metrics."""
    logger.info("Starting daily forecast generation")

    metrics_config = [
        {"name": "cpu_usage", "query": "...", "capacity_limit": 0.8, "forecast_days": 30},
        {"name": "memory_usage", "query": "...", "capacity_limit": 32, "forecast_days": 30},
        {"name": "disk_usage", "query": "...", "capacity_limit": 500, "forecast_days": 90},
    ]

    loader = MetricsLoader(prometheus_url="http://prometheus:9090")

    for metric_config in metrics_config:
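        df = loader.load_from_prometheus(query=metric_config["query"], lookback_days=90)
        # Resample to daily so the training data matches the daily forecast frequency
        df_daily = loader.resample_and_aggregate(df, freq="1D")
        forecaster = ProphetForecaster()
        forecaster.fit(df_daily)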
        forecast = forecaster.forecast(periods=metric_config["forecast_days"])

        planner = CapacityPlanner(capacity_limit=metric_config["capacity_limit"])
        report = planner.generate_capacity_report(forecast)

        export_forecast_to_csv(forecast, f"grafana/forecasts/{metric_config['name']}_forecast.csv")
        # ... (see EXAMPLES.md for complete implementation)

# Schedule daily at 2 AM
schedule.every().day.at("02:00").do(generate_daily_forecast)

while True:
    schedule.run_pending()
    time.sleep(60)

See EXAMPLES.md Step 6 for complete scheduler.

→ Forecasts daily all metrics, capacity reports logged, CSV exported, alerts sent critical warnings.

If err: verify scheduler runs continuously (systemd/supervisor), check Prometheus connectivity, ensure sufficient disk, retry logic for transient failures, monitor scheduler itself.
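Retry logic for transient failures can be sketched as a small wrapper with exponential backoff. `retry` is a hypothetical helper, not part of the scheduler in EXAMPLES.md; the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, then 2x, then 4x, and so on between attempts.
    Re-raises the last exception once all attempts are used.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Example: a call that fails twice before succeeding
state = {"calls": 0}


def flaky_query():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry(flaky_query, attempts=3, base_delay=0.01, exceptions=(ConnectionError,)))
```

In the daily job, the Prometheus load could be wrapped the same way, for example `retry(lambda: loader.load_from_prometheus(query=..., lookback_days=90))`, limiting `exceptions` to the connection errors the client actually raises.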

Check

  • Historical data ≥90 days continuous
  • Prophet captures daily/weekly seasonality in components
  • Forecast CI contains 85-95% actual in validation
  • Capacity exhaustion correct known scenarios
  • ARIMA residuals white noise in diagnostic
  • Grafana annotations at predicted warning/exhaustion
  • Automated daily w/o manual intervention
  • Forecast accuracy (MAPE) < 15% validation
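The interval-coverage check in the list above can be computed directly from held-out actuals and the forecast bounds (`yhat_lower` and `yhat_upper` in Prophet output). A minimal sketch with made-up numbers follows; `interval_coverage` is a hypothetical name.

```python
def interval_coverage(actual, lower, upper):
    """Percentage of actual values that fall inside [lower, upper]."""
    if not actual or not (len(actual) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return 100.0 * inside / len(actual)


# Made-up validation window: the last point escapes the interval
actual = [1.0, 2.0, 3.0, 10.0]
lower = [0.0, 1.0, 2.0, 3.0]
upper = [2.0, 3.0, 4.0, 5.0]
print(f"Coverage: {interval_coverage(actual, lower, upper):.0f}%")
```

Coverage well below the configured interval width suggests the intervals are too narrow; coverage near 100% suggests they are too wide to guide planning.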

Traps

  • Insufficient data: Need 3-12mo reliable seasonality. Avoid <60 days.
  • Ignore known events: Holidays, deployments, campaigns skew → add as external regressors or holidays.
  • Overconfidence long-term: Accuracy degrades beyond 30-90 days. Directional guidance not exact.
  • Static capacity: Infra changes. Update capacity_limit when adding.
  • Forecast anomalies: Outliers propagate. Clean data or robust methods.
  • Not updating models: Stale after system changes. Retrain weekly or after significant arch.
  • Ignore CI: Point forecasts misleading. Always lower/upper bounds for planning.
  • Wrong seasonality period: Daily for hourly, weekly for daily. Mismatch → poor forecasts.

Related

  • detect-anomalies-aiops: Anomaly detection complements forecasting
  • plan-capacity: Infra capacity planning workflows
  • build-grafana-dashboards: Visualize forecasts + capacity trends

GitHub repository

pjt222/agent-almanac
Path: i18n/caveman-ultra/skills/forecast-operational-metrics
