forecast-operational-metrics
About
This skill forecasts infrastructure and application metrics, such as CPU and memory utilization, using Prophet or statsmodels for capacity planning and cost optimization. It lets you visualize predictions in Grafana and configure alerts for projected resource exhaustion. Use it when planning hardware purchases, optimizing cloud spend, or setting proactive scaling policies based on forecast load.
Quick install
Claude Code
Recommended:
npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/forecast-operational-metrics
Documentation
Forecast Operational Metrics
Predict future resource usage + system metrics for capacity plan + cost optimization.
See Extended Examples for complete configuration files and templates.
Use When
- Forecast infra capacity (CPU, memory, disk, net)
- Plan hardware/cloud procurement next quarter
- Predict cost trends + optimize cloud spending
- Setup proactive scaling policies on predicted load
- Forecast user traffic for event planning
- Predict DB storage growth for backup planning
- Estimate API usage for rate limiting config
In
- Required: Historical time series (3-12mo min)
- Required: Metric type (CPU, memory, req/sec, costs, etc.)
- Required: Forecast horizon (days, weeks, months)
- Optional: Known future events (deployments, campaigns, holidays)
- Optional: Seasonality (daily, weekly, yearly)
- Optional: External regressors (marketing spend, signups)
Do
Step 1: Setup + Load Data
Install libs + prep time series.
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install forecasting libraries
pip install prophet statsmodels pandas numpy
pip install plotly matplotlib seaborn
pip install prometheus-api-client influxdb-client
pip install grafana-api
Load + prep w/ MetricsLoader:
# forecasting/data_loader.py (abbreviated)
import pandas as pd
from datetime import datetime, timedelta

class MetricsLoader:
    def load_from_prometheus(self, query: str, lookback_days: int = 90, step: str = "1h"):
        """Load historical metrics from Prometheus."""
        # ... implementation (see EXAMPLES.md for complete code)

    def resample_and_aggregate(self, df: pd.DataFrame, freq: str = "1H"):
        """Resample time series to regular intervals."""
        # ... implementation (see EXAMPLES.md)

# Example usage
loader = MetricsLoader(prometheus_url="http://prometheus:9090")
df = loader.load_from_prometheus(
    query='avg(rate(container_cpu_usage_seconds_total[5m]))',
    lookback_days=90,
)
df_daily = loader.resample_and_aggregate(df, freq="1D")
See EXAMPLES.md Step 1 for complete MetricsLoader.
→ Time series loaded regular intervals, missing filled, ready forecast.
If err: gaps → forward-fill or interpolate, ensure lookback ≥90 days, verify tz consistency, check outliers (>5 sigma) skewing forecasts.
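The gap and outlier checks above can be sketched in plain Python. This is a minimal illustration, not the MetricsLoader code from EXAMPLES.md: the helper names interpolate_gaps and flag_outliers are hypothetical, and missing samples are assumed to arrive as None.

```python
from statistics import mean, pstdev
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[float]:
    """Fill None gaps linearly; gaps at the edges copy the nearest observation."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(values)
    # Leading and trailing gaps: extend the first / last observation.
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    # Interior gaps: straight line between the neighbouring observations.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def flag_outliers(values: List[float], n_sigma: float = 5.0) -> List[int]:
    """Return indices whose z-score (population std) exceeds n_sigma."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > n_sigma]


raw = [10.0, None, None, 16.0, 18.0, None]
print(interpolate_gaps(raw))

cpu = [10.0] * 49 + [1000.0]  # one extreme spike in an otherwise flat series
print(flag_outliers(cpu))
```

A z-score needs a reasonably long series before any single point can exceed 5 sigma, so on short windows a robust rule (median-based, for example) may be a better fit.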
Step 2: Prophet Forecasting
FB Prophet for auto seasonality detection + forecasting.
# forecasting/prophet_forecaster.py (abbreviated)
import pandas as pd
from prophet import Prophet

class ProphetForecaster:
    def __init__(self, growth: str = "linear", seasonality_mode: str = "multiplicative"):
        self.growth = growth
        self.prophet_params = {
            "growth": growth,
            "seasonality_mode": seasonality_mode,
            # ... additional parameters (see EXAMPLES.md)
        }

    def fit(self, df: pd.DataFrame, regressors=None, holidays=None):
        """Train Prophet model on historical data."""
        # ... implementation (see EXAMPLES.md)

    def forecast(self, periods: int, freq: str = "D"):
        """Generate forecast for future periods."""
        # ... implementation (see EXAMPLES.md)

    # plot_forecast() is omitted here (see EXAMPLES.md)

# Example usage
forecaster = ProphetForecaster(growth="linear", seasonality_mode="multiplicative")
forecaster.fit(df_daily)
forecast = forecaster.forecast(periods=30, freq="D")
forecaster.plot_forecast(forecast, save_path="results/cpu_forecast.png")
See EXAMPLES.md Step 2 for complete ProphetForecaster.
→ Forecast 30+ days w/ CI, seasonal patterns in components plot, cross-validation MAPE < 15%.
If err: unrealistic → try diff growth (linear vs logistic), seasonality missing → adjust seasonality_mode, poor accuracy (MAPE well above the 15% target) → more data or external regressors, check data quality.
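The MAPE target above can be checked on a holdout window with a few lines of Python. This is a sketch under assumptions: the function name mape and the sample numbers are illustrative, and skipping zero actuals is one common convention rather than the skill's documented behavior.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, skipping zero actuals."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(errors) / len(errors)


# Illustrative holdout values, not real measurements.
holdout_actual = [100.0, 200.0, 400.0]
holdout_forecast = [110.0, 180.0, 400.0]
score = mape(holdout_actual, holdout_forecast)
print(f"MAPE: {score:.1f}%  meets 15% target: {score < 15.0}")
```

Lower MAPE is better, so a forecast passes the check when the score stays under 15.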
Step 3: ARIMA/SARIMAX (Alternative)
Statsmodels for traditional time series.
# forecasting/arima_forecaster.py (abbreviated)
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

class ARIMAForecaster:
    def __init__(self, order: tuple = (1, 1, 1), seasonal_order: tuple = (1, 1, 1, 7)):
        self.order = order
        self.seasonal_order = seasonal_order

    def fit(self, df: pd.DataFrame, exog=None):
        """Train SARIMAX model."""
        series = df.set_index("timestamp")["value"]
        self.model = SARIMAX(series, exog=exog, order=self.order, seasonal_order=self.seasonal_order)
        self.fitted_model = self.model.fit(disp=False)
        # ... implementation (see EXAMPLES.md)

    def forecast(self, steps: int, exog_future=None):
        """Generate forecast for future periods."""
        # ... implementation (see EXAMPLES.md)

# Hourly series, reusing loader and df from Step 1
df_hourly = loader.resample_and_aggregate(df, freq="1H")
series = df_hourly.set_index("timestamp")["value"]

# Auto-select parameters (auto_arima is the helper defined in EXAMPLES.md)
best_order, best_seasonal = auto_arima(series, seasonal=True)
forecaster = ARIMAForecaster(order=best_order, seasonal_order=best_seasonal)
forecaster.fit(df_hourly)
forecast = forecaster.forecast(steps=168)  # 7 days of hourly steps
See EXAMPLES.md Step 3 for complete ARIMAForecaster + auto_arima.
→ ARIMA fit optimal params, forecast w/ CI, diagnostic plots show white noise residuals.
If err: no convergence → simplify params (reduce p, q, P, Q), wrong trend → check differencing (d, D), residuals not white noise → add more AR/MA, ensure series length >2x seasonal period.
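One rough way to judge whether residuals look like white noise, without the diagnostic plots, is to compare their sample autocorrelations with the approximate 95% band of ±1.96/sqrt(n). The sketch below is a plain-Python heuristic, not the statsmodels diagnostics the step refers to; the names autocorrelation and significant_lags are hypothetical.

```python
import math
from typing import List, Sequence


def autocorrelation(residuals: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mu = sum(residuals) / n
    denom = sum((r - mu) ** 2 for r in residuals)
    if denom == 0:
        return 0.0
    num = sum((residuals[t] - mu) * (residuals[t - lag] - mu) for t in range(lag, n))
    return num / denom


def significant_lags(residuals: Sequence[float], max_lag: int = 10) -> List[int]:
    """Lags whose autocorrelation falls outside the approximate 95% band."""
    bound = 1.96 / math.sqrt(len(residuals))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(residuals, k)) > bound]


# A strictly alternating series is clearly not white noise: every lag is flagged.
alternating = [1.0, -1.0] * 50
print(significant_lags(alternating))
```

With truly random residuals, about one lag in twenty can still land outside the band by chance, so an occasional flagged lag is not by itself a reason to add AR/MA terms.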
Step 4: Capacity Thresholds + Alerts
Analyze forecast → predict exhaustion.
# forecasting/capacity_planning.py (abbreviated)
from datetime import datetime
import pandas as pd

class CapacityPlanner:
    def __init__(self, capacity_limit: float, warning_threshold: float = 0.8):
        self.capacity_limit = capacity_limit
        self.warning_threshold = warning_threshold

    def find_exhaustion_date(self, forecast: pd.DataFrame):
        """Find when forecast exceeds capacity limit."""
        exceeded = forecast[forecast["yhat"] >= self.capacity_limit]
        # ... implementation (see EXAMPLES.md)

    def generate_capacity_report(self, forecast: pd.DataFrame):
        """Generate comprehensive capacity planning report."""
        # ... implementation (see EXAMPLES.md)

    # recommend_scaling_action() is omitted here (see EXAMPLES.md)

# Example usage
planner = CapacityPlanner(capacity_limit=1000, warning_threshold=0.8)
report = planner.generate_capacity_report(forecast)
print(f"Warning Date: {report['warning_date']}")
print(f"Exhaustion Date: {report['exhaustion_date']}")
recommendation = planner.recommend_scaling_action(report)
See EXAMPLES.md Step 4 for complete CapacityPlanner.
→ Report shows when limits reached, recommendations w/ urgency levels, growth rates.
If err: unrealistic exhaustion date → verify capacity_limit correct, growth too high → check outliers, non-linear growth models for mature systems.
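The warning and exhaustion dates reduce to finding the first forecast point that reaches a level. The following is a minimal sketch of that idea, not the CapacityPlanner implementation from EXAMPLES.md; first_crossing, capacity_dates, and the synthetic linear forecast are all assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

ForecastRow = Tuple[date, float]  # (day, predicted value)


def first_crossing(forecast: List[ForecastRow], level: float) -> Optional[date]:
    """Return the first day whose predicted value reaches level, else None."""
    for day, value in forecast:
        if value >= level:
            return day
    return None


def capacity_dates(forecast: List[ForecastRow], capacity_limit: float,
                   warning_threshold: float = 0.8) -> Dict[str, Optional[date]]:
    """Warning date at threshold * limit, exhaustion date at the limit itself."""
    return {
        "warning_date": first_crossing(forecast, capacity_limit * warning_threshold),
        "exhaustion_date": first_crossing(forecast, capacity_limit),
    }


# Synthetic forecast: starts at 705 and grows by 10 units per day.
start = date(2024, 1, 1)
forecast = [(start + timedelta(days=i), 705.0 + 10.0 * i) for i in range(60)]
report = capacity_dates(forecast, capacity_limit=1000.0)
print(report["warning_date"], report["exhaustion_date"])
```

Feeding the upper interval bound instead of the point forecast into the same functions gives an earlier, more conservative date, which matches the advice in Traps about not planning on point forecasts alone.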
Step 5: Grafana Visualization
Push forecast data → Grafana real-time monitoring.
# forecasting/grafana_integration.py (abbreviated)
from datetime import datetime
from typing import Optional

import pandas as pd
import requests

class GrafanaForecaster:
    def __init__(self, grafana_url: str, api_key: str, dashboard_uid: Optional[str] = None):
        self.grafana_url = grafana_url.rstrip("/")
        self.api_key = api_key
        self.dashboard_uid = dashboard_uid

    def create_annotation(self, text: str, tags: list, time: Optional[datetime] = None):
        """Create annotation in Grafana for forecast events."""
        # ... implementation (see EXAMPLES.md)

    def create_capacity_alert_annotation(self, capacity_report: dict):
        """Create Grafana annotation for capacity warnings."""
        # ... implementation (see EXAMPLES.md)

# Export to CSV for Grafana datasource
def export_forecast_to_csv(forecast: pd.DataFrame, output_path: str):
    """Export forecast in format compatible with Grafana CSV datasource."""
    # ... implementation (see EXAMPLES.md)

# Example usage
grafana = GrafanaForecaster(
    grafana_url="http://grafana:3000",
    api_key="YOUR_API_KEY",
    dashboard_uid="your-dashboard-uid",
)
grafana.create_capacity_alert_annotation(report)
export_forecast_to_csv(forecast, "grafana/forecasts/cpu_forecast.csv")
See EXAMPLES.md Step 5 for complete GrafanaForecaster.
→ Annotations in dashboards, capacity warnings visible as vertical markers, forecast accessible via CSV datasource.
If err: verify API key perms, check dashboard UID correct, ensure timestamps ms for annotations, test API w/ curl before integrating.
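The millisecond-timestamp requirement is the easiest of these to get wrong, so here is a small sketch that builds an annotation payload without sending it. It is not the GrafanaForecaster code from EXAMPLES.md: to_epoch_ms and build_annotation are hypothetical helpers, and the dashboardUID field is assumed from recent Grafana annotation API documentation (older versions may expect a numeric dashboardId instead).

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time drift")
    # Integer division of timedeltas avoids float rounding.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def build_annotation(text: str, tags: List[str], when: datetime,
                     dashboard_uid: Optional[str] = None) -> Dict[str, object]:
    """Build the JSON body for an annotation; sending it is left to the caller."""
    payload: Dict[str, object] = {
        "time": to_epoch_ms(when),
        "tags": list(tags),
        "text": text,
    }
    if dashboard_uid:
        payload["dashboardUID"] = dashboard_uid  # assumed field name, see lead-in
    return payload


body = build_annotation(
    text="Projected capacity warning",
    tags=["forecast", "capacity"],
    when=datetime(2024, 1, 1, tzinfo=timezone.utc),
    dashboard_uid="your-dashboard-uid",
)
print(json.dumps(body, indent=2))
```

Printing the payload first and posting it by hand with curl is a cheap way to confirm key permissions and the dashboard UID before wiring it into the scheduler.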
Step 6: Automate Generation
Scheduled jobs → forecasts regularly.
# forecasting/scheduler.py (abbreviated)
import logging
import time

import schedule

logger = logging.getLogger(__name__)

def generate_daily_forecast():
    """Generate forecast for all monitored metrics."""
    logger.info("Starting daily forecast generation")
    metrics_config = [
        {"name": "cpu_usage", "query": "...", "capacity_limit": 0.8, "forecast_days": 30},
        {"name": "memory_usage", "query": "...", "capacity_limit": 32, "forecast_days": 30},
        {"name": "disk_usage", "query": "...", "capacity_limit": 500, "forecast_days": 90},
    ]
    loader = MetricsLoader(prometheus_url="http://prometheus:9090")
    for metric_config in metrics_config:
        df = loader.load_from_prometheus(query=metric_config["query"], lookback_days=90)
        forecaster = ProphetForecaster()
        forecaster.fit(df)
        forecast = forecaster.forecast(periods=metric_config["forecast_days"])
        planner = CapacityPlanner(capacity_limit=metric_config["capacity_limit"])
        report = planner.generate_capacity_report(forecast)
        export_forecast_to_csv(forecast, f"grafana/forecasts/{metric_config['name']}_forecast.csv")
        # ... (see EXAMPLES.md for complete implementation)

# Schedule daily at 2 AM
schedule.every().day.at("02:00").do(generate_daily_forecast)
while True:
    schedule.run_pending()
    time.sleep(60)
See EXAMPLES.md Step 6 for complete scheduler.
→ Forecasts daily all metrics, capacity reports logged, CSV exported, alerts sent critical warnings.
If err: verify scheduler runs continuously (systemd/supervisor), check Prometheus connectivity, ensure sufficient disk, retry logic for transient failures, monitor scheduler itself.
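The retry logic for transient failures could look roughly like the wrapper below. It is a sketch under assumptions, not the scheduler code from EXAMPLES.md: with_retries is a hypothetical helper, it retries on any exception, and the sleep function is injectable so the demo runs instantly.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def with_retries(task: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run task, retrying with exponential backoff; re-raise the last failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise AssertionError("unreachable")


# Demo: a fake Prometheus query that fails twice, then succeeds.
calls: List[int] = []
delays: List[float] = []


def flaky_query() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = with_retries(flaky_query, attempts=3, sleep=delays.append)
print(result, len(calls), delays)
```

In the daily job, each call to loader.load_from_prometheus could be wrapped this way so a single dropped connection does not skip a metric for the whole day.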
Check
- Historical data ≥90 days continuous
- Prophet captures daily/weekly seasonality in components
- Forecast CI contains 85-95% actual in validation
- Capacity exhaustion correct known scenarios
- ARIMA residuals white noise in diagnostic
- Grafana annotations at predicted warning/exhaustion
- Automated daily w/o manual intervention
- Forecast accuracy (MAPE) < 15% validation
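The interval check in the list above (85-95% of actuals inside the CI) can be measured directly on a validation window. This is a minimal sketch with a hypothetical helper name, interval_coverage, and made-up numbers. Note that Prophet's default interval_width is 0.80, so this target presumably assumes a wider interval setting.

```python
from typing import Sequence


def interval_coverage(actual: Sequence[float], lower: Sequence[float],
                      upper: Sequence[float]) -> float:
    """Percent of actual values that fall inside [lower, upper]."""
    if not (len(actual) == len(lower) == len(upper)):
        raise ValueError("actual, lower and upper must have the same length")
    if not actual:
        raise ValueError("need at least one validation point")
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return 100.0 * inside / len(actual)


# Illustrative validation window: the last point escapes the interval.
val_actual = [50.0, 52.0, 55.0, 61.0, 90.0]
val_lower = [45.0, 47.0, 50.0, 54.0, 58.0]
val_upper = [55.0, 57.0, 60.0, 64.0, 68.0]
coverage = interval_coverage(val_actual, val_lower, val_upper)
print(f"coverage: {coverage:.0f}%")
```

Coverage far below the nominal width suggests the model is overconfident; coverage near 100% suggests intervals too wide to be useful for planning.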
Traps
- Insufficient data: Need 3-12mo reliable seasonality. Avoid <60 days.
- Ignore known events: Holidays, deployments, campaigns skew → add as external regressors or holidays.
- Overconfidence long-term: Accuracy degrades beyond 30-90 days. Directional guidance not exact.
- Static capacity: Infra changes. Update capacity_limit when adding.
- Forecast anomalies: Outliers propagate. Clean data or robust methods.
- Not updating models: Stale after system changes. Retrain weekly or after significant arch.
- Ignore CI: Point forecasts misleading. Always lower/upper bounds for planning.
- Wrong seasonality period: Daily for hourly, weekly for daily. Mismatch → poor forecasts.
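The seasonality trap and the earlier "series length >2x seasonal period" rule can be combined into one pre-flight check. The sketch below is illustrative only: the mapping covers just the two cases named in this document, and seasonal_period and has_enough_history are hypothetical helpers.

```python
# Seasonal period in samples: a daily cycle for hourly data, a weekly cycle for daily data.
SEASONAL_PERIODS = {"hourly": 24, "daily": 7}


def seasonal_period(sampling: str) -> int:
    """Look up the seasonal period for a sampling interval named in this doc."""
    try:
        return SEASONAL_PERIODS[sampling]
    except KeyError:
        raise ValueError(f"no default seasonal period for sampling {sampling!r}") from None


def has_enough_history(n_points: int, period: int, min_cycles: int = 2) -> bool:
    """True when the series is strictly longer than min_cycles seasonal periods."""
    return n_points > min_cycles * period


period = seasonal_period("daily")
print(period, has_enough_history(n_points=90, period=period))
```

For example, 90 daily points with a weekly period pass the check, while 48 hourly points with a 24-sample period do not.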
→
- detect-anomalies-aiops: Anomaly detection complements forecasting
- plan-capacity: Infra capacity planning workflows
- build-grafana-dashboards: Visualize forecasts + capacity trends
