
monitor-model-drift

pjt222
Updated: Yesterday

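About

This skill implements production ML model monitoring that detects data drift and concept drift using Evidently AI and statistical tests (PSI, KS). It builds automated alerting and reporting workflows to catch performance degradation early. Use it when model performance drops unexplainably, when data distributions change, or when regulatory monitoring is required.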

Quick Install

Claude Code

Recommended
Main
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative)
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative)
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/monitor-model-drift

Copy and paste this command into Claude Code to install the skill

Documentation

Monitor Model Drift

See Extended Examples for complete configuration files and templates.

Detect and alert on data drift and concept drift in production ML models using statistical tests and automated monitoring.

When to Use

  • Production ML models experiencing unexplained performance degradation
  • New data distributions differ from training data
  • Seasonal or temporal shifts in input features
  • Proactive alerts are needed before business metrics are impacted
  • Regulatory requirements for model monitoring (e.g., SR 11-7, EU AI Act)
  • Multiple model versions deployed requiring drift comparison

Inputs

  • Required: Production model predictions and features (last 30-90 days)
  • Required: Reference dataset (training or validation data)
  • Required: Ground truth labels (may be delayed)
  • Optional: Feature importance scores or SHAP values
  • Optional: Business metric thresholds for alerting
  • Optional: Historical drift reports for trend analysis

Steps

Step 1: Install and Configure Evidently AI

Set up the monitoring framework with the appropriate dependencies.

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install Evidently and dependencies
pip install evidently pandas scikit-learn prometheus-client

# Create monitoring directory structure
mkdir -p monitoring/{reports,config,alerts,logs}

Create configuration file:

# monitoring/config/drift_config.py
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently.metrics import (
    DatasetDriftMetric,
    DatasetMissingValuesMetric,
    ColumnDriftMetric,
)

# ... (see EXAMPLES.md for complete implementation)

Expected: Configuration file created with thresholds matching the model's tolerance.

On failure: Start with conservative thresholds (PSI > 0.2, KS p-value < 0.01) and tune them based on the false positive rate.
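To make the PSI threshold concrete, here is a minimal sketch of one common way to compute the Population Stability Index with NumPy. It is not the skill's own implementation (that lives in EXAMPLES.md). The function name `population_stability_index`, the choice of 10 quantile bins, and the `1e-6` floor for empty bins are all illustrative assumptions.

```python
import numpy as np


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two 1-D numeric samples (hypothetical helper).

    Bin edges come from quantiles of the reference sample, so each
    reference bin holds roughly the same share of rows.
    """
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)

    # Interior cut points only: values outside the reference range
    # fall into the first or last bin.
    cuts = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    cuts = np.unique(cuts)

    ref_counts = np.bincount(np.searchsorted(cuts, reference), minlength=len(cuts) + 1)
    cur_counts = np.bincount(np.searchsorted(cuts, current), minlength=len(cuts) + 1)

    # Floor the shares so empty bins do not cause log(0) or division by zero.
    ref_share = np.clip(ref_counts / ref_counts.sum(), eps, None)
    cur_share = np.clip(cur_counts / cur_counts.sum(), eps, None)

    return float(np.sum((cur_share - ref_share) * np.log(cur_share / ref_share)))


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 10_000)
same = rng.normal(0.0, 1.0, 10_000)
shifted = rng.normal(1.0, 1.0, 10_000)

psi_same = population_stability_index(reference, same)
psi_shifted = population_stability_index(reference, shifted)
print(f"PSI (no shift): {psi_same:.4f}")
print(f"PSI (mean shifted by 1 sd): {psi_shifted:.4f}")
```

Quantile bins keep every reference bin populated, which avoids unstable log ratios in sparse tails. Fixed-width bins are an equally valid choice if the feature has a natural range.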

Step 2: Implement Data Drift Detection

Create drift detection pipeline with multiple statistical tests.

# monitoring/drift_detector.py
import pandas as pd
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.metrics import ColumnDriftMetric, DatasetDriftMetric
from datetime import datetime, timedelta
# ... (see EXAMPLES.md for complete implementation)

Expected: Drift detection runs successfully, produces a JSON report with per-feature statistics, and identifies drifted features.

On failure: Check for missing values (impute or drop), ensure the reference and current data have the same columns, and verify that data types match between datasets.
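As a rough illustration of per-feature detection, the sketch below runs SciPy's two-sample KS test on each numeric feature and reports the share of drifted features. The helper name `detect_feature_drift`, the dict-of-arrays input format, and the feature names are assumptions made for this example, not the pipeline in EXAMPLES.md.

```python
import numpy as np
from scipy.stats import ks_2samp


def detect_feature_drift(reference, current, p_threshold=0.01):
    """Run a two-sample KS test per numeric feature (hypothetical helper).

    `reference` and `current` map feature name -> 1-D array-like.
    Returns (per_feature_results, drift_share).
    """
    mismatch = set(reference) ^ set(current)
    if mismatch:
        raise ValueError(f"Column mismatch between datasets: {sorted(mismatch)}")

    results = {}
    for name in reference:
        ref = np.asarray(reference[name], dtype=float)
        cur = np.asarray(current[name], dtype=float)
        # Drop missing values before testing (imputation is the alternative).
        ref, cur = ref[~np.isnan(ref)], cur[~np.isnan(cur)]
        test = ks_2samp(ref, cur)
        results[name] = {
            "ks_statistic": float(test.statistic),
            "p_value": float(test.pvalue),
            "drifted": bool(test.pvalue < p_threshold),
        }

    drift_share = sum(r["drifted"] for r in results.values()) / len(results)
    return results, drift_share


# Synthetic data for illustration only: "income" is shifted, "age" is not.
rng = np.random.default_rng(1)
reference = {"age": rng.normal(40, 10, 2000), "income": rng.normal(50, 5, 2000)}
current = {"age": rng.normal(40, 10, 2000), "income": rng.normal(55, 5, 2000)}

results, drift_share = detect_feature_drift(reference, current)
for name, stats in results.items():
    print(name, stats)
print("drift share:", drift_share)
```

The KS test only applies to numeric features. Categorical features would need a different test, such as the chi-squared test that the module above imports as `chi2_contingency`.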

Step 3: Generate Evidently Reports

Create visual HTML reports for human review and debugging.

# monitoring/generate_reports.py
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently.metrics import (
    ColumnDriftMetric,
    DatasetDriftMetric,
    DatasetMissingValuesMetric,
)
# ... (see EXAMPLES.md for complete implementation)

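Expected: HTML reports generated in monitoring/reports/, viewable in a browser with interactive charts showing distribution comparisons.

On failure: Verify write permissions to the output directory, check that the installed Evidently version is >= 0.4.0 and still provides the evidently.report and evidently.metric_preset modules imported above (newer releases have reorganized the package layout), and ensure data frames have sufficient rows (>100 recommended).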

Step 4: Implement Concept Drift Detection

Monitor prediction performance to detect concept drift (a change in the relationship between features and target).

# monitoring/concept_drift.py
import pandas as pd
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error, accuracy_score
from typing import Dict, List
import json


# ... (see EXAMPLES.md for complete implementation)

Expected: Performance monitoring detects when model accuracy/AUC drops below the threshold, signaling potential concept drift.

On failure: Ensure ground truth labels are available (this may require a delayed validation batch job), verify that prediction scores are probabilities in the 0-1 range for classification, and check for label leakage in features.
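The idea of flagging concept drift from a performance drop can be sketched with the standard library alone. The example below splits labeled predictions into fixed-size windows and flags any window whose accuracy falls more than a tolerance below a baseline. The names `WindowResult` and `monitor_accuracy`, the window size, and the 0.05 tolerance are hypothetical choices, not values from the skill.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class WindowResult:
    window: int
    accuracy: float
    degraded: bool


def monitor_accuracy(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    window_size: int,
    baseline_accuracy: float,
    max_drop: float = 0.05,
) -> List[WindowResult]:
    """Compare accuracy in consecutive windows against a baseline."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    results = []
    # Only full windows are evaluated; a trailing partial window is skipped.
    for start in range(0, len(y_true) - window_size + 1, window_size):
        truth = y_true[start:start + window_size]
        preds = y_pred[start:start + window_size]
        accuracy = sum(t == p for t, p in zip(truth, preds)) / window_size
        degraded = accuracy < baseline_accuracy - max_drop
        results.append(WindowResult(start // window_size, accuracy, degraded))
    return results


# Toy data: the model is perfect in the first window, then predicts
# class 1 for everything in the second window.
y_true = [1, 0] * 10
y_pred = [1, 0] * 5 + [1] * 10

windows = monitor_accuracy(y_true, y_pred, window_size=10, baseline_accuracy=0.9)
for w in windows:
    print(w)
```

In practice the windows would usually be calendar periods (for example, one per day of delayed labels), and the same pattern applies to AUC or RMSE in place of accuracy.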

Step 5: Set Up Automated Alerting

Integrate drift detection with alerting systems (Slack, PagerDuty, email).

# monitoring/alerting.py
import requests
import json
from typing import Dict, List
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# ... (see EXAMPLES.md for complete implementation)

Expected: Alerts sent to Slack/PagerDuty when drift is detected, with severity based on drift share and critical feature involvement.

On failure: Test webhook URLs with curl first, verify the PagerDuty integration key has the correct permissions, check firewall rules for outbound HTTPS, and implement retry logic for transient network failures.
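Two pieces of the alerting logic described above lend themselves to a short sketch: mapping drift results to a severity level, and retrying a failed send with exponential backoff. Everything here is an assumption for illustration, including the names `classify_severity` and `send_with_retry`, the 0.3 and 0.5 share thresholds, and the injected `send` callable (which in a real setup might wrap `requests.post` against a webhook URL).

```python
import time
from typing import Callable, Iterable


def classify_severity(
    drift_share: float,
    drifted_features: Iterable[str],
    critical_features: Iterable[str],
    warn_share: float = 0.3,
    critical_share: float = 0.5,
) -> str:
    """Return 'critical', 'warning', or 'ok' (thresholds are hypothetical)."""
    critical_hit = set(drifted_features) & set(critical_features)
    if critical_hit or drift_share >= critical_share:
        return "critical"
    if drift_share >= warn_share:
        return "warning"
    return "ok"


def send_with_retry(
    send: Callable[[dict], None],
    payload: dict,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send`, retrying on OSError with exponential backoff.

    Returns the number of attempts used. Built-in ConnectionError and the
    exceptions raised by `requests` both derive from OSError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except OSError:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Simulated sender that fails twice before succeeding.
calls = []


def flaky_send(payload: dict) -> None:
    calls.append(payload)
    if len(calls) < 3:
        raise ConnectionError("simulated transient failure")


severity = classify_severity(0.2, ["income"], ["income", "age"])
delays = []  # record backoff delays instead of actually sleeping
attempts = send_with_retry(flaky_send, {"severity": severity}, sleep=delays.append)
print(severity, attempts, delays)
```

Injecting `send` and `sleep` keeps the retry logic testable without network access or real waiting.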

Step 6: Schedule Monitoring Jobs

Automate drift detection to run on a schedule (daily or weekly).

# monitoring/scheduler.py
import schedule
import time
import logging
from datetime import datetime, timedelta
import pandas as pd

logging.basicConfig(
    # ... (arguments elided; see EXAMPLES.md)
)
# ... (see EXAMPLES.md for complete implementation)

Alternatively, use cron:

# Add to crontab (crontab -e)
# Run daily at 2 AM
0 2 * * * cd /path/to/monitoring && /path/to/venv/bin/python scheduler.py >> logs/cron.log 2>&1

Or use Airflow DAG:

# airflow/dags/drift_monitoring_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'ml-team',
    'depends_on_past': False,
    # ... (remaining entries elided; see EXAMPLES.md)
}
# ... (see EXAMPLES.md for complete implementation)

Expected: Monitoring runs automatically on schedule, generates reports, sends alerts only when drift exceeds thresholds, and logs all activity.

On failure: Check that the scheduler process is running (ps aux | grep scheduler), verify the cron service is active, ensure data sources are accessible, review logs for exceptions, and set up a dead man's switch alert that fires if the job doesn't run.
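A dead man's switch can be as simple as a heartbeat file that the monitoring job updates after every successful run, plus a separate check that alerts when the file is too old. The sketch below assumes a file-based heartbeat. The function names, the file name `last_run`, and the 26-hour limit (a daily job plus slack) are hypothetical.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def record_heartbeat(path: Path) -> None:
    """Write the current Unix timestamp; call this at the end of each run."""
    Path(path).write_text(str(time.time()))


def heartbeat_is_stale(path: Path, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """True if the heartbeat is missing, unreadable, or older than the limit."""
    now = time.time() if now is None else now
    try:
        last_run = float(Path(path).read_text())
    except (FileNotFoundError, ValueError):
        return True
    return now - last_run > max_age_seconds


MAX_AGE = 26 * 3600  # assumed limit: daily job plus two hours of slack

with tempfile.TemporaryDirectory() as tmp:
    heartbeat = Path(tmp) / "last_run"

    missing_is_stale = heartbeat_is_stale(heartbeat, MAX_AGE)
    record_heartbeat(heartbeat)
    fresh_is_stale = heartbeat_is_stale(heartbeat, MAX_AGE)
    # Pretend two days have passed without another run.
    later_is_stale = heartbeat_is_stale(heartbeat, MAX_AGE, now=time.time() + 48 * 3600)

print(missing_is_stale, fresh_is_stale, later_is_stale)
```

The staleness check should run from a different scheduler or host than the monitoring job itself, otherwise a single failure silences both.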

Checks

  • PSI and KS test calculations produce expected values for known drift scenarios
  • Evidently HTML reports render correctly and show distribution overlays
  • Critical feature drift triggers alerts immediately
  • Concept drift detector identifies performance degradation within 3 days
  • Alerts delivered to all configured channels (Slack, email, PagerDuty)
  • Scheduled job runs without manual intervention for 7+ days
  • False positive rate < 5% (tune thresholds if higher)
  • Drift detection completes in < 5 minutes for 1M rows
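The first and seventh checks can be approximated offline with synthetic data before any production traffic is involved. This sketch simulates repeated comparisons with and without a known shift and measures how often the KS test flags drift. The function name `flag_rate`, the trial counts, and the half-standard-deviation shift are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp


def flag_rate(shift, n_trials=200, n_rows=500, p_threshold=0.01, seed=42):
    """Fraction of simulated comparisons flagged as drift by the KS test."""
    rng = np.random.default_rng(seed)
    flagged = 0
    for _ in range(n_trials):
        reference = rng.normal(0.0, 1.0, n_rows)
        current = rng.normal(shift, 1.0, n_rows)
        if ks_2samp(reference, current).pvalue < p_threshold:
            flagged += 1
    return flagged / n_trials


# No real drift: every flag is a false positive.
false_positive_rate = flag_rate(shift=0.0)
# Known drift: the mean moves by half a standard deviation.
detection_rate = flag_rate(shift=0.5)

print(f"false positive rate: {false_positive_rate:.3f}")
print(f"detection rate: {detection_rate:.3f}")
```

Real features are rarely Gaussian, so resampling from historical production data gives a more realistic false positive estimate than this toy simulation.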

Pitfalls

  • Stale reference data: Update reference dataset quarterly or after model retraining to reflect natural data evolution
  • Sample size mismatch: Ensure current and reference datasets have similar sizes (>1000 rows each) for reliable statistics
  • Missing ground truth: Concept drift requires labels; implement delayed labeling pipeline if real-time labels unavailable
  • Seasonality confusion: Weekly/monthly patterns may trigger false positives; use time-aligned reference windows or deseasonalize features
  • Alert fatigue: Start with high thresholds and gradually lower them based on actual model retraining cadence
  • Ignoring data quality drift: Monitor missing values, outliers, encoding errors separately from distribution drift
  • Over-reliance on aggregate metrics: Per-feature analysis is crucial; aggregate drift may mask critical individual feature shifts
  • Neglecting prediction distribution: Even without ground truth, sudden prediction distribution shifts signal issues
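For the data quality pitfall, a missing-value check can run alongside the distribution tests. The sketch below compares per-feature missing rates between reference and current data and reports features whose rate rose by more than a tolerance. The helper names, the 0.05 tolerance, and the toy data are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def missing_rate(values: Sequence) -> float:
    """Share of entries that are None or NaN."""
    if len(values) == 0:
        return 0.0
    n_missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return n_missing / len(values)


def missing_rate_alerts(
    reference: Dict[str, Sequence],
    current: Dict[str, Sequence],
    max_increase: float = 0.05,
) -> Dict[str, Tuple[float, float]]:
    """Features whose missing rate rose by more than `max_increase`.

    Maps feature name -> (reference_rate, current_rate).
    """
    alerts = {}
    for name, ref_values in reference.items():
        ref_rate = missing_rate(ref_values)
        cur_rate = missing_rate(current[name])
        if cur_rate - ref_rate > max_increase:
            alerts[name] = (ref_rate, cur_rate)
    return alerts


# Toy data: "income" starts losing values, "age" is unchanged.
reference = {"income": [1.0, 2.0, 3.0, 4.0], "age": [30, None, 40, 50]}
current = {"income": [1.0, None, float("nan"), 4.0], "age": [30, None, 41, 52]}

alerts = missing_rate_alerts(reference, current)
print(alerts)
```

Keeping this check separate from PSI and KS means a broken upstream pipeline shows up as a data quality alert instead of being misread as a genuine distribution shift.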

See Also

  • detect-anomalies-aiops: Time series anomaly detection for operational metrics
  • deploy-ml-model-serving: Model deployment patterns and versioning
  • setup-prometheus-monitoring: Infrastructure metrics collection
  • review-data-analysis: Statistical analysis validation and peer review

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman/skills/monitor-model-drift