
monitor-model-drift

Name: monitor-model-drift
Author: pjt222


Updated 1 month ago


Category: Testing. Tags: ai, testing, automation, design, data

About

This skill detects data drift and concept drift in production ML models using Evidently AI and statistical tests such as PSI and KS. It sets up automated monitoring, alerting, and reporting to catch performance degradation early. Use it when models degrade unexpectedly, data distributions shift, or regulatory compliance is required.

Quick install

Claude Code

Recommended

Primary

npx skills add pjt222/agent-almanac -a claude-code

Plugin command (alternative)

/plugin add https://github.com/pjt222/agent-almanac

Git clone (alternative)

git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/monitor-model-drift

Copy and paste this command into Claude Code to install this skill

Documentation

Monitor Model Drift

See Extended Examples for complete configuration files and templates.

Detect + alert on data drift + concept drift in prod ML models via statistical tests + automated monitoring.

Use When

Prod ML models w/ unexplained perf degradation
New data distributions differ from training
Seasonal/temporal shifts in input features
Need proactive alerts before business metrics impacted
Regulatory: SR 11-7, EU AI Act
Multiple model versions deployed → need drift comparison

In

Required: Prod predictions + features (last 30-90 days)
Required: Reference dataset (training or validation)
Required: Ground truth labels (may be delayed)
Optional: Feature importance / SHAP values
Optional: Business metric thresholds for alerting
Optional: Historical drift reports for trend

Do

Step 1: Install + Config Evidently AI

Set up monitoring framework + deps.

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

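# Install Evidently and dependencies
# (scipy, requests and schedule are imported by the scripts below; the
# evidently.report / evidently.metric_preset imports match the 0.4-era API,
# so pin a compatible version if a newer release has moved them)
pip install evidently pandas scikit-learn scipy requests schedule prometheus-client

# Create monitoring directory structure (logs/ is used by the cron example)
mkdir -p monitoring/{reports,config,alerts,logs}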

Config file:

# monitoring/config/drift_config.py
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently.metrics import (
    DatasetDriftMetric,
    DatasetMissingValuesMetric,
    ColumnDriftMetric,
)

# ... (see EXAMPLES.md for complete implementation)

→ Config created w/ thresholds matching model tolerance.

If err: start conservative (PSI > 0.2, KS p-value < 0.01) + tune by false positive rate.
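A minimal sketch of the conservative starting thresholds named above, kept as plain Python so it does not depend on any Evidently API. `DriftThresholds` and `is_drifted` are hypothetical names for illustration; they are not taken from EXAMPLES.md or from Evidently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftThresholds:
    """Hypothetical container; field names are illustrative only."""
    psi: float = 0.2           # flag when PSI exceeds this value
    ks_p_value: float = 0.01   # flag when the KS p-value falls below this value


def is_drifted(psi: float, ks_p_value: float,
               thresholds: DriftThresholds = DriftThresholds()) -> bool:
    """Flag a feature when either test crosses its threshold."""
    return psi > thresholds.psi or ks_p_value < thresholds.ks_p_value
```

Tune by passing a different `DriftThresholds(...)` once the false positive rate on real data is known.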

Step 2: Data Drift Detection

Drift detection pipeline w/ multiple statistical tests.

# monitoring/drift_detector.py
import pandas as pd
import numpy as np
from scipy.stats import ks_2samp, chi2_contingency
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset
from evidently.metrics import ColumnDriftMetric, DatasetDriftMetric
from datetime import datetime, timedelta
# ... (see EXAMPLES.md for complete implementation)

→ Drift detection runs, JSON report w/ per-feature stats, drifted features identified.

If err: check missing values (impute/drop), reference + current data same cols, data types match.
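One way the per-feature tests could look, as a sketch and not the EXAMPLES.md implementation. It computes PSI with quantile bins taken from the reference sample and runs `scipy.stats.ks_2samp`. The function names, the 10-bin choice and the `eps` floor are assumptions; the default thresholds are the ones from Step 1.

```python
import numpy as np
from scipy.stats import ks_2samp


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index using quantile bins from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior quantile edges; values outside the reference range land in
    # the first or last bin.
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(edges, reference), minlength=bins)
    cur_counts = np.bincount(np.searchsorted(edges, current), minlength=bins)
    # Floor the proportions so empty bins do not produce log(0).
    ref_pct = np.clip(ref_counts / reference.size, eps, None)
    cur_pct = np.clip(cur_counts / current.size, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


def detect_feature_drift(reference, current, psi_threshold=0.2, p_threshold=0.01):
    """Return PSI and KS statistics for one numeric feature plus a drift flag."""
    ks = ks_2samp(reference, current)
    score = psi(reference, current)
    return {
        "psi": score,
        "ks_statistic": float(ks.statistic),
        "ks_p_value": float(ks.pvalue),
        "drifted": bool(score > psi_threshold or ks.pvalue < p_threshold),
    }


# Synthetic example: one sample from the same distribution, one shifted by 1 std.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)
result_same = detect_feature_drift(reference, rng.normal(0.0, 1.0, 5000))
result_shifted = detect_feature_drift(reference, rng.normal(1.0, 1.0, 5000))
```

Categorical features need a different test (the original imports `chi2_contingency` for that); this sketch covers numeric columns only.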

Step 3: Generate Evidently Reports

Visual HTML reports for human review + debugging.

# monitoring/generate_reports.py
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset, TargetDriftPreset
from evidently.metrics import (
    ColumnDriftMetric,
    DatasetDriftMetric,
    DatasetMissingValuesMetric,
)
# ... (see EXAMPLES.md for complete implementation)

→ HTML reports in monitoring/reports/, browser-viewable w/ interactive charts showing distribution comparisons.

If err: check write perms on output dir, Evidently version ≥ 0.4.0, data frames have ≥100 rows (recommended).
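The checks from the troubleshooting notes (same columns, matching dtypes, missing values, ≥100 rows) can run before any report is built. A sketch with pandas only; `preflight_issues` is a hypothetical helper, not an Evidently function.

```python
import pandas as pd


def preflight_issues(reference: pd.DataFrame, current: pd.DataFrame,
                     min_rows: int = 100) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    issues = []
    # Columns present in one frame but not the other.
    mismatch = set(reference.columns) ^ set(current.columns)
    if mismatch:
        issues.append(f"column mismatch: {sorted(mismatch)}")
    # Dtypes must agree for shared columns.
    for col in reference.columns.intersection(current.columns):
        if reference[col].dtype != current[col].dtype:
            issues.append(
                f"dtype mismatch in {col}: "
                f"{reference[col].dtype} vs {current[col].dtype}"
            )
    for name, frame in (("reference", reference), ("current", current)):
        if len(frame) < min_rows:
            issues.append(f"{name} has {len(frame)} rows, fewer than {min_rows}")
        n_missing = int(frame.isna().sum().sum())
        if n_missing:
            issues.append(f"{name} has {n_missing} missing values")
    return issues
```

Running this first keeps report failures from being mistaken for drift.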

Step 4: Concept Drift Detection

Monitor pred perf → detect concept drift (feature-target relationship changes).

# monitoring/concept_drift.py
import pandas as pd
import numpy as np
from sklearn.metrics import roc_auc_score, mean_squared_error, accuracy_score
from typing import Dict, List
import json


# ... (see EXAMPLES.md for complete implementation)

→ Perf monitoring detects when accuracy/AUC drops below threshold → potential concept drift.

If err: verify ground truth labels available (may need delayed validation batch), prediction scores calibrated (0-1 range for classification), no label leakage in features.
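A sketch of threshold-based perf monitoring: score each labeled batch and flag it when accuracy falls more than a tolerance below the baseline. The 0.05 tolerance and all names are assumptions. The original imports sklearn metrics; plain Python is used here to keep the example dependency-free.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def flag_concept_drift(baseline_accuracy: float,
                       batches: List[Tuple[str, Sequence, Sequence]],
                       max_drop: float = 0.05) -> List[Dict]:
    """Score each (label, y_true, y_pred) batch against the baseline."""
    results = []
    for label, y_true, y_pred in batches:
        acc = accuracy(y_true, y_pred)
        results.append({
            "batch": label,
            "accuracy": acc,
            # Suspect concept drift when the drop exceeds the tolerance.
            "drift_suspected": baseline_accuracy - acc > max_drop,
        })
    return results
```

Labels often arrive late, so batches would typically be keyed by prediction date and scored once their ground truth lands.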

Step 5: Automated Alerting

Integrate w/ Slack, PagerDuty, email.

# monitoring/alerting.py
import requests
import json
from typing import Dict, List
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
# ... (see EXAMPLES.md for complete implementation)

→ Alerts sent on drift, severity by drift share + critical feature involvement.

If err: test webhook URLs w/ curl, PagerDuty integration key has perms, firewall outbound HTTPS, retry logic for transient failures.
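A sketch of severity by drift share + critical feature involvement, a Slack-style payload, and retry with exponential backoff. All names and the 0.5 share cutoff are assumptions. The `{"text": ...}` body is the basic Slack incoming-webhook format. No network call is made here: `send` is any callable, e.g. a wrapper around `requests.post` with the webhook URL.

```python
import json
import time
from typing import Callable, Iterable


def classify_severity(drifted_features: Iterable[str],
                      all_features: Iterable[str],
                      critical_features: Iterable[str],
                      high_share: float = 0.5) -> str:
    """Return 'none', 'warning' or 'critical'."""
    drifted = set(drifted_features)
    if not drifted:
        return "none"
    total = len(set(all_features))
    if total == 0:
        raise ValueError("all_features must not be empty")
    share = len(drifted) / total
    if drifted & set(critical_features) or share >= high_share:
        return "critical"
    return "warning"


def build_slack_payload(model_name: str, severity: str,
                        drifted_features: Iterable[str]) -> str:
    """Serialize a minimal Slack incoming-webhook message."""
    names = ", ".join(sorted(drifted_features))
    text = f"[{severity.upper()}] drift detected for {model_name}: {names}"
    return json.dumps({"text": text})


def send_with_retry(send: Callable[[str], None], payload: str,
                    attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send(payload); on exception wait base_delay * 2**attempt and retry."""
    for attempt in range(attempts):
        try:
            send(payload)
            return True
        except Exception:  # sketch only: narrow this to transport errors in real use
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return False
```

Injecting `send` and `sleep` keeps the alert logic testable without a live webhook.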

Step 6: Schedule Monitoring Jobs

Automate drift detection on schedule (daily/weekly).

# monitoring/scheduler.py
import schedule
import time
import logging
from datetime import datetime, timedelta
import pandas as pd

logging.basicConfig(
    # ... (see EXAMPLES.md for complete implementation)
)

Cron alternative:

# Add to crontab (crontab -e)
# Run daily at 2 AM
0 2 * * * cd /path/to/monitoring && /path/to/venv/bin/python scheduler.py >> logs/cron.log 2>&1

Or Airflow DAG:

# airflow/dags/drift_monitoring_dag.py
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'ml-team',
    'depends_on_past': False,
    # ... (see EXAMPLES.md for complete implementation)
}

→ Monitoring runs auto on schedule, reports generated, alerts only when drift exceeds thresholds, all activity logged.

If err: scheduler process running (ps aux | grep scheduler), cron service active, data sources accessible, review logs for exceptions, dead man's switch alert if job doesn't run.
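A sketch of the dead man's switch idea: the monitoring job writes a heartbeat timestamp after each successful run, and a separate check alerts when the heartbeat is missing or too old. The file-based approach, the path and the function names are assumptions; a hosted heartbeat service would serve the same purpose.

```python
import time
from pathlib import Path
from typing import Optional


def record_heartbeat(path: Path, now: Optional[float] = None) -> None:
    """Write the current Unix timestamp; call at the end of a successful run."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(str(time.time() if now is None else now))


def heartbeat_is_stale(path: Path, max_age_seconds: float,
                       now: Optional[float] = None) -> bool:
    """True when the heartbeat is missing, unreadable, or older than max_age."""
    now = time.time() if now is None else now
    try:
        last = float(path.read_text().strip())
    except (FileNotFoundError, ValueError):
        return True
    return now - last > max_age_seconds
```

For a daily job, a `max_age_seconds` somewhat above 24 hours leaves room for normal runtime variation. The staleness check should run from a different scheduler entry than the job it watches.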

Check

PSI + KS test calculations match expected values for known drift scenarios
Evidently HTML reports render correctly + show distribution overlays
Critical feature drift → immediate alerts
Concept drift detector identifies perf degradation within 3 days
Alerts delivered all configured channels (Slack, email, PagerDuty)
Scheduled job runs w/o manual intervention 7+ days
False positive rate < 5% (tune thresholds if higher)
Drift detection completes < 5min for 1M rows
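The false positive rate check can be approximated offline: split the reference data into random halves, which by construction have no drift between them, and count how often the detector fires. A sketch with numpy only; the PSI definition, bin count, trial count and names are assumptions.

```python
import numpy as np


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index using quantile bins from the reference."""
    reference = np.asarray(reference, dtype=float)
    current = np.asarray(current, dtype=float)
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))[1:-1]
    ref_counts = np.bincount(np.searchsorted(edges, reference), minlength=bins)
    cur_counts = np.bincount(np.searchsorted(edges, current), minlength=bins)
    ref_pct = np.clip(ref_counts / reference.size, eps, None)
    cur_pct = np.clip(cur_counts / current.size, eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


def estimate_false_positive_rate(reference, sample_size, trials=200,
                                 psi_threshold=0.2, seed=0):
    """Share of no-drift splits where PSI still exceeds the threshold."""
    reference = np.asarray(reference, dtype=float)
    if reference.size < 2 * sample_size:
        raise ValueError("reference must hold at least 2 * sample_size rows")
    rng = np.random.default_rng(seed)
    flags = 0
    for _ in range(trials):
        # Two disjoint random subsets of the same data: any alarm is false.
        idx = rng.permutation(reference.size)[: 2 * sample_size]
        a = reference[idx[:sample_size]]
        b = reference[idx[sample_size:]]
        flags += psi(a, b) > psi_threshold
    return flags / trials


data = np.random.default_rng(1).normal(size=5000)
fpr_large = estimate_false_positive_rate(data, sample_size=1000)
fpr_small = estimate_false_positive_rate(data, sample_size=30)
```

Comparing `fpr_large` with `fpr_small` also shows why small samples make PSI noisy: with few rows per bin, sampling error alone can push PSI past the threshold.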

Traps

Stale reference data: Update quarterly or after retraining to reflect natural data evolution
Sample size mismatch: Current + reference datasets similar sizes (>1000 rows each) for reliable stats
Missing ground truth: Concept drift needs labels; implement delayed labeling if real-time unavailable
Seasonality confusion: Weekly/monthly patterns → false positives; time-aligned reference windows or deseasonalize features
Alert fatigue: Start high thresholds, lower based on actual retraining cadence
Ignore data quality drift: Monitor missing values, outliers, encoding errors separately from distribution drift
Over-reliance on aggregate: Per-feature analysis crucial; aggregate drift may mask individual feature shifts
Neglect prediction distribution: Even w/o ground truth, sudden prediction shifts signal issues
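For the seasonality trap, one form of time-aligned reference window compares the current day only against the same weekday in previous weeks. A stdlib sketch; the function name, the `(day, value)` record shape and the 4-week lookback are assumptions.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple


def same_weekday_reference(records: Iterable[Tuple[date, float]],
                           current_day: date,
                           weeks_back: int = 4) -> List[float]:
    """Collect values from the same weekday in the previous weeks_back weeks."""
    target_days = {current_day - timedelta(weeks=k)
                   for k in range(1, weeks_back + 1)}
    return [value for day, value in records if day in target_days]


# Synthetic history: the value is just the weekday number (Monday = 0).
start = date(2024, 1, 1)
history = [(start + timedelta(days=i), float((start + timedelta(days=i)).weekday()))
           for i in range(63)]
today = start + timedelta(days=63)
reference_values = same_weekday_reference(history, today)
```

The same pattern extends to month-of-year or hour-of-day alignment by changing how `target_days` is built.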

Related

detect-anomalies-aiops — time series anomaly detection for operational metrics
deploy-ml-model-serving — model deployment patterns + versioning
setup-prometheus-monitoring — infrastructure metrics collection
review-data-analysis — statistical analysis validation + peer review

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman-ultra/skills/monitor-model-drift

Topics: agents, agentskills, ai-assisted-development, claude-code, skills, teams

FAQ


What is the monitor-model-drift skill?

monitor-model-drift is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform monitor-model-drift-related tasks without extra prompting.

How do I install monitor-model-drift?

Use the install commands on this page: add monitor-model-drift to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does monitor-model-drift belong to?

monitor-model-drift is in the Testing category, tagged ai, testing, automation, design and data.

Is monitor-model-drift free to use?

Yes. monitor-model-drift is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
