track-ml-experiments

pjt222
Updated 2 days ago

About

This skill sets up an MLflow tracking server to manage machine learning experiments, enabling automated logging for popular frameworks and systematic comparison of runs. It helps developers migrate from manual logging, manage artifacts in remote storage, and build reproducible workflows with full lineage tracking. Use it when starting a new ML project or needing to systematically compare multiple training runs.

Quick Install

Claude Code

Recommended (primary method):
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative):
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative):
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/track-ml-experiments

Copy and paste this command into Claude Code to install the skill.

Skill Documentation

Track ML Experiments

See Extended Examples for complete configuration files and templates.

Set up MLflow tracking server + impl comprehensive experiment tracking w/ metrics, params, artifacts.

Use When

  • Start new ML proj needing experiment tracking
  • Migrate from manual logs → automated
  • Cmp multi training runs systematically
  • Share experiment results w/ team
  • Build reproducible ML workflows w/ full lineage
  • Integrate experiment tracking into CI/CD

In

  • Required: Python env w/ ML framework (sklearn, pytorch, tensorflow, xgboost)
  • Required: MLflow install (pip install mlflow)
  • Optional: Remote storage backend (S3, Azure Blob, GCS) for artifacts
  • Optional: DB backend (PostgreSQL, MySQL) for metadata
  • Optional: Auth creds for remote backends

Do

Step 1: Init MLflow Tracking Server

Setup w/ appropriate backend stores.

# Option 1: Local file-based tracking (development)
mkdir -p mlruns
export MLFLOW_TRACKING_URI="file:./mlruns"

# Option 2: SQLite backend with local artifacts
mlflow server \
  --backend-store-uri sqlite:///mlflow.db \
  --default-artifact-root ./mlartifacts \
# ... (see EXAMPLES.md for complete implementation)

Create config file for team sharing:

# mlflow_config.py
import os

MLFLOW_TRACKING_URI = os.getenv(
    "MLFLOW_TRACKING_URI",
    "http://mlflow-server.company.com:5000"
)

# ... (see EXAMPLES.md for complete implementation)

Got: MLflow UI accessible at host:port, empty experiments list. Server logs confirm startup w/o errors.
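The shared config file is elided above; one possible shape is sketched here. The class name `TrackingConfig`, the function `apply_config`, and the default experiment name are assumptions for illustration, not the contents of EXAMPLES.md:

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingConfig:
    tracking_uri: str
    experiment_name: str

    @classmethod
    def from_env(cls, env=None):
        """Build the config from environment variables, falling back to team defaults."""
        env = os.environ if env is None else env
        return cls(
            tracking_uri=env.get(
                "MLFLOW_TRACKING_URI", "http://mlflow-server.company.com:5000"
            ),
            # Assumed default; substitute the team's own experiment name.
            experiment_name=env.get("MLFLOW_EXPERIMENT_NAME", "customer-churn"),
        )


def apply_config(config):
    """Point the MLflow client at the configured server and experiment."""
    import mlflow  # deferred so the config can be built without MLflow installed

    mlflow.set_tracking_uri(config.tracking_uri)
    mlflow.set_experiment(config.experiment_name)
```

A training script would then call `apply_config(TrackingConfig.from_env())` once at startup, so every team member picks up the same server unless the environment overrides it.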

If err: Check port avail w/ netstat -tulpn | grep 5000, verify DB connection strings, ensure S3 creds configured (aws configure), check firewall for remote.
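The port check can also be done from Python, which helps when netstat is not available. A minimal sketch using only the standard library; the function name `server_is_listening` is hypothetical:

```python
import socket


def server_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Before starting the server, `server_is_listening("localhost", 5000)` returning True suggests the port is already taken. After starting it, the same call returning False suggests the server is down or blocked by a firewall.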

Step 2: Configure Autologging for ML Frameworks

Enable framework-specific autologging → capture metrics, params, models auto.

# training_script.py
import mlflow
from mlflow_config import MLFLOW_TRACKING_URI, MLFLOW_EXPERIMENT_NAME

# Set tracking URI
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(MLFLOW_EXPERIMENT_NAME)

# ... (see EXAMPLES.md for complete implementation)

For PyTorch:

import mlflow.pytorch

mlflow.pytorch.autolog(
    log_every_n_epoch=1,
    log_every_n_step=None,
    log_models=True,
    disable=False,
    exclusive=False,
# ... (see EXAMPLES.md for complete implementation)

Got: Run appears in UI w/ all hyperparams, metrics (training/val loss, acc), model artifacts, input examples auto-logged.
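When one script supports several frameworks, autologging can be switched on by name. A sketch under the assumption that only the four frameworks listed in the inputs are needed; `enable_autolog` and `SUPPORTED_FRAMEWORKS` are hypothetical names:

```python
import importlib

# Frameworks named in this skill's inputs; each has an mlflow.<name>.autolog().
SUPPORTED_FRAMEWORKS = ("sklearn", "pytorch", "tensorflow", "xgboost")


def enable_autolog(framework, **autolog_kwargs):
    """Call mlflow.<framework>.autolog(**autolog_kwargs) for a supported framework."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(
            f"No autolog mapping for {framework!r}; use manual logging instead."
        )
    # Imported on demand so only the flavor actually used has to be importable.
    flavor = importlib.import_module(f"mlflow.{framework}")
    flavor.autolog(**autolog_kwargs)
```

For example, `enable_autolog("sklearn", log_models=True)` must run before the model's `fit` call for the run to be captured.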

If err: Verify MLflow ver compat w/ ML framework (mlflow.sklearn.autolog() needs MLflow ≥1.20), check autolog supported for model type, disable + use manual logging fallback, confirm active URI w/ mlflow.get_tracking_uri() + check server logs for connection errs.
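The version check can be scripted rather than done by eye. A minimal sketch using only the standard library; the helper names are hypothetical, and pre-release suffixes such as `rc1` are ignored, which is a simplification:

```python
import re
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '2.9.1' or '1.20.0rc1' into a tuple of its leading integers."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"Unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '1.20' and '1.20.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None
```

A script could call `installed_version("mlflow")` and pass the result to `meets_minimum` with the threshold from the error hint above, falling back to manual logging when the check fails.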

Step 3: Comprehensive Manual Logging

Add custom metrics, params, artifacts, tags for complete documentation.

# comprehensive_tracking.py
import mlflow
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

def train_and_log_model(params, X_train, y_train, X_test, y_test):
    """
# ... (see EXAMPLES.md for complete implementation)

Got: UI displays rich info: step-by-step metrics, viz artifacts, model signature, input examples, comprehensive tags for filter/search.

If err: Check artifact storage perms (aws s3 ls s3://bucket/path), verify matplotlib backend for figure logging (plt.switch_backend('Agg')), ensure JSON-serializable for log_dict, check disk space for local.
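The full `train_and_log_model` is elided above. The sketch below shows only the logging calls it would likely rely on, with two helper functions that address the naming and JSON-serialization issues. All function names and the artifact file name `summary.json` are assumptions:

```python
import json


def flatten_params(nested, prefix=""):
    """Flatten nested params into dotted keys: {'opt': {'lr': 0.1}} -> {'opt.lr': 0.1}."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_params(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def json_safe(data):
    """Round-trip through JSON, stringifying anything json cannot encode."""
    return json.loads(json.dumps(data, default=str))


def log_run(params, metrics_by_step, summary, run_name=None):
    """Log params, per-step metrics, and a summary dict in one MLflow run."""
    import mlflow  # deferred so the helpers above work without MLflow installed

    with mlflow.start_run(run_name=run_name):
        mlflow.log_params(flatten_params(params))
        # One dict of metrics per step, e.g. per epoch.
        for step, metrics in enumerate(metrics_by_step):
            for name, value in metrics.items():
                mlflow.log_metric(name, value, step=step)
        mlflow.log_dict(json_safe(summary), "summary.json")
```

Flattening every params dict through the same helper keeps param names identical across runs, which makes later filtering and comparison simpler.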

Step 4: Compare Runs + Generate Reports

Use MLflow comparison tools to analyze multiple experiments.

# compare_runs.py
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

def compare_experiments(experiment_name, metric_name="test_accuracy", top_n=5):
    """
# ... (see EXAMPLES.md for complete implementation)

CLI comparison:

# List runs using MLflow CLI (takes an experiment ID, not a name;
# sorting by metric is done in the Python script above)
mlflow runs list --experiment-id "$EXPERIMENT_ID"

# Export run data to CSV
mlflow experiments csv --experiment-id "$EXPERIMENT_ID" \
  --filename experiments.csv

Got: Console shows sorted runs w/ key metrics, HTML report w/ formatted comparison, CSV w/ all run data.

If err: Verify experiment exists w/ mlflow experiments search (mlflow experiments list on MLflow 1.x), check metric names match exact (case-sensitive), ensure runs completed (check status), verify file write perms for outputs.
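Sorted comparison is available through `mlflow.search_runs`, which returns a pandas DataFrame by default. A minimal sketch, not the elided `compare_experiments`; the function names are hypothetical, and the `experiment_names` argument is assumed to be available in the installed MLflow version:

```python
def order_by_clause(metric_name, descending=True):
    """Build an MLflow order_by entry such as 'metrics.test_accuracy DESC'."""
    direction = "DESC" if descending else "ASC"
    return f"metrics.{metric_name} {direction}"


def top_runs(experiment_name, metric_name="test_accuracy", top_n=5):
    """Return a DataFrame of the top_n finished runs ranked by metric_name."""
    import mlflow  # deferred so order_by_clause is usable without MLflow installed

    return mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string="attributes.status = 'FINISHED'",
        order_by=[order_by_clause(metric_name)],
        max_results=top_n,
    )


def export_csv(runs, path="experiments.csv"):
    """Write the comparison table to CSV; runs is the DataFrame from top_runs."""
    runs.to_csv(path, index=False)
```

For a metric where lower is better, such as a loss, pass `descending=False` to `order_by_clause` and use that entry in `order_by` instead.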

Step 5: Configure Remote Artifact Storage

Setup S3/Azure/GCS backends for scalable artifact mgmt.

# artifact_storage_config.py
import mlflow
import os

def configure_s3_backend():
    """
    Configure S3 for artifact storage.
    """
# ... (see EXAMPLES.md for complete implementation)

Docker Compose for MLflow w/ PostgreSQL + S3:

# docker-compose.yml
version: '3.8'

services:
  postgres:
    image: postgres:14
    environment:
      POSTGRES_DB: mlflow
# ... (see EXAMPLES.md for complete implementation)

Got: Artifacts upload to remote, UI shows artifact links pointing to S3/Azure/GCS URIs, downloading from UI works.

If err: Verify cloud creds w/ aws s3 ls or az storage blob list, check bucket perms (write access), ensure MLflow w/ cloud extras (pip install mlflow[extras]), test net connectivity to storage, check CORS for browser access.
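A quick pre-flight check can catch missing credentials before the first upload fails. This sketch assumes credentials are supplied through environment variables; setups that use AWS profiles, IAM roles, or other Azure auth methods will report variables as missing even though uploads would work. The names `REQUIRED_ENV` and `missing_credentials` are hypothetical:

```python
import os
from urllib.parse import urlparse

# Environment variables commonly used for each artifact URI scheme.
# Assumption: credentials come from the environment, not profiles or IAM roles.
REQUIRED_ENV = {
    "s3": ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"),
    "gs": ("GOOGLE_APPLICATION_CREDENTIALS",),
    "wasbs": ("AZURE_STORAGE_CONNECTION_STRING",),
}


def missing_credentials(artifact_uri, env=None):
    """Return the env vars expected for artifact_uri's scheme that are unset or empty."""
    env = os.environ if env is None else env
    scheme = urlparse(artifact_uri).scheme
    if scheme not in REQUIRED_ENV:
        raise ValueError(f"Unsupported artifact URI scheme: {scheme!r}")
    return [name for name in REQUIRED_ENV[scheme] if not env.get(name)]
```

An empty list from `missing_credentials("s3://bucket/path")` means the expected variables are set; it does not prove the credentials are valid or that the bucket is writable.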

Step 6: Experiment Lifecycle Mgmt

Setup automated cleanup, archival, organization policies.

# lifecycle_management.py
import mlflow
from mlflow.tracking import MlflowClient
from datetime import datetime, timedelta

client = MlflowClient()

def archive_old_experiments(days_old=90):
# ... (see EXAMPLES.md for complete implementation)

Got: Old experiments → deleted state, failed runs removed from active, best runs tagged for filter, storage reclaimed.

If err: Check experiment perms (must be owner to delete), verify runs actually FAILED status, ensure metric exists for all ranked, check DB connectivity for bulk ops, verify perms for artifact deletion in remote.
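The lifecycle script is elided above. The sketch below covers one of its likely pieces: removing old failed runs. It is an illustration under stated assumptions, not the skill's own implementation, and both function names are hypothetical:

```python
from datetime import datetime, timedelta, timezone


def stale_run_ids(runs, days_old=90, now=None):
    """Pick FAILED runs that started more than days_old days ago.

    runs is an iterable of dicts with 'run_id', 'status', and 'start_time'
    (milliseconds since the Unix epoch, as in MLflow's run.info.start_time).
    """
    now = datetime.now(timezone.utc) if now is None else now
    cutoff_ms = (now - timedelta(days=days_old)).timestamp() * 1000
    return [
        run["run_id"]
        for run in runs
        if run["status"] == "FAILED" and run["start_time"] < cutoff_ms
    ]


def delete_stale_failed_runs(experiment_id, days_old=90):
    """Soft-delete stale FAILED runs in one experiment and return their IDs."""
    from mlflow.tracking import MlflowClient  # deferred import

    client = MlflowClient()
    found = client.search_runs(
        [experiment_id], filter_string="attributes.status = 'FAILED'"
    )
    runs = [
        {
            "run_id": r.info.run_id,
            "status": r.info.status,
            "start_time": r.info.start_time,
        }
        for r in found
    ]
    doomed = stale_run_ids(runs, days_old=days_old)
    for run_id in doomed:
        client.delete_run(run_id)  # moves the run to the 'deleted' lifecycle stage
    return doomed
```

`delete_run` only marks runs as deleted. Storage is reclaimed when the deleted runs are purged, for example with the `mlflow gc` command.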

Check

  • MLflow tracking server accessible via web UI
  • Experiments created + runs logged
  • Autologging captures framework-specific metrics automatically
  • Custom metrics, params, artifacts logged correctly
  • Comparison queries return expected top runs
  • Remote artifact storage configured + functional
  • Artifacts downloadable from UI + programmatically
  • Run filtering + searching works w/ tags
  • HTML comparison reports gen w/o errs
  • Lifecycle scripts execute

Traps

  • Connection timeouts: Server not accessible from training scripts → verify MLFLOW_TRACKING_URI env, check firewall, ensure server running
  • Artifact upload fails: S3/Azure creds not configured | bucket missing → test cloud CLI first, verify bucket perms
  • Missing metrics: Autologging disabled | unsupported framework ver → check compat, fallback to manual logging
  • Run clutter: Too many runs polluting UI → impl tagging strategy early, use lifecycle scripts regularly
  • Large artifacts: Logging entire datasets → storage bloat. Log samples | refs, use external data versioning (DVC)
  • Inconsistent naming: Params logged w/ diff names across runs → standardize naming in config
  • DB locks: SQLite no concurrent writes → use PostgreSQL/MySQL for multi-user
  • Autolog conflicts: Multiple autolog configs interfere → use exclusive=True | disable conflicting

Related

  • register-ml-model — register tracked models in MLflow Model Registry
  • version-ml-data — version datasets via DVC for reproducible experiments
  • setup-automl-pipeline — integrate tracking into automated ML pipelines
  • deploy-ml-model-serving — deploy best-performing tracked models to prod
  • orchestrate-ml-pipeline — combine tracking w/ workflow orchestration

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman-ultra/skills/track-ml-experiments
