register-ml-model
About
This skill registers trained models in MLflow's Model Registry with full version control and stage management (Staging, Production, Archived). It implements approval workflows for governance and tracks comprehensive metadata and deployment lineage for auditing. Use it to promote models from experimentation to production, manage multiple versions, and maintain compliance through rollback capabilities.
Quick Install
Claude Code
npx skills add pjt222/agent-almanac -a claude-code (recommended)
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/register-ml-model
Copy and paste the command into Claude Code to install the skill.
Skill Documentation
Register ML Model
See Extended Examples for complete configuration files and templates.
Implement MLflow Model Registry for systematic model versioning, stage management, and deployment governance.
When to Use
- Promoting a trained model from experimentation to production
- Managing multiple model versions across development stages
- Implementing model approval workflows for governance
- Tracking model lineage from training to deployment
- Rolling back to previous model versions
- Comparing deployed model versions for A/B testing
- Auditing model changes for compliance requirements
Inputs
- Required: MLflow tracking server with Model Registry enabled
- Required: Trained model logged with MLflow (from tracking runs)
- Required: Model name for registry registration
- Optional: Approval workflow integration (email, Slack, Jira)
- Optional: CI/CD pipeline for automated promotion
- Optional: Model validation metric thresholds
Procedure
Step 1: Configure Model Registry Backend
Set up the MLflow Model Registry with a database backend (a file-based registry is not recommended for production).
# Start MLflow server with Model Registry support
mlflow server \
  --backend-store-uri postgresql://user:pass@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/models \
  --host 0.0.0.0 \
  --port 5000
Python configuration:
# model_registry_config.py
import mlflow
from mlflow.tracking import MlflowClient
# Set tracking URI (must support Model Registry)
MLFLOW_TRACKING_URI = "http://mlflow-server.company.com:5000"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
# ... (see EXAMPLES.md for complete implementation)
Expected: The Model Registry UI tab appears in MLflow, search_registered_models() returns successfully (even if empty), and the database contains a registered_models table.
On failure: Verify MLflow version ≥1.4 (the Model Registry was introduced in 1.4), check the database backend (SQLite works for local testing but is not suited to concurrent production use), ensure --backend-store-uri points to a database (not file://), verify the database user has CREATE TABLE permissions, and check the MLflow server logs for migration errors.
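The two checks above (a database-backed store and a reachable registry) can be sketched as a small preflight helper. This is a minimal sketch, not part of the skill's EXAMPLES.md: the names `DATABASE_SCHEMES`, `is_database_uri`, and `registry_is_reachable` are hypothetical, and `client` is assumed to be an `MlflowClient` (or any object exposing `search_registered_models`).

```python
from urllib.parse import urlparse

# Schemes treated as "database-backed" in this sketch (an assumption, extend as needed).
DATABASE_SCHEMES = {"postgresql", "mysql", "mssql", "sqlite"}


def is_database_uri(backend_store_uri: str) -> bool:
    """Return True if the URI scheme names a SQL database rather than a file path."""
    scheme = urlparse(backend_store_uri).scheme
    # SQLAlchemy-style URIs may carry a driver suffix, e.g. "postgresql+psycopg2".
    return scheme.split("+")[0] in DATABASE_SCHEMES


def registry_is_reachable(client) -> bool:
    """Smoke-test the registry: an empty result is fine, an exception is not."""
    try:
        client.search_registered_models(max_results=1)
        return True
    except Exception as exc:  # broad on purpose: any failure means "not ready"
        print(f"Model Registry check failed: {exc}")
        return False
```

With a real server, one would call `mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)` first and then pass `MlflowClient()` to `registry_is_reachable`.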
Step 2: Register Model from Training Run
Register a logged model to the Model Registry with comprehensive metadata.
# register_model.py
import mlflow
from mlflow.tracking import MlflowClient
from model_registry_config import MLFLOW_TRACKING_URI
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
client = MlflowClient()
# ... (see EXAMPLES.md for complete implementation)
Expected: A new model version appears in the Model Registry UI, the version includes a description and tags, model artifacts are accessible via the models:/<model-name>/<version> URI, and the model signature and input example are preserved.
On failure: Verify the run_id exists and has completed (client.get_run(run_id)), check that the model artifact path matches the logged artifact (mlflow.search_runs() to inspect), ensure the model was logged with a proper framework flavor (mlflow.sklearn.log_model, not mlflow.log_artifact), verify there are no special characters in the model name (use hyphens not underscores), and check artifact storage accessibility.
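A minimal sketch of the registration step, under stated assumptions: the model was logged under the run's artifact directory (the MLflow 2.x layout), and `client` is an `MlflowClient`. The function name `register_from_run` and its parameters are hypothetical, and the complete version in EXAMPLES.md may differ.

```python
def register_from_run(client, run_id, artifact_path, model_name,
                      description="", tags=None):
    """Create a registry version from a model logged under a tracking run."""
    run = client.get_run(run_id)  # raises if the run does not exist
    if run.info.status != "FINISHED":
        raise ValueError(
            f"Run {run_id} has status {run.info.status}, expected FINISHED"
        )

    # Create the registered model on first use; later calls reuse it.
    if not client.search_registered_models(filter_string=f"name='{model_name}'"):
        client.create_registered_model(model_name)

    # Assumption: the model artifacts live under the run's artifact URI.
    source = f"{run.info.artifact_uri}/{artifact_path}"
    # Tag values are stored as strings, so coerce them up front.
    string_tags = {key: str(value) for key, value in (tags or {}).items()}
    return client.create_model_version(
        name=model_name,
        source=source,
        run_id=run_id,
        tags=string_tags,
        description=description,
    )
```

Passing `run_id` keeps the version linked to its training run, which the lineage steps later in this document rely on.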
Step 3: Implement Stage Transitions with Validation
Move model versions through stages (None → Staging → Production → Archived) with validation checks.
# stage_management.py
import mlflow
from mlflow.tracking import MlflowClient
from datetime import datetime
client = MlflowClient()
class ModelStageManager:
    # ... (see EXAMPLES.md for complete implementation)
Expected: The model version stage updates in the registry, old versions are archived automatically, transition timestamps are recorded in tags, and rollback restores the previous production version.
On failure: Check that the version exists and is in the expected stage, verify archive_existing_versions flag behavior (may not archive if only one version), ensure the database supports concurrent transactions for stage updates, check for stage transition locks (only one transition per version at a time), and verify the approval workflow integration.
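A validation-gated transition might look like the sketch below. It is not the `ModelStageManager` from EXAMPLES.md: the function name `promote_if_valid`, the `accuracy` metric, and the 0.9 threshold are illustrative assumptions. Note also that `transition_model_version_stage` is deprecated in recent MLflow releases (since 2.9) in favor of aliases, so this applies to registries that still use stages.

```python
from datetime import datetime, timezone


def promote_if_valid(client, model_name, version, target_stage,
                     metric="accuracy", threshold=0.9):
    """Transition a version only if its training metric clears the threshold."""
    model_version = client.get_model_version(model_name, str(version))
    metrics = client.get_run(model_version.run_id).data.metrics
    value = metrics.get(metric)
    if value is None or value < threshold:
        print(f"Blocked: {metric}={value} is below threshold {threshold}")
        return False

    client.transition_model_version_stage(
        name=model_name,
        version=str(version),
        stage=target_stage,
        # Only a Production promotion archives the versions it replaces.
        archive_existing_versions=(target_stage == "Production"),
    )
    # Record when the transition happened so audits can reconstruct history.
    client.set_model_version_tag(
        name=model_name,
        version=str(version),
        key=f"promoted_to_{target_stage.lower()}_at",
        value=datetime.now(timezone.utc).isoformat(),
    )
    return True
```

Returning a boolean rather than raising keeps the function easy to call from a CI job that maps the result to an exit code.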
Step 4: Implement Model Aliasing and References
Use model aliases for stable deployment references (MLflow ≥2.3).
# model_aliases.py
from mlflow.tracking import MlflowClient
client = MlflowClient()
def set_model_alias(model_name, version, alias):
    """
    Set an alias for a model version (MLflow 2.3+).
    """
    # ... (see EXAMPLES.md for complete implementation)
Expected: Aliases appear in the Model Registry UI, loading models by alias works (models:/name@alias), updating an alias immediately affects new loads, and the A/B test infrastructure is functional.
On failure: Upgrade MLflow to ≥2.3 for native alias support, use a tag-based fallback for older versions, verify alias naming (alphanumeric and hyphens only), and check for alias conflicts (each alias points to exactly one version, so reassigning an alias moves it off the previous version).
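The alias operations can be sketched with two `MlflowClient` calls, `set_registered_model_alias` and `get_model_version_by_alias`. The wrapper names below, the alias names `champion` and `challenger`, and the hash-based traffic split are illustrative assumptions, not the implementation from EXAMPLES.md.

```python
import hashlib


def point_alias_at(client, model_name, alias, version):
    """Move (or create) an alias so it resolves to the given version."""
    client.set_registered_model_alias(model_name, alias, str(version))
    # URI that loaders such as mlflow.pyfunc.load_model accept.
    return f"models:/{model_name}@{alias}"


def resolve_alias(client, model_name, alias):
    """Return the version an alias currently points to."""
    return client.get_model_version_by_alias(model_name, alias).version


def choose_alias(request_id: str, challenger_share: float = 0.1) -> str:
    """Deterministically route a request to one of two aliases for A/B testing."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "challenger" if bucket < challenger_share else "champion"
```

Hashing the request ID means the same caller always lands on the same model version, which keeps A/B comparisons stable across repeated requests.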
Step 5: Implement Model Lineage Tracking
Track full lineage from data to deployment with comprehensive metadata.
# model_lineage.py
import mlflow
from mlflow.tracking import MlflowClient
import json
client = MlflowClient()
def enrich_model_metadata(model_name, version, lineage_data):
    # ... (see EXAMPLES.md for complete implementation)
Expected: Model version tags include comprehensive lineage information, get_model_lineage() returns the full history, and the JSON report contains the data source, training details, and deployment info.
On failure: Verify tag values are strings (convert dicts to JSON), check tag key naming (no spaces or special chars), ensure lineage data was captured during training, and verify the run_id is valid and accessible.
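Because tag values must be strings, lineage data has to be flattened before it is written. The following is a minimal sketch of that conversion, assuming a `lineage.` key prefix as the tag schema; the function names and the prefix are hypothetical and do not reproduce `enrich_model_metadata` from EXAMPLES.md.

```python
import json


def lineage_to_tags(lineage, prefix="lineage"):
    """Flatten a lineage dict into string-valued tags with a shared key prefix."""
    tags = {}
    for key, value in lineage.items():
        tag_key = f"{prefix}.{key}".replace(" ", "_")  # no spaces in tag keys
        # Strings pass through; everything else is serialized as JSON.
        tags[tag_key] = value if isinstance(value, str) else json.dumps(value, sort_keys=True)
    return tags


def write_lineage(client, model_name, version, lineage):
    """Attach each lineage entry to the model version as a tag."""
    for key, value in lineage_to_tags(lineage).items():
        client.set_model_version_tag(
            name=model_name, version=str(version), key=key, value=value
        )


def read_lineage(client, model_name, version, prefix="lineage"):
    """Return the lineage tags of a version as raw strings, prefix removed."""
    tags = client.get_model_version(model_name, str(version)).tags
    marker = prefix + "."
    return {k[len(marker):]: v for k, v in tags.items() if k.startswith(marker)}
```

A shared prefix keeps lineage tags separate from ad hoc tags, which also helps with the tag-schema concern raised later in this document.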
Step 6: Automate Registry Operations with CI/CD
Integrate model registration into CI/CD pipelines for automated promotion.
# .github/workflows/model_promotion.yml
name: Model Promotion Pipeline
on:
  workflow_dispatch:
    inputs:
      model_name:
        description: 'Model name to promote'
# ... (see EXAMPLES.md for complete implementation)
Python automation script:
# scripts/promote_model.py
import argparse
from stage_management import ModelStageManager
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--version", type=int, required=True)
    # ... (see EXAMPLES.md for complete implementation)
Expected: The GitHub Actions workflow triggers on manual dispatch, validation tests pass, the model is promoted to the target stage, a Slack notification is sent, and the deployment pipeline is triggered automatically.
On failure: Check the GitHub secrets configuration for MLFLOW_TRACKING_URI, verify network access from GitHub Actions to the MLflow server (may need VPN or IP allowlist), ensure the validation script has correct metric thresholds, check the Slack webhook configuration, and verify Python script executable permissions.
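One way to structure the promotion script so that a CI job can act on its exit code is sketched below. Only `--model-name` and `--version` come from the excerpt above; the `--target-stage` and `--dry-run` flags, the `run` function, and the injected `promote` callable are hypothetical additions for illustration.

```python
import argparse


def build_parser():
    """Build the CLI; --target-stage and --dry-run are illustrative additions."""
    parser = argparse.ArgumentParser(description="Promote a registered model version")
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--version", type=int, required=True)
    parser.add_argument("--target-stage", default="Staging",
                        choices=["Staging", "Production", "Archived"])
    parser.add_argument("--dry-run", action="store_true",
                        help="print the planned transition without applying it")
    return parser


def run(argv, promote):
    """Parse argv, call promote(model_name, version, stage), return an exit code."""
    args = build_parser().parse_args(argv)
    if args.dry_run:
        print(f"Would promote {args.model_name} v{args.version} to {args.target_stage}")
        return 0
    promoted = promote(args.model_name, args.version, args.target_stage)
    # A non-zero exit code fails the CI step when validation blocks promotion.
    return 0 if promoted else 1
```

In a real script the entry point would be something like `sys.exit(run(sys.argv[1:], promote=...))`, with `promote` bound to whatever performs the validated stage transition.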
Validation
- Model Registry accessible and backend configured
- Models register successfully from training runs
- Stage transitions work (None → Staging → Production → Archived)
- Validation checks enforce quality thresholds
- Model aliases set and resolved correctly
- Lineage metadata captured comprehensively
- Rollback functionality restores previous versions
- CI/CD pipeline automates promotions
- Team notifications working for stage changes
- Model URIs resolve correctly in all stages
Common Pitfalls
- SQLite limitations: Model Registry requires database backend (PostgreSQL/MySQL) for production - file-based registry causes concurrency issues
- Stage conflicts: Multiple versions in same stage cause confusion - use archive_existing_versions=True to auto-archive
- Missing run linkage: Registering models without run_id loses lineage - always register from MLflow runs, not raw files
- Alias confusion: Using stages as deployment targets instead of aliases - stages are for workflow, aliases for deployment references
- Validation skipped: Promoting to Production without checks - implement mandatory validation in CI/CD pipeline
- No rollback plan: Production issues without rollback capability - maintain previous Production version in Archived stage
- Tag overload: Too many unstructured tags - standardize tag schema and naming conventions
- Manual processes: Human-driven promotions are error-prone and slow - automate with CI/CD and approval workflows
- Lost artifacts: Model registered but artifacts deleted from storage - ensure artifact retention policies align with model lifecycle
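The rollback advice above can be sketched as follows, for registries that still use stages (deprecated in recent MLflow releases in favor of aliases). The function name `rollback_production` is hypothetical, and the sketch assumes that the highest-numbered Archived version is the one that was most recently in Production, which may not hold if versions were archived for other reasons.

```python
def rollback_production(client, model_name):
    """Restore the newest Archived version to Production and archive the current one."""
    versions = client.search_model_versions(f"name='{model_name}'")
    archived = [v for v in versions if v.current_stage == "Archived"]
    if not archived:
        raise LookupError(f"No Archived version of {model_name} to roll back to")

    # Assumption: the highest version number among Archived is the previous Production.
    previous = max(archived, key=lambda v: int(v.version))
    client.transition_model_version_stage(
        name=model_name,
        version=previous.version,
        stage="Production",
        archive_existing_versions=True,  # moves the faulty version out of Production
    )
    return previous.version
```

Recording the previous Production version in a tag at promotion time would remove the need for the version-number assumption.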
Related Skills
- track-ml-experiments - Log models to MLflow before registering them
- deploy-ml-model-serving - Deploy registered models to serving infrastructure
- run-ab-test-models - A/B test models using registry aliases
- orchestrate-ml-pipeline - Automate model training and registration
- version-ml-data - Version training data for model lineage
