
register-ml-model

pjt222
Updated 2 days ago

About

This skill registers trained models in the MLflow Model Registry, providing version control and managed stage transitions (such as Staging to Production) with approval workflows. It is used to promote models from experimentation to production, manage multiple versions across stages, and handle rollbacks or audit compliance. Developers should use it for systematic deployment governance and model lineage tracking within MLOps pipelines.

Quick Install

Claude Code

Primary (Recommended)
npx skills add pjt222/agent-almanac -a claude-code
Plugin Command (Alternative)
/plugin add https://github.com/pjt222/agent-almanac
Git Clone (Alternative)
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/register-ml-model

Copy and paste one of these commands into Claude Code to install this skill

Documentation

Register ML Model

See Extended Examples for complete configuration files and templates.

Impl MLflow Model Registry → systematic model versioning, stage mgmt, deployment governance.

Use When

  • Promote trained model exp → prod
  • Manage multi vers across dev stages
  • Impl approval workflows → governance
  • Track lineage train → deploy
  • Rollback to prev vers
  • Compare deployed vers → A/B test
  • Audit changes → compliance

In

  • Required: MLflow tracking server w/ Model Registry enabled
  • Required: Trained model logged w/ MLflow (from tracking runs)
  • Required: Model name → registry registration
  • Optional: Approval workflow (email, Slack, Jira)
  • Optional: CI/CD pipeline → auto promotion
  • Optional: Validation metric thresholds

Do

Step 1: Configure Backend

Set up MLflow Model Registry w/ DB backend (file-based not rec for prod).

# Start MLflow server with Model Registry support
mlflow server \
  --backend-store-uri postgresql://user:pass@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/models \
  --host 0.0.0.0 \
  --port 5000

Python config:

# model_registry_config.py
import mlflow
from mlflow.tracking import MlflowClient

# Set tracking URI (must support Model Registry)
MLFLOW_TRACKING_URI = "http://mlflow-server.company.com:5000"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

# ... (see EXAMPLES.md for complete implementation)

→ Model Registry UI tab in MLflow, search_registered_models() returns success (even empty), DB has registered_models table.

If err: verify MLflow ≥ 1.4 (Model Registry from 1.4), check DB backend (SQLite OK for local dev, not rec for concurrent prod use), --backend-store-uri → DB not file://, DB user has CREATE TABLE perms, server logs for migration errs.
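
The first success check above (search_registered_models() returns, even when empty) can be wrapped in a small probe. This is a minimal sketch, not part of the skill's EXAMPLES.md: `registry_is_reachable` is a hypothetical helper name, and `client` is assumed to be an `MlflowClient` (any object exposing the same method works, which keeps the probe testable without a server).

```python
def registry_is_reachable(client):
    """Return (ok, detail). An empty registry still counts as reachable.

    `client` is assumed to be an mlflow.tracking.MlflowClient.
    """
    try:
        # Ask for a single result: we only care that the call succeeds.
        models = client.search_registered_models(max_results=1)
    except Exception as exc:  # connection refused, missing tables, bad backend
        return False, f"{type(exc).__name__}: {exc}"
    return True, f"registry OK ({len(list(models))} model(s) in first page)"


# Usage (requires a running tracking server):
# from mlflow.tracking import MlflowClient
# ok, detail = registry_is_reachable(MlflowClient())
```

A False result carries the exception type, which usually points straight at one of the causes listed in the "If err" line.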

Step 2: Register from Run

Register logged model → Model Registry w/ comprehensive metadata.

# register_model.py
import mlflow
from mlflow.tracking import MlflowClient
from model_registry_config import MLFLOW_TRACKING_URI

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
client = MlflowClient()

# ... (see EXAMPLES.md for complete implementation)

→ New ver in Registry UI, ver has desc + tags, artifacts accessible via models:/<model-name>/<version>, signature + input ex preserved.

If err: verify run_id exists + completed (client.get_run(run_id)), check artifact path matches logged (mlflow.search_runs()), model logged w/ proper framework flavor (mlflow.sklearn.log_model not mlflow.log_artifact), no special chars in name (hyphens not underscores), check artifact storage access.
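
The registration flow described above (check the run, build the runs:/ URI, register, attach description and tags) might look roughly like this. It is a sketch under assumptions: `register_from_run` is a hypothetical helper, `client` is assumed to be an `MlflowClient`, and `register_fn` is assumed to be `mlflow.register_model`, passed in so the function can be exercised without a server.

```python
def register_from_run(client, register_fn, run_id, artifact_path, model_name,
                      description="", tags=None):
    """Register the model logged under `artifact_path` of a finished run.

    client      -- assumed to be an mlflow.tracking.MlflowClient
    register_fn -- assumed to be mlflow.register_model
    """
    run = client.get_run(run_id)
    if run.info.status != "FINISHED":
        raise ValueError(f"run {run_id} is {run.info.status}, expected FINISHED")

    # Registering from a runs:/ URI keeps the link back to the training run.
    model_uri = f"runs:/{run_id}/{artifact_path}"
    version = register_fn(model_uri, model_name)

    if description:
        client.update_model_version(
            name=model_name, version=version.version, description=description
        )
    for key, value in (tags or {}).items():
        # Tag values are stored as strings, so convert explicitly.
        client.set_model_version_tag(model_name, version.version, key, str(value))
    return version


# Usage (requires a tracking server and a logged model):
# import mlflow
# from mlflow.tracking import MlflowClient
# register_from_run(MlflowClient(), mlflow.register_model,
#                   run_id="<run-id>", artifact_path="model",
#                   model_name="churn-model", tags={"owner": "ml-team"})
```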

Step 3: Stage Transitions w/ Validation

Move vers through stages (None → Staging → Production → Archived) w/ validation.

# stage_management.py
import mlflow
from mlflow.tracking import MlflowClient
from datetime import datetime

client = MlflowClient()

class ModelStageManager:
    ...  # see EXAMPLES.md for complete implementation

→ Ver stage updates in registry, old vers archived auto, transition timestamps in tags, rollback restores prev prod ver.

If err: check ver exists + in expected stage, verify archive_existing_versions flag (may not archive if only one ver), DB supports concurrent transactions for stage updates, check stage transition locks (one per ver at a time), verify approval workflow.
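
A gated promotion plus rollback, as described in this step, could be sketched as below. The helper names, the single-metric threshold check, and the tag key format are assumptions for illustration, not the skill's actual ModelStageManager. Note that `transition_model_version_stage` and `get_latest_versions` are deprecated in recent MLflow releases (2.9+) in favor of aliases, so this applies to stage-based setups only.

```python
from datetime import datetime, timezone


def promote_if_valid(client, model_name, version, metric, threshold,
                     stage="Production"):
    """Promote `version` to `stage` only if its run metric meets `threshold`.

    `client` is assumed to be an mlflow.tracking.MlflowClient.
    """
    mv = client.get_model_version(model_name, version)
    metrics = client.get_run(mv.run_id).data.metrics
    value = metrics.get(metric)
    if value is None or value < threshold:
        return False  # validation failed: leave the stage untouched

    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage=stage,
        archive_existing_versions=True,  # archive whatever held the stage before
    )
    # Record when the transition happened (hypothetical tag naming).
    stamp = datetime.now(timezone.utc).isoformat()
    client.set_model_version_tag(
        model_name, version, f"promoted_to_{stage.lower()}_at", stamp
    )
    return True


def rollback(client, model_name):
    """Move the latest (highest-numbered) Archived version back to Production."""
    archived = client.get_latest_versions(model_name, stages=["Archived"])
    if not archived:
        raise LookupError(f"no archived version of {model_name} to roll back to")
    previous = archived[0]
    client.transition_model_version_stage(
        name=model_name,
        version=previous.version,
        stage="Production",
        archive_existing_versions=True,
    )
    return previous.version
```

The rollback picks the highest-numbered Archived version, which is the previous Production version only if nothing else was archived in between; a stricter variant would read the promotion timestamp tags instead.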

Step 4: Aliasing + Refs

Use model aliases for stable deployment refs (MLflow ≥ 2.3).

# model_aliases.py
from mlflow.tracking import MlflowClient

client = MlflowClient()

def set_model_alias(model_name, version, alias):
    """Set an alias for a model version (MLflow 2.3+)."""
# ... (see EXAMPLES.md for complete implementation)

→ Aliases in Registry UI, loading by alias works (models:/name@alias), updating alias immediately affects new loads, A/B test infra functional.

If err: upgrade MLflow ≥ 2.3 for native alias support, use tag-based fallback older vers, verify alias naming (alphanumeric + hyphens), check alias conflicts (alias → one ver at a time; reassigning moves it).
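
Moving an alias and reading it back can be sketched with the two client calls below. `point_alias` and `resolve_alias` are hypothetical helper names, and "champion" in the usage note is just an example alias; `client` is assumed to be an `MlflowClient` from a release with native alias support.

```python
def point_alias(client, model_name, alias, version):
    """Move `alias` to `version` and return the URI that loaders should use.

    `client` is assumed to be an mlflow.tracking.MlflowClient.
    """
    # Reassigning an existing alias simply moves it to the new version.
    client.set_registered_model_alias(model_name, alias, str(version))
    return f"models:/{model_name}@{alias}"


def resolve_alias(client, model_name, alias):
    """Return the version the alias currently points to."""
    return client.get_model_version_by_alias(model_name, alias).version


# Usage (requires a tracking server):
# import mlflow
# from mlflow.tracking import MlflowClient
# uri = point_alias(MlflowClient(), "churn-model", "champion", 7)
# model = mlflow.pyfunc.load_model(uri)
```

Because serving code loads by alias URI, promoting a new version becomes a single alias move with no change on the consumer side.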

Step 5: Lineage Tracking

Track full lineage data → deploy w/ comprehensive metadata.

# model_lineage.py
import mlflow
from mlflow.tracking import MlflowClient
import json

client = MlflowClient()

def enrich_model_metadata(model_name, version, lineage_data):
    ...  # see EXAMPLES.md for complete implementation

→ Ver tags w/ comprehensive lineage, get_model_lineage() returns full history, JSON report has data source, training, deploy info.

If err: verify tag values are strings (convert dicts → JSON), check tag key naming (no spaces/special), lineage captured during train, run_id valid + accessible.
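
The "tag values are strings" rule is the usual stumbling block here, so a sketch of writing and reading lineage tags may help. Everything named below (`to_tag_value`, `write_lineage`, `read_lineage`, the `lineage_` key prefix) is hypothetical and not taken from the skill's own implementation; `client` is assumed to be an `MlflowClient`.

```python
import json

PREFIX = "lineage_"  # hypothetical naming convention for lineage tags


def to_tag_value(value):
    """Tag values must be strings; JSON-encode anything structured."""
    if isinstance(value, str):
        return value
    return json.dumps(value, sort_keys=True, default=str)


def write_lineage(client, model_name, version, lineage):
    """Store each lineage entry as one prefixed tag on the model version."""
    for key, value in lineage.items():
        client.set_model_version_tag(
            model_name, version, PREFIX + key, to_tag_value(value)
        )


def read_lineage(client, model_name, version):
    """Collect prefixed tags back into a dict, decoding JSON where possible."""
    tags = client.get_model_version(model_name, version).tags
    lineage = {}
    for key, raw in tags.items():
        if not key.startswith(PREFIX):
            continue
        try:
            lineage[key[len(PREFIX):]] = json.loads(raw)
        except json.JSONDecodeError:
            lineage[key[len(PREFIX):]] = raw  # plain string, keep as stored
    return lineage
```

One caveat of this round trip: a plain string that happens to be valid JSON (for example "123") comes back decoded, so identifiers that must stay strings are better stored with a non-numeric prefix.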

Step 6: Automate w/ CI/CD

Integrate registration → CI/CD → auto promotion.

# .github/workflows/model_promotion.yml
name: Model Promotion Pipeline

on:
  workflow_dispatch:
    inputs:
      model_name:
        description: 'Model name to promote'
# ... (see EXAMPLES.md for complete implementation)

Python automation:

# scripts/promote_model.py
import argparse
from stage_management import ModelStageManager

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--version", type=int, required=True)
# ... (see EXAMPLES.md for complete implementation)

→ Actions workflow triggers on manual dispatch, validation passes, model promoted to target stage, Slack notif sent, deploy pipeline triggered auto.

If err: check GH secrets for MLFLOW_TRACKING_URI, verify net access GH Actions → MLflow (may need VPN/IP allowlist), validation script has correct thresholds, Slack webhook config, Python script exec perms.
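
For the Slack notification part of the pipeline, a standard-library sketch is shown below. It assumes a Slack incoming webhook that accepts a JSON body with a "text" field; the helper names and the message wording are hypothetical, and the webhook URL would come from a CI secret rather than being hard-coded.

```python
import json
import urllib.request


def build_promotion_message(model_name, version, stage, passed):
    """Build the JSON payload for a Slack incoming webhook."""
    status = "promoted to" if passed else "failed validation for"
    return {"text": f"Model {model_name} v{version} {status} {stage}"}


def notify_slack(webhook_url, payload, timeout=10):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Usage inside a promotion script (SLACK_WEBHOOK_URL is an assumed secret name):
# import os
# payload = build_promotion_message("churn-model", 7, "Production", passed=True)
# notify_slack(os.environ["SLACK_WEBHOOK_URL"], payload)
```

Keeping the payload builder separate from the network call lets the message format be unit-tested in CI without reaching Slack.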

Check

  • Model Registry accessible + backend configured
  • Models register from training runs
  • Stage transitions work (None → Staging → Production → Archived)
  • Validation enforces quality thresholds
  • Aliases set + resolved
  • Lineage captured comprehensively
  • Rollback restores prev vers
  • CI/CD automates promotions
  • Team notifs work for stage changes
  • Model URIs resolve all stages
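
For the last check, the three URI forms the registry accepts (by version, by stage, by alias) can be generated from one helper and then passed to a loader. `model_uri` is a hypothetical function name; the URI shapes themselves follow the models:/ forms used earlier in this document.

```python
def model_uri(name, version=None, stage=None, alias=None):
    """Build a models:/ URI from exactly one of version, stage, or alias."""
    chosen = [x for x in (version, stage, alias) if x is not None]
    if len(chosen) != 1:
        raise ValueError("pass exactly one of version, stage, alias")
    if alias is not None:
        return f"models:/{name}@{alias}"
    # Version and stage share the same path-style form.
    return f"models:/{name}/{version if version is not None else stage}"


# Usage: try loading each form to confirm it resolves.
# import mlflow
# for uri in (model_uri("churn-model", version=7),
#             model_uri("churn-model", stage="Production"),
#             model_uri("churn-model", alias="champion")):
#     mlflow.pyfunc.load_model(uri)
```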

Traps

  • SQLite limits: Registry needs DB backend (Postgres/MySQL) for prod → file-based = concurrency issues
  • Stage conflicts: Multi vers same stage = confusion → use archive_existing_versions=True auto-archive
  • Missing run linkage: Register w/o run_id loses lineage → always from runs, not raw files
  • Alias confusion: Using stages as deploy targets vs aliases → stages = workflow, aliases = deploy refs
  • Validation skipped: Promote to Prod w/o checks → mandatory validation in CI/CD
  • No rollback plan: Prod issues w/o rollback → maintain prev Prod ver in Archived stage
  • Tag overload: Too many unstructured → standardize schema + naming
  • Manual processes: Human-driven = error-prone + slow → automate w/ CI/CD + approvals
  • Lost artifacts: Model registered but artifacts deleted → align retention w/ lifecycle
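
The "tag overload" trap is easier to avoid when the tag schema is enforced in code before registration. The sketch below is illustrative only: the required keys and the snake_case key pattern are a hypothetical team convention, not something MLflow or this skill mandates.

```python
import re

# Hypothetical team convention: snake_case keys and a small required set.
KEY_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")
REQUIRED_KEYS = {"owner", "training_data_version", "git_commit"}


def validate_tags(tags):
    """Return a list of problems; an empty list means the tags conform."""
    problems = []
    for missing in sorted(REQUIRED_KEYS - set(tags)):
        problems.append(f"missing required tag: {missing}")
    for key, value in tags.items():
        if not KEY_PATTERN.match(key):
            problems.append(f"bad tag key: {key!r}")
        if not isinstance(value, str):
            problems.append(
                f"tag {key!r} must be a string, got {type(value).__name__}"
            )
    return problems
```

Running this as a CI step before promotion turns schema drift into a failed check instead of a registry cleanup task.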

Related

  • track-ml-experiments: log models to MLflow before registering
  • deploy-ml-model-serving: deploy registered models → serving infra
  • run-ab-test-models: A/B test using registry aliases
  • orchestrate-ml-pipeline: automate train + register
  • version-ml-data: version training data for lineage

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman-ultra/skills/register-ml-model
