register-ml-model
About
This skill registers trained models in the MLflow Model Registry, providing version control and managed stage transitions (such as Staging to Production) with approval workflows. Use it to promote models from experimentation to production, manage multiple versions across stages, and handle rollbacks or audit compliance. Developers should use it for systematic deployment governance and model lineage tracking within MLOps pipelines.
Quick install
Claude Code
Recommended
npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/register-ml-model
Copy and paste this command into Claude Code to install this skill.
Documentation
Register ML Model
See Extended Examples for complete configuration files and templates.
Impl MLflow Model Registry → systematic model versioning, stage mgmt, deployment governance.
Use When
- Promote trained model exp → prod
- Manage multi vers across dev stages
- Impl approval workflows → governance
- Track lineage train → deploy
- Rollback to prev vers
- Compare deployed vers → A/B test
- Audit changes → compliance
In
- Required: MLflow tracking server w/ Model Registry enabled
- Required: Trained model logged w/ MLflow (from tracking runs)
- Required: Model name → registry registration
- Optional: Approval workflow (email, Slack, Jira)
- Optional: CI/CD pipeline → auto promotion
- Optional: Validation metric thresholds
Do
Step 1: Configure Backend
Set up MLflow Model Registry w/ DB backend (file-based not rec for prod).
# Start MLflow server with Model Registry support
mlflow server \
  --backend-store-uri postgresql://user:pass@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/models \
  --host 0.0.0.0 \
  --port 5000
Python config:
# model_registry_config.py
import mlflow
from mlflow.tracking import MlflowClient
# Set tracking URI (must support Model Registry)
MLFLOW_TRACKING_URI = "http://mlflow-server.company.com:5000"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
# ... (see EXAMPLES.md for complete implementation)
→ Model Registry UI tab in MLflow, search_registered_models() returns success (even empty), DB has registered_models table.
If err: verify MLflow ≥ 1.4 (Model Registry introduced in 1.4), check DB backend (SQLite works but not rec for prod), --backend-store-uri → DB not file://, DB user has CREATE TABLE perms, server logs for migration errs.
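The backend checks in this step can be sketched as a small pre-flight helper. This is a minimal sketch, not MLflow's own validation: `classify_backend_store_uri` and `registry_is_reachable` are hypothetical names, and the set of schemes treated as production-grade is an assumption. The `client` argument is expected to be an `MlflowClient` (or anything exposing `search_registered_models`).

```python
from urllib.parse import urlparse

# Assumption: dialects treated as production-grade database backends.
DB_DIALECTS = {"postgresql", "mysql", "mssql"}


def classify_backend_store_uri(uri: str) -> str:
    """Return 'database', 'sqlite', or 'file' for a --backend-store-uri value."""
    scheme = urlparse(uri).scheme.lower()
    # SQLAlchemy-style URIs may carry a driver suffix, e.g. postgresql+psycopg2.
    dialect = scheme.split("+", 1)[0]
    if dialect in DB_DIALECTS:
        return "database"
    if dialect == "sqlite":
        return "sqlite"
    # Bare paths (no scheme) and file:// URIs both mean a file store.
    return "file"


def registry_is_reachable(client) -> bool:
    """Probe the registry; an empty result still counts as success."""
    try:
        client.search_registered_models(max_results=1)
        return True
    except Exception:
        return False
```

With a real server, `registry_is_reachable(MlflowClient())` could run once at startup, after `mlflow.set_tracking_uri(...)`, so misconfiguration fails early rather than at registration time.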
Step 2: Register from Run
Register logged model → Model Registry w/ comprehensive metadata.
# register_model.py
import mlflow
from mlflow.tracking import MlflowClient
from model_registry_config import MLFLOW_TRACKING_URI
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
client = MlflowClient()
# ... (see EXAMPLES.md for complete implementation)
→ New ver in Registry UI, ver has desc + tags, artifacts accessible via models:/<model-name>/<version>, signature + input ex preserved.
If err: verify run_id exists + completed (client.get_run(run_id)), check artifact path matches logged (mlflow.search_runs()), model logged w/ proper framework flavor (mlflow.sklearn.log_model not mlflow.log_artifact), no special chars in name (hyphens not underscores), check artifact storage access.
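Registration from a run can be sketched as below. This is a minimal sketch under stated assumptions, not the implementation from EXAMPLES.md: `register_from_run` and its `register_fn` parameter are hypothetical, added so the helper can be exercised without a server. By default it calls `mlflow.register_model` with a `runs:/<run_id>/<artifact_path>` URI, then attaches the description and tags through the client.

```python
def register_from_run(client, run_id, artifact_path, model_name,
                      description="", tags=None, register_fn=None):
    """Register the model logged under a run and attach metadata.

    Returns the new version number as reported by the registry.
    """
    if register_fn is None:
        # Imported lazily so the helper can be tested without MLflow installed.
        import mlflow
        register_fn = mlflow.register_model

    # Registering from a runs:/ URI keeps the run linkage (lineage).
    model_uri = f"runs:/{run_id}/{artifact_path}"
    model_version = register_fn(model_uri, model_name)
    version = model_version.version

    if description:
        client.update_model_version(
            name=model_name, version=version, description=description
        )
    # Tag values must be strings, so everything is converted explicitly.
    for key, value in (tags or {}).items():
        client.set_model_version_tag(
            name=model_name, version=version, key=key, value=str(value)
        )
    return version
```

A call might look like `register_from_run(MlflowClient(), run_id, "model", "churn-model", description="baseline", tags={"auc": 0.91})`, where the run ID, artifact path, and model name are placeholders.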
Step 3: Stage Transitions w/ Validation
Move vers through stages (None → Staging → Production → Archived) w/ validation.
# stage_management.py
import mlflow
from mlflow.tracking import MlflowClient
from datetime import datetime
client = MlflowClient()
class ModelStageManager:
    # ... (see EXAMPLES.md for complete implementation)
→ Ver stage updates in registry, old vers archived auto, transition timestamps in tags, rollback restores prev prod ver.
If err: check ver exists + in expected stage, verify archive_existing_versions flag (may not archive if only one ver), DB supports concurrent transactions for stage updates, check stage transition locks (one per ver at a time), verify approval workflow.
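A guarded transition can be sketched as follows. This is a minimal sketch, not the `ModelStageManager` from EXAMPLES.md: `transition_stage`, the `ALLOWED_TRANSITIONS` table, and the `transitioned_at` tag key are assumptions for illustration. MLflow itself accepts any stage change, so the ordering is enforced here. Note that recent MLflow releases deprecate stages in favor of aliases, so this applies to registries still using the stage workflow.

```python
from datetime import datetime, timezone

# Assumption: allowed moves, following None → Staging → Production → Archived.
# Archived → Staging/Production is kept open so rollbacks remain possible.
ALLOWED_TRANSITIONS = {
    "None": {"Staging", "Archived"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": {"Staging", "Production"},
}


def transition_stage(client, model_name, version, target_stage):
    """Move a version to target_stage if the move is allowed.

    Returns the stage the version was in before the transition.
    """
    current = client.get_model_version(
        name=model_name, version=version
    ).current_stage
    if target_stage not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"{current} -> {target_stage} is not allowed")

    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage=target_stage,
        # Auto-archiving only applies when moving into Staging or Production.
        archive_existing_versions=target_stage in ("Staging", "Production"),
    )
    # Record when the transition happened as a version tag (UTC, ISO 8601).
    client.set_model_version_tag(
        name=model_name,
        version=version,
        key="transitioned_at",
        value=datetime.now(timezone.utc).isoformat(),
    )
    return current
```

Rejecting a direct None → Production move is a policy choice in this sketch; teams that allow hotfix promotions would extend the table instead of bypassing the check.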
Step 4: Aliasing + Refs
Use model aliases for stable deployment refs (MLflow ≥ 2.3).
# model_aliases.py
from mlflow.tracking import MlflowClient
client = MlflowClient()
def set_model_alias(model_name, version, alias):
    """
    Set an alias for a model version (MLflow 2.3+).
    """
    # ... (see EXAMPLES.md for complete implementation)
→ Aliases in Registry UI, loading by alias works (models:/name@alias), updating alias immediately affects new loads, A/B test infra functional.
If err: upgrade MLflow ≥ 2.3 for native alias support, use tag-based fallback on older vers, verify alias naming (alphanumeric + hyphens), check alias conflicts (each alias → one ver at a time; reassigning moves it).
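Alias assignment can be sketched as below, assuming an MLflow version with native alias support. `assign_alias` and `ALIAS_PATTERN` are hypothetical names; the pattern encodes the alphanumeric-plus-hyphens convention from this step and may be stricter than what MLflow itself accepts.

```python
import re

# Assumption: the naming convention from this step (alphanumeric + hyphens).
ALIAS_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*$")


def assign_alias(client, model_name, alias, version):
    """Point an alias at a version and return the URI that resolves it."""
    if not ALIAS_PATTERN.match(alias):
        raise ValueError(f"invalid alias name: {alias!r}")
    # Reassigning an existing alias moves it; it never points at two versions.
    client.set_registered_model_alias(
        name=model_name, alias=alias, version=str(version)
    )
    return f"models:/{model_name}@{alias}"
```

The returned URI can be passed to a loader such as `mlflow.pyfunc.load_model`, so serving code references `champion` (a placeholder alias) rather than a hard-coded version number.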
Step 5: Lineage Tracking
Track full lineage data → deploy w/ comprehensive metadata.
# model_lineage.py
import mlflow
from mlflow.tracking import MlflowClient
import json
client = MlflowClient()
def enrich_model_metadata(model_name, version, lineage_data):
    # ... (see EXAMPLES.md for complete implementation)
→ Ver tags w/ comprehensive lineage, get_model_lineage() returns full history, JSON report has data source, training, deploy info.
If err: verify tag values are strings (convert dicts → JSON), check tag key naming (no spaces/special), lineage captured during train, run_id valid + accessible.
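The string-conversion rule for tags can be sketched as a flattening helper. This is a minimal sketch, not the `enrich_model_metadata` implementation: `lineage_to_tags`, `apply_lineage_tags`, and the `lineage.` key prefix are assumptions for illustration.

```python
import json


def lineage_to_tags(lineage_data, prefix="lineage"):
    """Flatten a lineage dict into string-valued tags.

    Strings pass through unchanged; everything else (dicts, lists,
    numbers, booleans) is serialized to JSON so the value is a string.
    """
    tags = {}
    for key, value in lineage_data.items():
        # Spaces in keys are replaced to keep tag names uniform.
        tag_key = f"{prefix}.{key}".replace(" ", "_")
        if isinstance(value, str):
            tags[tag_key] = value
        else:
            tags[tag_key] = json.dumps(value, sort_keys=True)
    return tags


def apply_lineage_tags(client, model_name, version, lineage_data):
    """Write the flattened lineage onto a model version; return the tags."""
    tags = lineage_to_tags(lineage_data)
    for key, value in tags.items():
        client.set_model_version_tag(
            name=model_name, version=version, key=key, value=value
        )
    return tags
```

Serializing with `sort_keys=True` keeps tag values stable across runs, which makes diffs between versions easier to read in an audit.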
Step 6: Automate w/ CI/CD
Integrate registration → CI/CD → auto promotion.
# .github/workflows/model_promotion.yml
name: Model Promotion Pipeline
on:
  workflow_dispatch:
    inputs:
      model_name:
        description: 'Model name to promote'
        # ... (see EXAMPLES.md for complete implementation)
Python automation:
# scripts/promote_model.py
import argparse
from stage_management import ModelStageManager
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--version", type=int, required=True)
    # ... (see EXAMPLES.md for complete implementation)
→ Actions workflow triggers on manual dispatch, validation passes, model promoted to target stage, Slack notif sent, deploy pipeline triggered auto.
If err: check GH secrets for MLFLOW_TRACKING_URI, verify net access GH Actions → MLflow (may need VPN/IP allowlist), validation script has correct thresholds, Slack webhook config, Python script exec perms.
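The validation gate in the pipeline can be sketched as a script that exits non-zero when any metric misses its threshold. This is a minimal sketch under stated assumptions: `failed_thresholds`, `main`, and the `--metrics-json` and `--thresholds-json` flags are hypothetical and not part of `promote_model.py`. Thresholds are treated as minimums, and a missing metric counts as a failure.

```python
import argparse
import json
import sys


def failed_thresholds(metrics, thresholds):
    """Return one message per metric that is missing or below its minimum."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None or value < minimum:
            failures.append(f"{name}: {value} < {minimum}")
    return failures


def main(argv=None):
    """Return 0 if all thresholds pass, 1 otherwise (for use as an exit code)."""
    parser = argparse.ArgumentParser(description="Validation gate for promotion")
    parser.add_argument("--metrics-json", required=True,
                        help="JSON object mapping metric name to value")
    parser.add_argument("--thresholds-json", required=True,
                        help="JSON object mapping metric name to minimum")
    args = parser.parse_args(argv)

    failures = failed_thresholds(
        json.loads(args.metrics_json), json.loads(args.thresholds_json)
    )
    for failure in failures:
        print(f"FAILED {failure}", file=sys.stderr)
    return 1 if failures else 0
```

In a workflow step the script would end with `sys.exit(main())`, so a failed threshold stops the job before the promotion step runs.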
Check
- Model Registry accessible + backend configured
- Models register from training runs
- Stage transitions work (None → Staging → Production → Archived)
- Validation enforces quality thresholds
- Aliases set + resolved
- Lineage captured comprehensively
- Rollback restores prev vers
- CI/CD automates promotions
- Team notifs work for stage changes
- Model URIs resolve all stages
Traps
- SQLite limits: Registry needs DB backend (Postgres/MySQL) for prod → file-based = concurrency issues
- Stage conflicts: Multi vers same stage = confusion → use archive_existing_versions=True to auto-archive
- Missing run linkage: Register w/o run_id loses lineage → always from runs, not raw files
- Alias confusion: Using stages as deploy targets vs aliases → stages = workflow, aliases = deploy refs
- Validation skipped: Promote to Prod w/o checks → mandatory validation in CI/CD
- No rollback plan: Prod issues w/o rollback → maintain prev Prod ver in Archived stage
- Tag overload: Too many unstructured → standardize schema + naming
- Manual processes: Human-driven = error-prone + slow → automate w/ CI/CD + approvals
- Lost artifacts: Model registered but artifacts deleted → align retention w/ lifecycle
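The rollback plan from the list above can be sketched as follows. This is a minimal sketch, not the rollback logic from EXAMPLES.md: `rollback_production` is a hypothetical name, and it assumes the highest-numbered Archived version is the previous Production version, which holds only if older versions are not re-archived out of order.

```python
def rollback_production(client, model_name):
    """Restore the most recent Archived version to Production.

    Returns the version that was restored.
    """
    versions = client.search_model_versions(f"name='{model_name}'")
    archived = [mv for mv in versions if mv.current_stage == "Archived"]
    if not archived:
        raise RuntimeError(f"no archived version of {model_name} to restore")

    # Versions are reported as strings, so compare them numerically.
    previous = max(archived, key=lambda mv: int(mv.version))
    client.transition_model_version_stage(
        name=model_name,
        version=previous.version,
        stage="Production",
        # The faulty Production version is archived in the same call.
        archive_existing_versions=True,
    )
    return previous.version
```

A stricter variant could record the replaced version in a tag at promotion time and read it back here instead of inferring it from version numbers.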
→
- track-ml-experiments: log models to MLflow before register
- deploy-ml-model-serving: deploy registered models → serving infra
- run-ab-test-models: A/B test using registry aliases
- orchestrate-ml-pipeline: automate train + register
- version-ml-data: version training data for lineage
