MCP HubMCP Hub
스킬 목록으로 돌아가기

register-ml-model

pjt222
업데이트됨 Yesterday
1 조회
17
2
17
GitHub에서 보기
기타aiautomationdata

정보

이 스킬은 학습된 모델을 MLflow 모델 레지스트리에 등록하여 버전 관리와 승인 워크플로를 통한 관리형 단계 전환(예: 스테이징에서 프로덕션으로)을 제공합니다. 실험 환경에서 프로덕션으로 모델을 승격시키고, 여러 단계에 걸친 다양한 버전을 관리하며, 롤백이나 감사 준수를 처리하는 데 사용됩니다. 개발자는 MLOps 파이프라인 내에서 체계적인 배포 거버넌스와 모델 계보 추적을 위해 이 스킬을 활용해야 합니다.

빠른 설치

Claude Code

추천
기본
npx skills add pjt222/agent-almanac -a claude-code
플러그인 명령대체
/plugin add https://github.com/pjt222/agent-almanac
Git 클론대체
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/register-ml-model

Claude Code에서 이 명령을 복사하여 붙여넣어 스킬을 설치하세요

문서

Register ML Model

See Extended Examples for complete configuration files and templates.

Impl MLflow Model Registry → systematic model versioning, stage mgmt, deployment governance.

Use When

  • Promote trained model exp → prod
  • Manage multi vers across dev stages
  • Impl approval workflows → governance
  • Track lineage train → deploy
  • Rollback to prev vers
  • Compare deployed vers → A/B test
  • Audit changes → compliance

In

  • Required: MLflow tracking server w/ Model Registry enabled
  • Required: Trained model logged w/ MLflow (from tracking runs)
  • Required: Model name → registry registration
  • Optional: Approval workflow (email, Slack, Jira)
  • Optional: CI/CD pipeline → auto promotion
  • Optional: Validation metric thresholds

Do

Step 1: Configure Backend

Set up MLflow Model Registry w/ DB backend (file-based not rec for prod).

# Start MLflow server with Model Registry support
mlflow server \
  --backend-store-uri postgresql://user:pass@localhost:5432/mlflow \
  --default-artifact-root s3://mlflow-artifacts/models \
  --host 0.0.0.0 \
  --port 5000

Python config:

# model_registry_config.py
import mlflow
from mlflow.tracking import MlflowClient

# Set tracking URI (must support Model Registry)
MLFLOW_TRACKING_URI = "http://mlflow-server.company.com:5000"
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

# ... (see EXAMPLES.md for complete implementation)

→ Model Registry UI tab in MLflow, search_registered_models() returns success (even empty), DB has registered_models table.

If err: verify MLflow ≥ 1.2 (Model Registry from 1.2), check DB backend (SQLite not fully supported), --backend-store-uri → DB not file://, DB user has CREATE TABLE perms, server logs for migration errs.

Step 2: Register from Run

Register logged model → Model Registry w/ comprehensive metadata.

# register_model.py
import mlflow
from mlflow.tracking import MlflowClient
from model_registry_config import MLFLOW_TRACKING_URI

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
client = MlflowClient()

# ... (see EXAMPLES.md for complete implementation)

→ New ver in Registry UI, ver has desc + tags, artifacts accessible via models:/<model-name>/<version>, signature + input ex preserved.

If err: verify run_id exists + completed (client.get_run(run_id)), check artifact path matches logged (mlflow.search_runs()), model logged w/ proper framework flavor (mlflow.sklearn.log_model not mlflow.log_artifact), no special chars in name (hyphens not underscores), check artifact storage access.

Step 3: Stage Transitions w/ Validation

Move vers through stages (None → Staging → Production → Archived) w/ validation.

# stage_management.py
import mlflow
from mlflow.tracking import MlflowClient
from datetime import datetime

client = MlflowClient()

class ModelStageManager:
# ... (see EXAMPLES.md for complete implementation)

→ Ver stage updates in registry, old vers archived auto, transition timestamps in tags, rollback restores prev prod ver.

If err: check ver exists + in expected stage, verify archive_existing_versions flag (may not archive if only one ver), DB supports concurrent transactions for stage updates, check stage transition locks (one per ver at a time), verify approval workflow.

Step 4: Aliasing + Refs

Use model aliases for stable deployment refs (MLflow ≥ 2.0).

# model_aliases.py
from mlflow.tracking import MlflowClient

client = MlflowClient()

def set_model_alias(model_name, version, alias):
    """
    Set an alias for a model version (MLflow 2.0+).
# ... (see EXAMPLES.md for complete implementation)

→ Aliases in Registry UI, loading by alias works (models:/name@alias), updating alias immediately affects new loads, A/B test infra functional.

If err: upgrade MLflow ≥ 2.0 for native alias support, use tag-based fallback older vers, verify alias naming (alphanumeric + hyphens), check alias conflicts (one per ver).

Step 5: Lineage Tracking

Track full lineage data → deploy w/ comprehensive metadata.

# model_lineage.py
import mlflow
from mlflow.tracking import MlflowClient
import json

client = MlflowClient()

def enrich_model_metadata(model_name, version, lineage_data):
# ... (see EXAMPLES.md for complete implementation)

→ Ver tags w/ comprehensive lineage, get_model_lineage() returns full history, JSON report has data source, training, deploy info.

If err: verify tag values are strings (convert dicts → JSON), check tag key naming (no spaces/special), lineage captured during train, run_id valid + accessible.

Step 6: Automate w/ CI/CD

Integrate registration → CI/CD → auto promotion.

# .github/workflows/model_promotion.yml
name: Model Promotion Pipeline

on:
  workflow_dispatch:
    inputs:
      model_name:
        description: 'Model name to promote'
# ... (see EXAMPLES.md for complete implementation)

Python automation:

# scripts/promote_model.py
import argparse
from stage_management import ModelStageManager

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-name", required=True)
    parser.add_argument("--version", type=int, required=True)
# ... (see EXAMPLES.md for complete implementation)

→ Actions workflow triggers on manual dispatch, validation passes, model promoted to target stage, Slack notif sent, deploy pipeline triggered auto.

If err: check GH secrets for MLFLOW_TRACKING_URI, verify net access GH Actions → MLflow (may need VPN/IP allowlist), validation script has correct thresholds, Slack webhook config, Python script exec perms.

Check

  • Model Registry accessible + backend configured
  • Models register from training runs
  • Stage transitions work (None → Staging → Production → Archived)
  • Validation enforces quality thresholds
  • Aliases set + resolved
  • Lineage captured comprehensively
  • Rollback restores prev vers
  • CI/CD automates promotions
  • Team notifs work for stage changes
  • Model URIs resolve all stages

Traps

  • SQLite limits: Registry needs DB backend (Postgres/MySQL) for prod → file-based = concurrency issues
  • Stage conflicts: Multi vers same stage = confusion → use archive_existing_versions=True auto-archive
  • Missing run linkage: Register w/o run_id loses lineage → always from runs, not raw files
  • Alias confusion: Using stages as deploy targets vs aliases → stages = workflow, aliases = deploy refs
  • Validation skipped: Promote to Prod w/o checks → mandatory validation in CI/CD
  • No rollback plan: Prod issues w/o rollback → maintain prev Prod ver in Archived stage
  • Tag overload: Too many unstructured → standardize schema + naming
  • Manual processes: Human-driven = error-prone + slow → automate w/ CI/CD + approvals
  • Lost artifacts: Model registered but artifacts deleted → align retention w/ lifecycle

  • track-ml-experiments — log models to MLflow before register
  • deploy-ml-model-serving — deploy registered models → serving infra
  • run-ab-test-models — A/B test using registry aliases
  • orchestrate-ml-pipeline — automate train + register
  • version-ml-data — version training data for lineage

GitHub 저장소

pjt222/agent-almanac
경로: i18n/caveman-ultra/skills/register-ml-model
0
agentsagentskillsai-assisted-developmentclaude-codeskillsteams

연관 스킬

llamaguard

기타

LlamaGuard는 폭력 및 혐오 발언 등 6가지 안전 범주에서 LLM 입력과 출력을 조정하기 위한 Meta의 70-80억 파라미터 모델입니다. 94-95% 정확도를 제공하며 vLLM, Hugging Face 또는 Amazon SageMaker를 사용해 배포할 수 있습니다. 이 기술을 사용하여 AI 애플리케이션에 콘텐츠 필터링 및 안전 가드레일을 손쉽게 통합하세요.

스킬 보기

cost-optimization

기타

이 Claude Skill은 리소스 적정화, 태깅 전략, 지출 분석을 통해 개발자들이 클라우드 비용을 최적화할 수 있도록 지원합니다. AWS, Azure, GCP에서 클라우드 비용을 절감하고 비용 거버넌스를 구현하기 위한 프레임워크를 제공합니다. 인프라 비용을 분석하거나, 리소스를 적정화하거나, 예산 제약을 충족해야 할 때 사용하세요.

스킬 보기

quantizing-models-bitsandbytes

기타

이 스킬은 bitsandbytes를 사용하여 LLM을 8비트 또는 4비트 정밀도로 양자화하며, 최소한의 정확도 손실로 50-75%의 메모리 감소를 달성합니다. 제한된 GPU 메모리에서 더 큰 모델을 실행하거나 추론을 가속화하는 데 이상적이며, INT8, NF4, FP4와 같은 형식을 지원합니다. 이 스킬은 HuggingFace Transformers와 통합되어 QLoRA 학습 및 8비트 옵티마이저를 가능하게 합니다.

스킬 보기

dispatching-parallel-agents

기타

이 Claude Skill은 3개 이상의 독립적인 문제를 동시에 조사하고 해결하기 위해 다중 에이전트를 배치합니다. 공유 상태나 의존성 없이 해결 가능한 무관련 장애 시나리오에 맞게 설계되었습니다. 핵심 기능은 병렬 문제 해결로, 각 독립 문제 영역마다 하나의 에이전트를 할당하여 효율성을 극대화합니다.

스킬 보기