
ml-expert

DNYoussef
Updated 2 days ago

About

The ml-expert skill designs, implements, and optimizes production-grade machine learning models and training pipelines. Use it for architecture design, translating research to code, and optimizing inference, but not for pure data analysis. It enforces structured project organization and includes explicit guardrails for resilient ML system development.

Quick Install

Claude Code

Plugin command (recommended)
/plugin add https://github.com/DNYoussef/context-cascade
Git clone (alternative)
git clone https://github.com/DNYoussef/context-cascade.git ~/.claude/skills/ml-expert

Copy and paste this command into Claude Code to install the skill.

Skill Documentation

STANDARD OPERATING PROCEDURE

Purpose

Ship resilient ML systems: architecture design, training pipelines, optimization, and deployment readiness with explicit guardrails.

Triggers

  • Positive: Implementing architectures, training/tuning models, fixing training instabilities, optimizing inference, translating research into code.
  • Negative: Pure data analysis (route to data scientist) or root-cause analysis of training incidents (prefer ml-training-debugger first).

Guardrails

  • Structure-first: maintain SKILL.md, examples/, tests/, resources/, and agents/; backfill missing docs before execution.
  • Constraint hygiene (prompt-architect): collect HARD/SOFT/INFERRED requirements (targets, latency, memory, compliance).
  • Validation discipline (skill-forge): adversarial tests for data leakage, class imbalance, and distribution shift; always run baseline + ablations.
  • Evidence + confidence ceiling: report metrics together with the data splits they were computed on, and close with a line of the form Confidence: X.XX (ceiling: TYPE Y.YY). Ceilings by evidence type: inference/report 0.70; research 0.85; observation/definition 0.95.
  • Safety: never evaluate on train data; never touch test set until final validation; document assumptions and monitoring plan.
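
The confidence-ceiling guardrail above can be sketched as a small formatter. This is a minimal illustration, not part of the skill itself: the function name confidence_statement is hypothetical, reading "inference/report" and "observation/definition" as pairs of evidence types that share a ceiling is an assumption, and so is the choice to cap an over-ceiling value rather than reject it.

```python
# Ceilings come from the guardrail text. Treating each slash-separated pair as
# two evidence types with the same ceiling is an assumption.
CEILINGS = {
    "inference": 0.70,
    "report": 0.70,
    "research": 0.85,
    "observation": 0.95,
    "definition": 0.95,
}


def confidence_statement(confidence: float, evidence_type: str) -> str:
    """Format a confidence line, capping the value at the ceiling for its evidence type."""
    if evidence_type not in CEILINGS:
        raise ValueError(f"unknown evidence type: {evidence_type!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    ceiling = CEILINGS[evidence_type]
    # Assumption: a value above the ceiling is capped, not rejected.
    capped = min(confidence, ceiling)
    return f"Confidence: {capped:.2f} (ceiling: {evidence_type} {ceiling:.2f})"


print(confidence_statement(0.74, "inference"))  # Confidence: 0.70 (ceiling: inference 0.70)
print(confidence_statement(0.60, "research"))   # Confidence: 0.60 (ceiling: research 0.85)
```

Raising an error instead of capping would be an equally reasonable reading of the guardrail; the source does not say which behavior is intended.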

Execution Phases

  1. Intake & Goals
    • Identify objective, metrics (accuracy/F1/RMSE/latency), constraints (hardware, model size, privacy).
    • Confirm data availability, provenance, and allowed tooling.
  2. Design
    • Choose architecture and loss/optimization strategy; plan data splits and augmentation; define monitoring signals.
    • Draft experiment plan with baseline + targeted variants.
  3. Implementation
    • Build reproducible pipelines (seed control, config versioning); implement training loop with logging (TensorBoard/MLflow/W&B).
    • Enforce safe defaults: mixed precision gated by tests, gradient clipping where appropriate, checkpointing with retention policy.
  4. Validation
    • Run baseline then ablations; check class-wise metrics, calibration, and drift sensitivity.
    • Profile training/inference latency; quantify memory footprint.
    • Security checks: adversarial probes, prompt/feature injection handling for LLM/vision models.
  5. Deployment Readiness
    • Package artifacts (model weights, config, preprocessing, schema); provide rollout + rollback steps.
    • Attach monitoring plan (drift, performance, cost) and ownership.
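
The "seed control, config versioning" and "plan data splits" items in the phases above might look like the following in plain Python. This is a sketch under stated assumptions: the helper names config_fingerprint and split_ids, the config keys, and the 80/10/10 split are all hypothetical and not taken from the skill.

```python
import hashlib
import json
import random


def config_fingerprint(config: dict) -> str:
    """Return a short, stable hash of a config so a run can be tied to its settings."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def split_ids(ids, seed, val_frac=0.1, test_frac=0.1):
    """Shuffle ids with a private seeded RNG and cut them into train/val/test lists."""
    ordered = sorted(ids)        # sort first so input order cannot change the split
    rng = random.Random(seed)    # local RNG: global random state is left untouched
    rng.shuffle(ordered)
    n_test = int(len(ordered) * test_frac)
    n_val = int(len(ordered) * val_frac)
    test = ordered[:n_test]
    val = ordered[n_test:n_test + n_val]
    train = ordered[n_test + n_val:]
    return train, val, test


# Hypothetical config; the keys and values are illustrative only.
config = {"seed": 13, "lr": 3e-4, "val_frac": 0.1, "test_frac": 0.1}
train, val, test = split_ids(range(100), seed=config["seed"])
print(config_fingerprint(config), len(train), len(val), len(test))
```

Logging the fingerprint next to each checkpoint makes it possible to tell later which settings produced which weights. In a real pipeline the framework's own seeding would also need to be set; that step is omitted here.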

Output Format

  • Request summary and constraints (HARD/SOFT/INFERRED).
  • Architecture + data plan, experiment matrix, and validation results.
  • Deployment checklist with monitoring hooks and rollback path.
  • Confidence statement with ceiling and evidence source.
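
One way to produce the request summary with HARD/SOFT/INFERRED constraints is a small typed record rendered in a fixed order. Everything here is a hypothetical sketch: the Kind, Constraint, and render_summary names are invented for illustration, the one-line glosses of the three categories are an interpretation rather than the skill's definitions, and the example constraints are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    # The glosses below are an interpretation, not definitions from the skill.
    HARD = "HARD"          # must hold
    SOFT = "SOFT"          # preferred, may be traded off
    INFERRED = "INFERRED"  # assumed from context, should be confirmed


@dataclass(frozen=True)
class Constraint:
    kind: Kind
    name: str
    target: str


def render_summary(request: str, constraints) -> str:
    """Render the request followed by constraints grouped as HARD, SOFT, INFERRED."""
    lines = [f"Request: {request}"]
    for kind in Kind:  # Enum iteration follows definition order
        for c in constraints:
            if c.kind is kind:
                lines.append(f"  [{kind.value}] {c.name}: {c.target}")
    return "\n".join(lines)


# Illustrative values only.
constraints = [
    Constraint(Kind.SOFT, "model size", "under 50 MB"),
    Constraint(Kind.HARD, "p95 latency", "under 100 ms"),
    Constraint(Kind.INFERRED, "hardware", "single GPU"),
]
print(render_summary("image classifier for defect detection", constraints))
```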

Validation Checklist

  • Data splits clean (no leakage) and documented.
  • Baseline + ablations executed; metrics reported with variance.
  • Latency/memory within targets; profiling attached.
  • Safety checks run (bias, drift, adversarial probes) or noted N/A.
  • Reproducibility ensured (seeds/configs/versioning).
  • Confidence ceiling stated.
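
Two of the checklist items above, clean splits and metrics reported with variance, lend themselves to quick automated checks. The sketch below assumes examples carry unique ids and that the same metric was recorded across several seeded runs; find_leakage and summarize_runs are hypothetical helper names, and the scores are illustrative, not real results.

```python
import statistics


def find_leakage(splits: dict) -> dict:
    """Return ids shared between any two named splits; an empty dict means clean."""
    names = sorted(splits)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = set(splits[a]) & set(splits[b])
            if shared:
                overlaps[(a, b)] = sorted(shared)
    return overlaps


def summarize_runs(scores):
    """Mean and sample standard deviation of one metric across repeated runs."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


splits = {"train": [1, 2, 3, 4], "val": [5, 6], "test": [6, 7]}
print(find_leakage(splits))  # {('test', 'val'): [6]}

mean, std = summarize_runs([0.80, 0.82, 0.84])  # illustrative numbers only
print(f"F1 = {mean:.3f} +/- {std:.3f}")
```

An id-overlap check only catches exact duplicates across splits. Near-duplicates, temporal leakage, and target leakage through features need separate checks.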

VCL COMPLIANCE APPENDIX (Internal)

[[HON:teineigo]] [[MOR:root:M-L]] [[COM:Model+Schmiede]] [[CLS:ge_skill]] [[EVD:-DI<gozlem>]] [[ASP:nesov.]] [[SPC:path:/skills/specialists/ml-expert]]

[[HON:teineigo]] [[MOR:root:E-P-S]] [[COM:Epistemik+Tavan]] [[CLS:ge_rule]] [[EVD:-DI<gozlem>]] [[ASP:nesov.]] [[SPC:coord:EVD-CONF]]

[[HON:teineigo]] [[MOR:root:S-F-T]] [[COM:Safety+Test]] [[CLS:ge_guardrail]] [[EVD:-DI<gozlem>]] [[ASP:nesov.]] [[SPC:axis:quality]]

Confidence: 0.74 (ceiling: inference 0.70) - SOP rebuilt with prompt-architect constraints and skill-forge validation loops while preserving ML execution depth.

GitHub Repository

DNYoussef/context-cascade
Path: skills/specialists/ml-expert
