simulation-failure-triage

Name: simulation-failure-triage
Author: HeshamFS

Updated 1 month ago

Category: Development (tag: ai)

About

This skill helps developers triage failed materials simulations by diagnosing common issues like nonconvergence, NaN/Inf errors, and unstable timesteps. It proposes safe, defensible retry ladders and immediate actions for recovery. Use it when you encounter a suspicious or failed simulation and need a structured first response.

Quick Install

Claude Code

Primary method (recommended):

npx skills add HeshamFS/materials-simulation-skills -a claude-code

Plugin command (alternative):

/plugin add https://github.com/HeshamFS/materials-simulation-skills

Git clone (alternative):

git clone https://github.com/HeshamFS/materials-simulation-skills.git ~/.claude/skills/simulation-failure-triage

Documentation

Simulation Failure Triage

Goal

Classify common simulation failure signatures and return immediate actions, retry ladders, and stop conditions.

Requirements

Python 3.10+
No external dependencies
Works on Linux, macOS, and Windows

Inputs to Gather

Input	Description	Example
Code	Simulation code or package	`LAMMPS`, `VASP`, `MOOSE`, `QE` (Quantum ESPRESSO)
Stage	Stage at which the failure occurred: setup, runtime, or postprocess	`runtime`
Symptoms	Comma-separated failure signs	`nan,pressure-blowup`
Log text or file	Error messages from the solver log	`Lost atoms`, `ZBRENT`
Recent change	Setting modified most recently before the failure	`larger timestep`
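The example log values are real solver messages: LAMMPS stops with a `Lost atoms` error when atoms go missing, which often follows a blowup, and VASP prints `ZBRENT: fatal error in bracketing` when its ionic line search fails. A minimal sketch of turning such log text into symptom tags, assuming a hypothetical signature table and hypothetical tag names (the skill's own patterns live in references/failure_patterns.md and may differ):

```python
import re

# Hypothetical signature table: regex over log text -> symptom tag.
# Word boundaries keep "inf" from matching words such as "information".
SIGNATURES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"lost atoms", re.IGNORECASE), "lost-atoms"),        # LAMMPS
    (re.compile(r"\bZBRENT\b", re.IGNORECASE), "zbrent-bracketing"),  # VASP
    (re.compile(r"\bnan\b", re.IGNORECASE), "nan"),
    (re.compile(r"\binf\b", re.IGNORECASE), "inf"),
]


def symptoms_from_log(log_text: str) -> list[str]:
    """Return the tags whose signature appears in the log, in table order."""
    return [tag for pattern, tag in SIGNATURES if pattern.search(log_text)]


def merge_symptoms(cli_symptoms: str, log_text: str) -> list[str]:
    """Combine a comma-separated symptoms value with tags found in the log."""
    tags = [s.strip().lower() for s in cli_symptoms.split(",") if s.strip()]
    for tag in symptoms_from_log(log_text):
        if tag not in tags:  # keep user-supplied order, skip duplicates
            tags.append(tag)
    return tags
```

Merging log-derived tags with user-reported ones keeps both kinds of evidence without double-counting a symptom that appears in each.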

Decision Guidance

First preserve evidence: logs, inputs, executable version, and scheduler output.
Separate setup errors from numerical instability and physical model issues.
Change only one setting per retry so each outcome can be attributed to that change.
Stop retrying when the result becomes scientifically meaningless or a required model input is missing.
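The guidance above amounts to a small loop: each rung of a retry ladder changes exactly one setting relative to a baseline configuration, and the loop ends when a stop condition fires. A sketch under stated assumptions: settings are a plain dict, the rungs and the timestep floor in the test are hypothetical, and a rung that names a setting absent from the baseline is treated as a missing required input.

```python
from collections.abc import Callable, Iterator
from typing import Any

Settings = dict[str, Any]


def retry_ladder(
    baseline: Settings,
    rungs: list[tuple[str, Any]],
    should_stop: Callable[[Settings], bool],
) -> Iterator[Settings]:
    """Yield one retry config per rung, each differing from baseline in one key.

    Ends early when a rung needs a setting the baseline lacks, or when
    should_stop flags a candidate (e.g. a change that would make the
    result scientifically meaningless).
    """
    for key, value in rungs:
        if key not in baseline:
            # Missing model input: stop instead of guessing a value.
            return
        candidate = dict(baseline)  # copy so the baseline is never mutated
        candidate[key] = value
        if should_stop(candidate):
            return
        yield candidate
```

Applying each rung to the untouched baseline, rather than stacking changes, is what keeps every retry a single controlled change.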

Script Outputs

scripts/failure_triage.py emits:

likely_causes
immediate_actions
retry_ladder
stop_conditions
evidence
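When the script runs with `--json`, a caller can check for these fields before acting on the report. A minimal sketch, assuming the output is a single JSON object whose top-level keys are the names listed above; the value types are not documented here, so the sketch does not inspect them.

```python
import json

# Field names as listed in Script Outputs; the JSON layout is an assumption.
EXPECTED_KEYS = ("likely_causes", "immediate_actions", "retry_ladder",
                 "stop_conditions", "evidence")


def load_report(raw: str) -> dict:
    """Parse the JSON report and fail loudly if a documented field is missing."""
    report = json.loads(raw)
    missing = [key for key in EXPECTED_KEYS if key not in report]
    if missing:
        raise ValueError(f"triage report is missing fields: {missing}")
    return report
```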

Workflow

python3 skills/robustness/simulation-failure-triage/scripts/failure_triage.py \
  --code LAMMPS \
  --stage runtime \
  --symptoms nan,pressure-blowup \
  --recent-change "increased timestep" \
  --json

Error Handling

Invalid stages or oversized log files stop with exit code 2. Unknown symptoms are retained as custom evidence.
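A sketch of this validation behavior. The stage names come from the Inputs table; the known-symptom vocabulary is a hypothetical example drawn from the failure types named in the About section, not the script's actual list.

```python
import sys

VALID_STAGES = {"setup", "runtime", "postprocess"}
# Hypothetical vocabulary; the real script's list may differ.
KNOWN_SYMPTOMS = {"nan", "inf", "pressure-blowup", "nonconvergence",
                  "unstable-timestep"}


def validate(stage: str, symptoms: list[str]) -> dict[str, list[str]]:
    """Reject an invalid stage with exit code 2; keep unknown symptoms."""
    if stage not in VALID_STAGES:
        print(f"error: invalid stage {stage!r}", file=sys.stderr)
        raise SystemExit(2)
    known = [s for s in symptoms if s in KNOWN_SYMPTOMS]
    # Unknown symptoms are not dropped; they travel on as custom evidence.
    custom = [s for s in symptoms if s not in KNOWN_SYMPTOMS]
    return {"known": known, "custom_evidence": custom}
```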

Limitations

This skill gives first-response triage. It does not guarantee that a failed simulation can be repaired.

Security

Log files are read with a 10 MB size cap.
Log text is truncated and never executed.
The script does not run external solvers.
The skill uses Bash only to run its bundled script.
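A capped, read-only log loader in the spirit of these rules. Assumptions: 10 MB is taken as 10 * 1024 * 1024 bytes, the excerpt length is a hypothetical 4000 characters, and the tail of the log is kept because solver errors often appear near the end. The text is only returned as a string; nothing in it is evaluated or passed to a shell.

```python
import os

MAX_LOG_BYTES = 10 * 1024 * 1024  # assumed reading of the 10 MB cap
MAX_EXCERPT_CHARS = 4000          # hypothetical truncation length


def read_log_excerpt(path: str) -> str:
    """Return the tail of a log file, refusing files above the size cap."""
    size = os.path.getsize(path)  # check size before reading anything
    if size > MAX_LOG_BYTES:
        raise ValueError(f"log file is {size} bytes; limit is {MAX_LOG_BYTES}")
    # errors="replace" tolerates binary debris in crashed-run logs.
    with open(path, encoding="utf-8", errors="replace") as fh:
        text = fh.read()
    return text[-MAX_EXCERPT_CHARS:]
```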

References

See references/failure_patterns.md for common failure signatures and retry ladders.

Version History

1.0.0: Initial cross-code simulation failure triage skill.

GitHub Repository

HeshamFS/materials-simulation-skills

Path: skills/robustness/simulation-failure-triage

Tags: agent-skills, agents, cli-tools, computational-science, llm, materials-science

FAQ

What is the simulation-failure-triage skill?

simulation-failure-triage is a Claude Skill by HeshamFS. Skills package instructions and resources that Claude loads on demand, so Claude can triage failed simulations without extra prompting.

How do I install simulation-failure-triage?

Use the commands in the Quick Install section: add simulation-failure-triage with npx or as a Claude Code plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does simulation-failure-triage belong to?

simulation-failure-triage is in the Development category, tagged ai.

Is simulation-failure-triage free to use?

Yes. simulation-failure-triage is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
