slurm-job-script-generator
关于
This skill generates validated SLURM sbatch job scripts with automatic sanity checking for HPC resource requests. It helps developers configure MPI/OpenMP layouts, GPU jobs, and debug launch issues while preventing directive conflicts. Use it when preparing cluster submissions, standardizing team scripts, or troubleshooting failed jobs.
快速安装
Claude Code
推荐npx skills add HeshamFS/materials-simulation-skills -a claude-code/plugin add https://github.com/HeshamFS/materials-simulation-skillsgit clone https://github.com/HeshamFS/materials-simulation-skills.git ~/.claude/skills/slurm-job-script-generator在 Claude Code 中复制并粘贴此命令以安装该技能
技能文档
SLURM Job Script Generator
Goal
Generate a correct, copy-pasteable SLURM job script (.sbatch) for running a simulation, and surface common configuration mistakes (bad walltime format, conflicting memory flags, oversubscription hints).
Requirements
- Python 3.8+
- No external dependencies (Python standard library only)
- Works on Linux, macOS, and Windows (script generation only)
Inputs to Gather
| Input | Description | Example |
|---|---|---|
| Job name | Short identifier for the job | phasefield-strong-scaling |
| Walltime | SLURM time limit | 00:30:00 |
| Partition | Cluster partition/queue (if required) | compute |
| Account | Project/account (if required) | matsim |
| Nodes | Number of nodes to allocate | 2 |
| MPI tasks | Total tasks, or tasks per node | 128 or 64 per node |
| Threads | CPUs per task (OpenMP threads) | 2 |
| Memory | --mem or --mem-per-cpu (cluster policy dependent) | 32G |
| GPUs | GPUs per node (optional) | 4 |
| Working directory | Where the run should execute | $SLURM_SUBMIT_DIR |
| Modules | Environment modules to load (optional) | gcc/12, openmpi/4.1 |
| Run command | The command to launch under SLURM | ./simulate --config cfg.json |
Decision Guidance
MPI vs MPI+OpenMP layout
Does the code use OpenMP / threading?
├── NO → Use MPI-only: cpus-per-task=1
└── YES → Use hybrid: set cpus-per-task = threads per MPI rank
and export OMP_NUM_THREADS = cpus-per-task
Rule of thumb: if you see diminishing strong-scaling efficiency at high MPI ranks, try fewer ranks with more threads per rank (and measure).
Memory flag selection
- Use either
--mem(per node) or--mem-per-cpu(per CPU), not both. - Follow your cluster’s documentation; some sites enforce one style.
- SLURM
--memunits are integer MB by default, or an integer with suffixK/M/G/T(and--mem=0commonly means “all memory on node”).
Script Outputs (JSON Fields)
| Script | Key Outputs |
|---|---|
scripts/slurm_script_generator.py | results.script, results.directives, results.derived, results.warnings |
Workflow
- Gather cluster constraints (partition/account, GPU policy, memory policy).
- Choose a process layout (MPI-only vs hybrid MPI+OpenMP).
- Generate the script with
slurm_script_generator.py. - Inspect warnings (conflicts, suspicious layouts).
- Save the generated script as
job.sbatch. - Submit with
sbatch job.sbatchand monitor withsqueue.
CLI Examples
# Preview a job script (prints to stdout)
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
--job-name phasefield \
--time 00:10:00 \
--partition compute \
--nodes 1 \
--ntasks-per-node 8 \
--cpus-per-task 2 \
--mem 16G \
--module gcc/12 \
--module openmpi/4.1 \
-- \
./simulate --config config.json
# Write to a file and also emit structured JSON
python3 skills/hpc-deployment/slurm-job-script-generator/scripts/slurm_script_generator.py \
--job-name phasefield \
--time 00:10:00 \
--nodes 1 \
--ntasks 16 \
--cpus-per-task 1 \
--out job.sbatch \
--json \
-- \
/bin/echo hello
Conversational Workflow Example
User: I need an sbatch script for my MPI simulation. I want 2 nodes, 64 ranks per node, 2 OpenMP threads per rank, and 2 hours.
Agent workflow:
- Confirm partition/account and whether GPUs are needed.
- Generate a hybrid job script:
python3 scripts/slurm_script_generator.py --job-name run --time 02:00:00 --nodes 2 --ntasks-per-node 64 --cpus-per-task 2 -- -- ./simulate - Explain the mapping:
- Total ranks = 128
- Threads per rank = 2 (
OMP_NUM_THREADS=2)
- If the user provides node core counts, sanity-check oversubscription using
--cores-per-node.
Error Handling
| Error | Cause | Resolution |
|---|---|---|
time must be HH:MM:SS or D-HH:MM:SS | Bad walltime format | Use 00:30:00 or 1-00:00:00 |
nodes must be positive | Non-positive nodes | Provide --nodes >= 1 |
Provide either --mem or --mem-per-cpu, not both | Conflicting memory directives | Choose one memory style |
Provide a run command after -- | Missing launch command | Add -- ./simulate ... |
Security
Input Validation
--timeis validated against strictHH:MM:SSorD-HH:MM:SSformat via regex--nodes,--ntasks,--ntasks-per-node,--cpus-per-task,--gpusare validated as positive integers with upper bounds--memand--mem-per-cpuare validated against SLURM's accepted format (<int>[K|M|G|T]); providing both simultaneously is rejected--job-nameis validated against[a-zA-Z0-9_.-]+(no shell metacharacters)--partitionand--accountare validated against safe-character allowlists--modulevalues are validated to prevent shell injection (no;,|,&, backticks, or$)
File Access
- The script reads no external files; all inputs are provided via CLI arguments
--outwrites the generated sbatch script to a single specified file path- The generated script is a plain-text shell script with
#SBATCHdirectives; it contains no dynamically generated code
Tool Restrictions
- Read: Used to inspect script source, references, and existing job scripts
- Bash: Used to execute
slurm_script_generator.pywith explicit argument lists; the generated script itself is NOT executed by the agent - Write: Used to save the generated
.sbatchfile; writes are scoped to the user's working directory - Grep/Glob: Used to locate existing scripts, configs, and cluster documentation
Safety Measures
- No
eval(),exec(), or dynamic code generation - All subprocess calls use explicit argument lists (no
shell=True) - The run command (after
--) is included verbatim in the generated script but is never executed by the skill itself - Module names are sanitized to prevent injection into
module loaddirectives - Generated scripts use
set -euo pipefailfor safe shell execution on the cluster
Limitations
- Does not query cluster hardware or site policies; it can only validate internal consistency.
- SLURM installations vary (GPU directives, QoS rules, partitions). Adjust directives for your site.
References
references/slurm_directives.md- Common#SBATCHdirectives and mapping tips
Version History
- v1.0.0 (2026-02-25): Initial SLURM job script generator
GitHub 仓库
相关推荐技能
content-collections
元Content Collections 是一个 TypeScript 优先的构建工具,可将本地 Markdown/MDX 文件转换为类型安全的数据集合。它专为构建博客、文档站和内容密集型 Vite+React 应用而设计,提供基于 Zod 的自动模式验证。该工具涵盖从 Vite 插件配置、MDX 编译到生产环境部署的完整工作流。
polymarket
元这个Claude Skill为开发者提供完整的Polymarket预测市场开发支持,涵盖API调用、交易执行和市场数据分析。关键特性包括实时WebSocket数据流,可监控实时交易、订单和市场动态。开发者可用它构建预测市场应用、实施交易策略并集成实时市场预测功能。
creating-opencode-plugins
元该Skill帮助开发者创建OpenCode插件,用于接入命令、文件、LSP等25+种事件。它提供了插件结构、事件API规范和JavaScript/TypeScript实现模式,适合需要拦截操作、扩展功能或自定义事件处理的场景。开发者可通过它快速构建响应式模块来增强OpenCode AI助手的能力。
sglang
元SGLang是一个专为LLM设计的高性能推理框架,特别适用于需要结构化输出的场景。它通过RadixAttention前缀缓存技术,在处理JSON、正则表达式、工具调用等具有重复前缀的复杂工作流时,能实现极速生成。如果你正在构建智能体或多轮对话系统,并追求远超vLLM的推理性能,SGLang是理想选择。
