pre-mortem
About
The pre-mortem skill helps developers identify concrete risks before implementation by imagining a project has already failed catastrophically and working backwards to find likely causes. It's used at planning stages—like architecture review or threat modeling—for irreversible or high-impact features. This method forces specific risk identification beyond generic lists, using a structured 5-step process.
Quick install
Claude Code
- `npx skills add avelikiy/great_cto -a claude-code` (recommended)
- `/plugin add https://github.com/avelikiy/great_cto`
- `git clone https://github.com/avelikiy/great_cto.git ~/.claude/skills/pre-mortem`
Copy and paste this command into Claude Code to install the skill.
Skill documentation
Pre-mortem — fail-it-before-you-build-it
A retrospective for a project that hasn't happened yet. Surfaces real risks that "list every risk" prompts miss.
Popularized by research psychologist Gary Klein in his 2007 Harvard Business Review article "Performing a Project Premortem"; now used at AWS and other ops-mature orgs.
The 5-step pre-mortem
Step 1. Imagine you're 6 months in the future
The project shipped. It is a clear, public failure. There's a Reddit thread about it. The CEO is asking what went wrong.
Step 2. Write the post-mortem newspaper headline
One sentence. Concrete. Specific. Examples:
- ❌ Bad: "We had some quality issues."
- ✅ Good: "On 2026-09-12, the Stripe webhook handler deduplicated by raw body hash, so 30K customers were double-charged after Stripe retried delivery during a network blip."
The headline forces you to name the failure mode SPECIFICALLY.
Step 3. List every individual reason this exact failure happened
Brainstorm 10-15 reasons. Be specific. Each item should reference:
- A real component / file
- A real failure mode (race condition, schema mismatch, expired credential)
- A real human factor (oncall didn't see alert, runbook was outdated)
Reject hand-waves like "testing was insufficient." Replace with "we didn't write a property-based test for the dedup-key collision case."
Step 4. Rank by likelihood × severity
For each cause, score:
- Likelihood: 1-5 (1=once-in-a-decade, 5=monthly)
- Severity: 1-5 (1=cosmetic, 5=data loss / regulatory breach)
- Risk score: likelihood × severity
Top 3 by risk score → these are your highest-priority mitigations.
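The ranking in Step 4 is simple arithmetic, and a small script can keep it honest. The sketch below is one possible way to do it, not part of the skill itself: the `Cause` class, the `top_causes` helper, and the sample causes are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    """One brainstormed cause from Step 3 (hypothetical structure)."""

    description: str
    likelihood: int  # 1 (once in a decade) to 5 (monthly)
    severity: int    # 1 (cosmetic) to 5 (data loss / regulatory breach)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-5 scale used in Step 4.
        for name in ("likelihood", "severity"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    @property
    def risk(self) -> int:
        # Risk score = likelihood x severity.
        return self.likelihood * self.severity


def top_causes(causes: list[Cause], n: int = 3) -> list[Cause]:
    """Return the n causes with the highest risk score, highest first."""
    # sorted() is stable, so tied causes keep their brainstorm order.
    return sorted(causes, key=lambda c: c.risk, reverse=True)[:n]


if __name__ == "__main__":
    # Illustrative causes only; replace with the output of Step 3.
    causes = [
        Cause("Webhook dedup key collides on retried delivery", 4, 5),
        Cause("On-call runbook for refunds is outdated", 3, 3),
        Cause("API credential expires with no rotation alert", 2, 4),
        Cause("Dashboard label typo", 5, 1),
    ]
    for cause in top_causes(causes):
        print(cause.risk, cause.description)
```

Keeping the scores as plain integers makes disagreements visible: if two reviewers rank the same cause differently, the argument is about a specific likelihood or severity number rather than about a vague sense of risk.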
Step 5. For each top-3 cause, write a guardrail in the plan
Each guardrail is a concrete change to the plan:
- A test that would have caught it
- A circuit breaker / feature flag
- A runbook entry
- A monitoring alert with specific SLO
If a top-3 cause CANNOT be mitigated within the time/budget, escalate to the user: "This plan accepts the risk of X with no mitigation."
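As an illustration of "a test that would have caught it", here is a minimal sketch of a guardrail for the double-charge headline used earlier. Everything in it is an assumption made for the example: `WebhookHandler` is a hypothetical in-memory handler, the event fields (`id`, `customer`, `amount`) are invented, and a real service would persist seen event IDs rather than hold them in a set.

```python
import json


class WebhookHandler:
    """Hypothetical handler that deduplicates by event ID, not by body hash."""

    def __init__(self) -> None:
        self.seen_event_ids: set[str] = set()
        self.charges: list[tuple[str, int]] = []

    def handle(self, raw_body: str) -> bool:
        """Process one delivery. Returns True if applied, False if skipped."""
        event = json.loads(raw_body)
        event_id = event["id"]
        if event_id in self.seen_event_ids:
            return False  # duplicate delivery: apply nothing
        self.seen_event_ids.add(event_id)
        self.charges.append((event["customer"], event["amount"]))
        return True


def test_retry_with_different_bytes_charges_once() -> None:
    handler = WebhookHandler()
    first = json.dumps({"id": "evt_1", "customer": "cus_a", "amount": 500})
    # Same event serialized differently, so a raw-body hash would not match.
    retry = json.dumps({"amount": 500, "customer": "cus_a", "id": "evt_1"}, indent=2)
    assert first != retry
    assert handler.handle(first) is True
    assert handler.handle(retry) is False
    assert handler.charges == [("cus_a", 500)]


def test_distinct_events_are_both_applied() -> None:
    handler = WebhookHandler()
    handler.handle(json.dumps({"id": "evt_1", "customer": "cus_a", "amount": 500}))
    handler.handle(json.dumps({"id": "evt_2", "customer": "cus_a", "amount": 500}))
    # Two different events with identical payloads must both be charged.
    assert len(handler.charges) == 2


if __name__ == "__main__":
    test_retry_with_different_bytes_charges_once()
    test_distinct_events_are_both_applied()
    print("guardrail tests passed")
```

The second test matters as much as the first: a dedup rule that is too aggressive silently drops legitimate charges, which is its own headline.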
Template — add to PLAN-*.md
## Pre-mortem
Six months from now, this project failed. Headline:
> <one-sentence failure headline>
### Top reasons (likelihood × severity)
| Cause | L | S | Risk | Mitigation in plan |
|---|---|---|---|---|
| <specific cause> | 4 | 5 | 20 | <Task #N: write idempotency test> |
| ... | | | | |
### Accepted risks (no mitigation)
- <risk> — accepted because <budget/scope reason>. Owner: <name>.
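If the scored causes already live in a script or spreadsheet export, the section can be generated rather than typed by hand. The following is a rough sketch under that assumption; `Row` and `render_pre_mortem` are hypothetical names, and the output simply mirrors the template layout above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """One table row: a scored cause plus its mitigation (hypothetical)."""

    cause: str
    likelihood: int
    severity: int
    mitigation: str


def render_pre_mortem(headline: str, rows: list[Row], accepted: list[str]) -> str:
    """Render the pre-mortem section as Markdown, highest risk first."""
    ranked = sorted(rows, key=lambda r: r.likelihood * r.severity, reverse=True)
    lines = [
        "## Pre-mortem",
        "Six months from now, this project failed. Headline:",
        f"> {headline}",
        "### Top reasons (likelihood × severity)",
        "| Cause | L | S | Risk | Mitigation in plan |",
        "|---|---|---|---|---|",
    ]
    for r in ranked:
        risk = r.likelihood * r.severity
        lines.append(
            f"| {r.cause} | {r.likelihood} | {r.severity} | {risk} | {r.mitigation} |"
        )
    lines.append("### Accepted risks (no mitigation)")
    lines.extend(f"- {item}" for item in accepted)
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(
        render_pre_mortem(
            "<one-sentence failure headline>",
            [Row("<specific cause>", 4, 5, "<Task #N: write idempotency test>")],
            ["<risk>: accepted because <budget/scope reason>. Owner: <name>."],
        )
    )
```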
Common failure modes by archetype
Quick start — most-common pre-mortem causes per archetype:
| Archetype | Common failure |
|---|---|
| fintech / commerce | Idempotency-key collision; double-charge during retry storm |
| healthcare | PHI leak via debug log; BAA not signed with vendor |
| web3 | Oracle staleness; flash-loan exploit on bonding curve |
| mlops | Training/serving skew; model drift undetected |
| iot-embedded | OTA bricks devices in a region with no recovery path |
| data-platform | Late-arriving data overwrites correct values |
| ai-system / agent-product | Prompt injection exfiltrates other users' data |
| enterprise-saas | Cross-tenant data leak via RLS gap |
| cli-tool | Destructive flag with no confirmation (rm -rf equivalent) |
| library | Breaking change in minor version bump |
Anti-patterns in pre-mortems
❌ Vague risks. "Performance might be a problem." Be specific: which operation, at what load, what's the SLO.
❌ Cosmic risks. "AWS could go down." Yes, but that's not actionable. Focus on what you can mitigate.
❌ Defensive list. Listing risks you've already mitigated to look thorough. Only list risks the current plan does NOT yet address.
❌ Skip the headline. Without the headline, the team won't believe the failure scenario is real.
When to skip
- nano project_size — pre-mortem is overhead.
- Pure refactor with full test coverage — guardrails already exist.
- Bug-fix with one-line repro — risk is well-bounded.
