
postmortem-solo

rockscy
Updated 2 days ago
Other, ai

About

This skill guides a developer through a structured, individual postmortem after a solo failure like a missed deadline or failed launch. It enforces a blame-free analysis with strict word limits across five sections to identify root causes. The goal is to produce one or two concrete, actionable changes rather than extensive documentation.

Quick install

Claude Code

Recommended (primary method):
npx skills add rockscy/solo-skills -a claude-code
Alternative (plugin command):
/plugin add https://github.com/rockscy/solo-skills
Alternative (Git clone):
git clone https://github.com/rockscy/solo-skills.git ~/.claude/skills/postmortem-solo

Copy and paste this command into Claude Code to install the skill.

Skill documentation

Postmortem Solo / 单兵复盘

When to use

  • Something didn't go to plan in the last 30 days: outage, failed launch, missed deadline, customer churn spike, broken release.
  • The user is the only person responsible — there's no team to interview.
  • The goal is one or two concrete changes, not a 10-page report.

When NOT to use

  • The incident is still active — fix it first, postmortem after.
  • The user wants emotional processing, not a postmortem. Suggest a walk or a friend; this skill is for cold analysis.
  • It's a routine setback (one bug in production, one slow week) — not every bump needs a postmortem.

Structure

Five sections, strict word limits:

  1. What happened — 3 bullets, factual, no analysis. (≤ 60 words)
  2. Impact — what did the failure cost? Users affected, revenue lost, hours sunk, reputation hit. (≤ 40 words)
  3. Timeline — bullet list with timestamps if known: trigger → detection → response → resolution. (≤ 100 words)
  4. Root cause(s) — apply 5 Whys but stop the moment you hit a systemic cause (process, default, missing tool), not a personal one ("I was tired"). (≤ 80 words)
  5. Changes — at most 2 concrete changes with owner (you), due date, and verification method. (≤ 60 words)

Total: ≤ 350 words. If you need more, you're hiding from a decision.
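
The limits above can be checked mechanically. What follows is a minimal Python sketch, not part of the skill itself. It assumes the draft is Markdown in the skill's output format (`##` section headings) and that a "word" is any whitespace-separated token, so the total also counts headings and list markers. The names `LIMITS`, `split_sections`, and `check_limits` are hypothetical. Note that the five section limits sum to 340 words, which leaves 10 words of headroom under the 350 total.

```python
import re

# Per-section word limits, copied from the Structure list.
LIMITS = {
    "What happened": 60,
    "Impact": 40,
    "Timeline": 100,
    "Root cause(s)": 80,
    "Changes": 60,
}
TOTAL_LIMIT = 350


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the text beneath it."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def check_limits(markdown: str) -> list[str]:
    """Return readable violations; an empty list means the draft passes."""
    problems = []
    sections = split_sections(markdown)
    for name, limit in LIMITS.items():
        if name not in sections:
            problems.append(f"missing section: {name}")
            continue
        words = count_words(sections[name])
        if words > limit:
            problems.append(f"{name}: {words} words (limit {limit})")
    # The total includes headings and markup tokens (an assumption).
    total = count_words(markdown)
    if total > TOTAL_LIMIT:
        problems.append(f"total: {total} words (limit {TOTAL_LIMIT})")
    return problems
```

Calling `check_limits(draft_text)` on a finished draft returns a list of messages to act on before filing the postmortem.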

The "no blame, but yes accountability" rule

Solo postmortems have a trap: you're the only person who could have caused this, so it's tempting to either (a) flagellate yourself, or (b) blame externalities ("Stripe was flaky"). Neither is useful.

Correct framing: "What system / habit / default could I change so that next-month-me cannot make this mistake even if next-month-me is tired and distracted?"

This forces solutions toward process, automation, or alarms — not "be more careful next time."

Output format

# Postmortem: <one-sentence headline>

**Date:** <YYYY-MM-DD> — drafted <X> days after the incident.

## What happened
- …
- …
- …

## Impact
…

## Timeline
- HH:MM — <event>
- HH:MM — <event>

## Root cause(s)
…

## Changes
1. <change> — owner: me, due: <date>, verified by: <how I'll know it worked>
2. <change> — owner: me, due: <date>, verified by: <how>
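
For anyone who would rather fill the template from structured data than by hand, here is a rough Python sketch of a renderer. It is an illustration only: `Change` and `render_postmortem` are hypothetical names, and the dash separator is written as the escape `\u2014` to match the template. The renderer also enforces the two-change cap from the Structure section.

```python
from dataclasses import dataclass

SEP = " \u2014 "  # the dash separator used in the output format


@dataclass
class Change:
    description: str
    due: str          # e.g. "2026-05-03"
    verified_by: str  # how you will know the change worked


def render_postmortem(headline, date, days_after, what_happened, impact,
                      timeline, root_causes, changes):
    """Render the sections in template order; rejects more than 2 changes."""
    if len(changes) > 2:
        raise ValueError("at most 2 changes per postmortem")
    lines = [
        f"# Postmortem: {headline}",
        "",
        f"**Date:** {date}{SEP}drafted {days_after} days after the incident.",
        "",
        "## What happened",
        *[f"- {item}" for item in what_happened],
        "",
        "## Impact",
        impact,
        "",
        "## Timeline",
        # timeline is a list of (HH:MM, event) pairs
        *[f"- {time}{SEP}{event}" for time, event in timeline],
        "",
        "## Root cause(s)",
        root_causes,
        "",
        "## Changes",
    ]
    for number, change in enumerate(changes, start=1):
        # The owner is always "me" in a solo postmortem.
        lines.append(
            f"{number}. {change.description}{SEP}owner: me, "
            f"due: {change.due}, verified by: {change.verified_by}"
        )
    return "\n".join(lines) + "\n"
```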

Worked example

Input: "I deployed on Friday afternoon, broke the signup flow for ~6 hours, lost about 40 trial signups before I noticed."

Output:

# Postmortem: Friday-afternoon deploy broke signup for 6 hours

**Date:** 2026-04-30 — drafted 6 days after the incident.

## What happened
- Deployed schema migration at 16:45 local time.
- Migration removed a column the signup form still referenced.
- Signup returned 500 for ~6 hours until I checked Stripe at 22:30.

## Impact
~40 trial signups lost (Stripe shows 0 new customers in the window vs. ~7/hour baseline). Estimated revenue at risk: ~$280/mo if 10% would have converted.

## Timeline
- 16:45 — deploy pushed.
- 16:50 — first 500 (no alert; error logger was rate-limited).
- 22:30 — noticed via empty Stripe dashboard, rolled back migration in 8 min.

## Root cause(s)
The signup form references the dropped column directly, not via an abstraction. CI ran migrations against an empty test DB, so the form-vs-schema mismatch never surfaced. No deploy-time alert fires when 500 rate exceeds baseline.

## Changes
1. Add a basic uptime check that posts a Slack ping on >5x baseline 500s — owner: me, due: 2026-05-03, verified by: triggering a test 500 and confirming the ping.
2. Block deploys after 16:00 local time on Fridays (calendar reminder + git pre-push hook) — owner: me, due: 2026-05-02, verified by: trying to deploy at 17:00 Friday and getting blocked.
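
The two changes in this example reduce to small predicates. The Python sketch below is one possible shape for them, not the author's implementation. It assumes "after 16:00" means at or after 16:00 local time, that the 500 counts come from whatever logging is already in place, and that deploys are triggered by `git push`. All function and constant names are hypothetical.

```python
from datetime import datetime

FRIDAY = 4            # datetime.weekday(): Monday is 0, Friday is 4
CUTOFF_HOUR = 16      # assumed: block at or after 16:00 local time
ALERT_MULTIPLIER = 5  # ping when 500s exceed 5x the baseline


def deploy_blocked(now: datetime) -> bool:
    """True on Fridays from 16:00 local time onward."""
    return now.weekday() == FRIDAY and now.hour >= CUTOFF_HOUR


def should_alert(errors_500: int, baseline_500: float) -> bool:
    """True when the observed 500 count is more than 5x the baseline.

    With a baseline of 0, any single 500 triggers the alert.
    """
    return errors_500 > ALERT_MULTIPLIER * baseline_500


def hook_exit_code(now: datetime) -> int:
    """Exit code for a pre-push hook: non-zero makes Git abort the push."""
    return 1 if deploy_blocked(now) else 0
```

Saved as an executable `.git/hooks/pre-push` script that ends with `sys.exit(hook_exit_code(datetime.now()))`, this would stop a Friday-evening push. The verification step from the example (trying to deploy at 17:00 on a Friday) corresponds to `deploy_blocked(datetime(2026, 5, 1, 17, 0))` returning `True`.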

Anti-patterns

  • "I'll be more careful next time" is not a change. It's a wish. Replace it with a system change.
  • More than 2 changes per postmortem. You won't do them. Cut to two.
  • Listing root causes that don't tie to a change. If a cause doesn't generate an action, omit it.
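
The first two anti-patterns are simple enough to lint for. Below is a small Python sketch under stated assumptions: only the phrase "be more careful" comes from this document, and the other wish phrases, the field check, and the name `lint_changes` are hypothetical additions. The third anti-pattern (root causes that do not map to a change) needs human judgment and is not covered.

```python
import re

# "be more careful" is from the anti-patterns list; the rest are assumed examples.
WISH_PATTERNS = [
    r"\bbe more careful\b",
    r"\btry harder\b",
    r"\bpay more attention\b",
]
# Fields every change line carries in the output format.
REQUIRED_FIELDS = ("owner:", "due:", "verified by:")


def lint_changes(changes: list[str]) -> list[str]:
    """Flag too many changes, wish-style changes, and missing fields."""
    problems = []
    if len(changes) > 2:
        problems.append(f"{len(changes)} changes listed; cut to two")
    for index, change in enumerate(changes, start=1):
        lowered = change.lower()
        if any(re.search(pattern, lowered) for pattern in WISH_PATTERNS):
            problems.append(f"change {index} is a wish, not a system change")
        for field in REQUIRED_FIELDS:
            if field not in lowered:
                problems.append(f"change {index} is missing '{field}'")
    return problems
```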

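Chinese version (中文版)

When to use

  • Something went off plan in the last 30 days: outage, failed launch, missed deadline, heavy customer churn, botched release.
  • The user is the sole owner; there is no team to interview.
  • The goal is one or two concrete changes, not a 10-page report.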

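When NOT to use

  • The incident is still ongoing: fix it first, then run the postmortem.
  • The user wants emotional processing rather than a postmortem: suggest a walk or a chat with a friend; this skill is for cold analysis.
  • It is only a routine bump (one production bug, one rough week): not every setback needs a postmortem.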

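Structure

Five sections, strict length limits:

  1. What happened: 3 facts, no analysis. (≤ 60 characters)
  2. Impact: what was lost? (≤ 40 characters)
  3. Timeline: trigger → detection → response → fix, with timestamps. (≤ 100 characters)
  4. Root cause: 5 Whys, but stop at a systemic cause (process / default / missing tool), not "I was tired". (≤ 80 characters)
  5. Changes: at most two, each with an owner (you), a due date, and a verification method. (≤ 60 characters)

Total length ≤ 350 characters. Anything longer means you are avoiding a decision.

The "no blame, but accountability" principle

Solo postmortems have a trap: you are the only one who could have made the mistake, so it is easy to either (a) beat yourself up or (b) shift the blame onto external factors. Neither helps.

Correct framing: "What system / habit / default can I change so that next month's me cannot repeat this, even when tired and distracted?"

This pushes solutions toward process, automation, and alerting rather than "be more careful next time".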

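Anti-patterns

  • "Be more careful next time" is not a change, it is a wish. Replace it with a system change.
  • More than 2 changes in one postmortem: you will not finish them. Cut to two.
  • Listing root causes that map to no change: if a root cause does not trigger an action, delete it.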

GitHub repository

rockscy/solo-skills
Path: skills/postmortem-solo
ai-agents, awesome-list, bilingual, claude-code, claude-skills, developer-tools
