postmortem-solo

rockscy
Updated 2 days ago

About

This skill guides a developer through a structured, individual postmortem after a solo failure, such as a missed deadline or a failed launch. It enforces blameless analysis with strict word limits across five sections to identify root causes. The goal is one or two concrete, actionable changes rather than extensive documentation.

Quick install

Claude Code

Primary (recommended):
npx skills add rockscy/solo-skills -a claude-code
Alternative, plugin command:
/plugin add https://github.com/rockscy/solo-skills
Alternative, git clone:
git clone https://github.com/rockscy/solo-skills.git ~/.claude/skills/postmortem-solo

Run the npx or git clone command in a terminal; enter the /plugin command inside Claude Code.

Documentation

Postmortem Solo / 单兵复盘

When to use

  • Something didn't go to plan in the last 30 days: outage, failed launch, missed deadline, customer churn spike, broken release.
  • The user is the only person responsible — there's no team to interview.
  • The goal is one or two concrete changes, not a 10-page report.

When NOT to use

  • The incident is still active — fix it first, postmortem after.
  • The user wants emotional processing, not a postmortem. Suggest a walk or a friend; this skill is for cold analysis.
  • It's a routine setback (one bug in production, one slow week) — not every bump needs a postmortem.

Structure

Five sections, strict word limits:

  1. What happened — 3 bullets, factual, no analysis. (≤ 60 words)
  2. Impact — what did the failure cost? Users affected, revenue lost, hours sunk, reputation hit. (≤ 40 words)
  3. Timeline — bullet list with timestamps if known: trigger → detection → response → resolution. (≤ 100 words)
  4. Root cause(s) — apply 5 Whys but stop the moment you hit a systemic cause (process, default, missing tool), not a personal one ("I was tired"). (≤ 80 words)
  5. Changes — at most 2 concrete changes with owner (you), due date, and verification method. (≤ 60 words)

Total: ≤ 350 words. If you need more, you're hiding from a decision.
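
The limits above are mechanical enough to check automatically. The following is a minimal Python sketch, not part of the skill itself. It assumes a "word" is any whitespace-separated token and that the draft has already been split into a dict keyed by section name; the names SECTION_LIMITS, count_words, and check_limits are illustrative.

```python
# Word limits copied from the five sections listed above.
SECTION_LIMITS = {
    "What happened": 60,
    "Impact": 40,
    "Timeline": 100,
    "Root cause(s)": 80,
    "Changes": 60,
}
TOTAL_LIMIT = 350


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def check_limits(sections: dict[str, str]) -> list[str]:
    """Return one message per violated limit; an empty list means the draft fits."""
    problems = []
    total = 0
    for name, limit in SECTION_LIMITS.items():
        words = count_words(sections.get(name, ""))
        total += words
        if words > limit:
            problems.append(f"{name}: {words} words (limit {limit})")
    if total > TOTAL_LIMIT:
        problems.append(f"Total: {total} words (limit {TOTAL_LIMIT})")
    return problems


draft = {
    "What happened": "Deployed a migration. Signup broke. Rolled back.",
    "Impact": "word " * 45,  # deliberately over the 40-word limit
}
print(check_limits(draft))  # ['Impact: 45 words (limit 40)']
```

The total here counts only the five section bodies, so headings and the date line are excluded.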

The "no blame, but yes accountability" rule

Solo postmortems have a trap: you're the only person who could have caused this, so it's tempting to either (a) flagellate yourself, or (b) blame externalities ("Stripe was flaky"). Neither is useful.

Correct framing: "What system / habit / default could I change so that next-month-me cannot make this mistake even if next-month-me is tired and distracted?"

This forces solutions toward process, automation, or alarms — not "be more careful next time."

Output format

# Postmortem: <one-sentence headline>

**Date:** <YYYY-MM-DD> — drafted <X> days after the incident.

## What happened
- …
- …
- …

## Impact
…

## Timeline
- HH:MM — <event>
- HH:MM — <event>

## Root cause(s)
…

## Changes
1. <change> — owner: me, due: <date>, verified by: <how I'll know it worked>
2. <change> — owner: me, due: <date>, verified by: <how>
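
Each change line in the template above carries three fields: owner, due, and verified by. As a rough sketch, assuming the draft is available as a markdown string in exactly this format, the Python below pulls the numbered lines under "## Changes" and reports any that lack a field. The function names and the regular expressions are illustrative and not part of the skill.

```python
import re

# Matches the tail of a change line: "owner: ..., due: ..., verified by: ..."
CHANGE_FIELDS = re.compile(
    r"owner:\s*(?P<owner>[^,]+),\s*due:\s*(?P<due>[^,]+),\s*verified by:\s*(?P<verified>.+)$"
)
NUMBERED_ITEM = re.compile(r"^\d+\.\s+")


def changes_section(markdown: str) -> list[str]:
    """Return the numbered lines found under the '## Changes' heading."""
    lines = []
    in_changes = False
    for line in markdown.splitlines():
        if line.startswith("## "):
            # Entering any other section ends the Changes section.
            in_changes = line.strip() == "## Changes"
        elif in_changes and NUMBERED_ITEM.match(line):
            lines.append(line.strip())
    return lines


def validate_changes(markdown: str) -> list[str]:
    """Return a message for each change line missing owner, due, or verified by."""
    changes = changes_section(markdown)
    if not changes:
        return ["no changes listed"]
    return [
        f"missing owner/due/verified by: {line}"
        for line in changes
        if not CHANGE_FIELDS.search(line)
    ]
```

An empty result means every listed change names who does it, by when, and how success will be confirmed.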

Worked example

Input: "I deployed on Friday afternoon, broke the signup flow for ~6 hours, lost about 40 trial signups before I noticed."

Output:

# Postmortem: Friday-afternoon deploy broke signup for 6 hours

**Date:** 2026-04-30 — drafted 6 days after the incident.

## What happened
- Deployed schema migration at 16:45 local time.
- Migration removed a column the signup form still referenced.
- Signup returned 500 for ~6 hours until I checked Stripe at 22:30.

## Impact
~40 trial signups lost (Stripe shows 0 new customers in the window vs. ~7/hour baseline). Estimated revenue at risk: ~$280/mo if 10% would have converted.

## Timeline
- 16:45 — deploy pushed.
- 16:50 — first 500 (no alert; error logger was rate-limited).
- 22:30 — noticed via empty Stripe dashboard, rolled back migration in 8 min.

## Root cause(s)
The signup form references the dropped column directly, not via an abstraction. CI ran migrations against an empty test DB, so the form-vs-schema mismatch never surfaced. No deploy-time alert fires when 500 rate exceeds baseline.

## Changes
1. Add a basic uptime check that posts a Slack ping on >5x baseline 500s — owner: me, due: 2026-05-03, verified by: triggering a test 500 and confirming the ping.
2. Block deploys after 16:00 local time on Fridays (calendar reminder + git pre-push hook) — owner: me, due: 2026-05-02, verified by: trying to deploy at 17:00 Friday and getting blocked.
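
The two changes in this example reduce to two small decisions: when to alert and when to refuse a deploy. The Python sketch below shows only that decision logic, under stated assumptions: the 500 counts come from some log or metrics source that is not shown, "after 16:00" is read as 16:00 onward, and the function names are hypothetical. Posting to Slack and wiring into git are left out.

```python
from datetime import datetime

FRIDAY = 4  # datetime.weekday(): Monday is 0, so Friday is 4
CUTOFF_HOUR = 16  # "after 16:00 local time" from change 2
ALERT_MULTIPLIER = 5  # ">5x baseline 500s" from change 1


def should_alert(errors_500: int, baseline_500: float) -> bool:
    """True when the 500 count in a window exceeds 5x the baseline for that window.

    With a baseline of 0, any 500 at all triggers the alert.
    """
    return errors_500 > ALERT_MULTIPLIER * baseline_500


def is_deploy_blocked(now: datetime) -> bool:
    """True on Fridays from 16:00 onward, using the local time passed in."""
    return now.weekday() == FRIDAY and now.hour >= CUTOFF_HOUR


# 2026-05-01 is a Friday, so a 17:00 deploy attempt is refused.
print(is_deploy_blocked(datetime(2026, 5, 1, 17, 0)))  # True
print(should_alert(errors_500=12, baseline_500=2))  # True
```

Git aborts a push when the pre-push hook exits with a non-zero status, so a hook script could call is_deploy_blocked(datetime.now()) and exit with status 1 when it returns True.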

Anti-patterns

  • "I'll be more careful next time" is not a change. It's a wish. Replace it with a system change.
  • More than 2 changes per postmortem. You won't do them. Cut to two.
  • Listing root causes that don't tie to a change. If a cause doesn't generate an action, omit it.
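
The first two anti-patterns can be caught with a simple lint pass over the list of changes. This is a sketch only: the phrase list is an assumption (the source names just "I'll be more careful next time"), and lint_changes is a hypothetical helper. The third anti-pattern, causes that tie to no change, needs human judgment and is not covered.

```python
# Hypothetical phrase list; only "more careful" comes from the text above.
WISH_PHRASES = ("more careful", "try harder", "pay more attention")
MAX_CHANGES = 2


def lint_changes(changes: list[str]) -> list[str]:
    """Flag too many changes and changes that read as wishes rather than system changes."""
    warnings = []
    if len(changes) > MAX_CHANGES:
        warnings.append(f"{len(changes)} changes listed; cut to {MAX_CHANGES}")
    for change in changes:
        lowered = change.lower()
        if any(phrase in lowered for phrase in WISH_PHRASES):
            warnings.append(f"wish, not a change: {change}")
    return warnings


print(lint_changes(["Be more careful next time"]))
# ['wish, not a change: Be more careful next time']
```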

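Chinese version (translated)

When to use

  • Something went off plan in the last 30 days: an outage, a failed launch, a missed deadline, heavy customer churn, a botched release.
  • The user is the only person responsible; there is no team to interview.
  • The goal is one or two concrete changes, not a 10-page report.

When NOT to use

  • The incident is still in progress: fix first, then do the postmortem.
  • The user wants emotional processing rather than a postmortem. Suggest a walk or a chat with a friend; this skill is for cold analysis.
  • It is only a routine setback (one production bug, one rough week); not every one needs a postmortem.

Structure

Five sections, strict word limits:

  1. What happened: 3 facts, no analysis. (≤ 60 words)
  2. Impact: what was lost? (≤ 40 words)
  3. Timeline: trigger → detection → response → resolution, with timestamps. (≤ 100 words)
  4. Root cause(s): 5 Whys, but stop at a systemic cause (process, default, missing tool), not "I was tired". (≤ 80 words)
  5. Changes: at most two, each with an owner (you), a due date, and a verification method. (≤ 60 words)

Total length ≤ 350 words. Any more and you are avoiding a decision.

The "no blame, but accountability" principle

Solo postmortems have a trap: you are the only one who could have made the mistake, so it is easy to either (a) blame yourself or (b) shift the blame to external factors. Neither helps.

Correct framing: "What system / habit / default can I change so that next-month-me will not repeat this, even when tired and distracted?"

This pushes solutions toward process, automation, and alerting, and away from "be more careful next time".

Anti-patterns

  • "Be more careful next time" is not a change; it is a wish. Replace it with a systemic change.
  • More than 2 changes in one postmortem. You will not finish them. Cut to two.
  • Listing root causes that map to no change. If a root cause does not trigger an action, delete it.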

GitHub repository

rockscy/solo-skills
Path: skills/postmortem-solo
Tags: ai-agents, awesome-list, bilingual, claude-code, claude-skills, developer-tools
