
well-architected

avelikiy
Updated 2 days ago

About

This Claude Skill enforces a comprehensive architecture review across six pillars (operational excellence, security, reliability, performance, cost, sustainability) for all non-nano projects. It's automatically applied by the architect agent when creating or auditing ARCH documents to ensure thorough design consideration beyond just features. Developers should use it for small-to-enterprise project architectures, audits, and brownfield reviews, but not for nano projects or simple bug fixes.

Quick install

Claude Code

Recommended (primary method)
npx skills add avelikiy/great_cto -a claude-code
Plugin command (alternative)
/plugin add https://github.com/avelikiy/great_cto
Git clone (alternative)
git clone https://github.com/avelikiy/great_cto.git ~/.claude/skills/well-architected

Copy and paste this command into Claude Code to install the skill.

Skill documentation

Well-Architected — 6 pillars to verify before shipping

Every ARCH document for non-nano work must answer the 6 pillar questions below. Skipping a pillar is allowed only if explicitly justified (e.g. "Sustainability: N/A — backend-only, runs in shared infra.").

This is adapted from AWS Well-Architected (lens: small-team SaaS / LLM applications), trimmed to the questions that matter for teams of fewer than 10 engineers.

Pillar 1 — Operational excellence

Questions

  1. Observability: What metrics, logs, traces do we emit? How do we tell from a dashboard if this is working in prod?
  2. Deployability: How do we ship a change? CI gates? Rollback path?
  3. Runbooks: When this breaks at 3am, what does on-call read?

Pass criteria

  • ✅ One metric per business outcome (e.g. webhook-deliveries-acked)
  • ✅ One log line per request, with request-id correlatable across services
  • ✅ Deploy path is documented and tested (rollback dry-run executed)
  • ✅ Runbook covers top-3 failure modes from pre-mortem
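
The "one log line per request" criterion above could look roughly like the sketch below. It is a minimal illustration using only the standard library; the field names, the `make_logger` and `log_request` helpers, and the sample request are assumptions made for this example, not part of the skill.

```python
import io
import json
import logging
import uuid


def make_logger(stream):
    """Build a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger(f"request-log-{id(stream)}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output on our handler only
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger


def log_request(logger, method, path, status, duration_ms, request_id=None):
    """Emit exactly one log line for a finished request.

    Reuses the caller's request id when one is supplied (for example, taken
    from an incoming header) so the line can be joined with upstream logs.
    """
    request_id = request_id or str(uuid.uuid4())
    logger.info(json.dumps({
        "request_id": request_id,
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": duration_ms,
    }))
    return request_id


# Hypothetical request, written to an in-memory stream for demonstration.
buffer = io.StringIO()
log = make_logger(buffer)
rid = log_request(log, "POST", "/webhooks", 202, 41.7, request_id="req-123")
record = json.loads(buffer.getvalue().strip())
```

Passing the same `request_id` to every downstream call is what makes the lines correlatable across services; how the id travels (header name, context propagation) is left open here.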

Common fail

❌ "We'll add monitoring later." Monitoring is part of the feature.

Pillar 2 — Security

Questions

  1. Trust boundaries: Where does untrusted data enter? How is it validated/sanitized?
  2. Authn / authz: Who can call this? Who can read/write the data?
  3. Secrets: Where are API keys, DB passwords, JWT signing keys stored?
  4. Data classification: PII? PHI? PCI cardholder data? What's the retention policy?

Pass criteria

  • ✅ Every external input has explicit validation at the boundary
  • ✅ Authz is enforced at the data layer, not just UI
  • ✅ Secrets in env vars or secret manager, never in source
  • ✅ Sensitive data classified and retention policy defined
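
As an illustration of "explicit validation at the boundary", the sketch below parses an untrusted payload into a typed object before any other code sees it. The `WebhookRegistration` shape, the field names, and the limits are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


class ValidationError(ValueError):
    """Raised when untrusted input fails a boundary check."""


@dataclass(frozen=True)
class WebhookRegistration:
    url: str
    max_retries: int


def parse_registration(payload) -> WebhookRegistration:
    """Validate an untrusted payload and return a trusted, typed value."""
    if not isinstance(payload, dict):
        raise ValidationError("payload must be an object")
    url = payload.get("url")
    if not isinstance(url, str) or not url.startswith("https://"):
        raise ValidationError("url must be an https:// string")
    if len(url) > 2048:
        raise ValidationError("url is too long")
    retries = payload.get("max_retries", 3)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(retries, bool) or not isinstance(retries, int):
        raise ValidationError("max_retries must be an integer")
    if not 0 <= retries <= 10:
        raise ValidationError("max_retries must be between 0 and 10")
    return WebhookRegistration(url=url, max_retries=retries)
```

Code behind the boundary then accepts only `WebhookRegistration`, so an unvalidated dict cannot reach it by accident.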

Common fail

❌ "JWT validates the user, that's our authz." JWT is authentication. Authorization is separate (this user can read THIS row).
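
To make the authentication/authorization split concrete, here is a minimal sketch in which the data-access layer itself checks row ownership. It assumes `user_id` has already been authenticated (for instance, extracted from a verified JWT); the `Invoice` and `InvoiceStore` names are hypothetical.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Raised when a user may not access the requested row."""


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    owner_id: str
    amount_cents: int


class InvoiceStore:
    """In-memory stand-in for a data layer that enforces row-level authz."""

    def __init__(self, invoices):
        self._by_id = {inv.invoice_id: inv for inv in invoices}

    def get_for_user(self, user_id: str, invoice_id: str) -> Invoice:
        invoice = self._by_id.get(invoice_id)
        # Same error for "missing" and "not yours" so callers cannot probe ids.
        if invoice is None or invoice.owner_id != user_id:
            raise Forbidden(f"user {user_id} cannot read invoice {invoice_id}")
        return invoice
```

Because the only read method requires a `user_id`, a new endpoint cannot skip the check the way it could if authorization lived only in UI or route handlers.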

Pillar 3 — Reliability

Questions

  1. Failure modes: What happens when a downstream dependency is slow / down / corrupted?
  2. Idempotency: Can a retried request safely re-execute?
  3. Backups & recovery: What's the RPO (data-loss tolerance)? RTO (downtime tolerance)? Test plan for both?
  4. Capacity: What's the max QPS this can handle? What happens at 1.5x that?
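
One common answer to the failure-mode question is to fail fast once a dependency keeps erroring. The sketch below is a simplified, single-threaded circuit breaker written for illustration; the thresholds are arbitrary, and a production version would also need per-call timeouts and thread safety.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that is currently failing."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at = None

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            # One more failure will re-open the circuit immediately.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # any success closes the circuit fully
        return result
```

Callers wrap each external call as `breaker.call(fetch, url)` and treat `CircuitOpenError` as a signal to use a fallback or return a degraded response.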

Pass criteria

  • ✅ Circuit breakers / timeouts on external calls
  • ✅ State-mutating endpoints accept idempotency keys
  • ✅ Backups documented + restore tested in the last 90 days
  • ✅ Load test exists; results in docs/perf/
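
The idempotency-key criterion can be sketched as follows. This is an in-memory illustration only: the `PaymentService` class and its response shape are hypothetical, and a real service would persist keys durably (for example, behind a unique constraint) and handle concurrent retries.

```python
class PaymentService:
    """Toy state-mutating endpoint that is safe to retry."""

    def __init__(self):
        self._results = {}  # idempotency key -> stored response
        self.charges_made = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> dict:
        # A retried request replays the stored response instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        response = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = response
        return response
```

The client generates the key once per logical operation and reuses it on every retry, so a timeout followed by a retry cannot double-charge.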

Common fail

❌ "Postgres has backups." Backups without a tested restore aren't backups.

Pillar 4 — Performance efficiency

Questions

  1. SLOs: What's the p50/p95/p99 latency target? Error rate? Availability?
  2. Bottlenecks: Profile the critical path — what's the slowest step?
  3. Caching: What's cacheable? Cache invalidation strategy?
  4. Scaling: Vertical or horizontal? Auto-scale rules?

Pass criteria

  • ✅ SLO numbers in the ARCH doc (not "fast enough")
  • ✅ Profile attached for non-trivial requests
  • ✅ Cache strategy documented; invalidation explicit
  • ✅ Scaling decision justified by data, not "feels right"
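
To put numbers in the ARCH doc rather than "fast enough", measured latencies can be compared against percentile targets. The sketch below uses the nearest-rank percentile definition; the sample latencies and targets are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_slo(latencies_ms, targets_ms):
    """Return {name: (observed, target, ok)} for targets like {"p95": 100}."""
    report = {}
    for name, target in targets_ms.items():
        observed = percentile(latencies_ms, float(name[1:]))  # "p95" -> 95.0
        report[name] = (observed, target, observed <= target)
    return report


# Made-up sample: nine fast requests and one slow outlier.
latencies = [12, 15, 14, 18, 22, 16, 13, 250, 17, 19]
report = check_slo(latencies, {"p50": 20, "p95": 100, "p99": 300})
```

With this sample the median looks healthy while p95 misses its target, which is the kind of tail behaviour an average would hide.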

Common fail

❌ "Database can handle it." Quantify: queries/sec, row count, index hit rate.

Pillar 5 — Cost optimization

Questions

  1. Hot path: What's the most expensive operation per request? Why?
  2. Right-sizing: Is the chosen instance type / model / DB tier the smallest one that meets SLO?
  3. Cleanup: What happens to old data? Old logs? Old branch environments?

Pass criteria

  • ✅ Use the cost-model skill to document explicit dollar figures
  • ✅ Choose smallest LLM model that meets quality SLO (haiku before sonnet, sonnet before opus)
  • ✅ Retention policy for logs, metrics, old data
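
The "smallest model that meets the quality SLO" rule amounts to trying tiers cheapest-first and stopping at the first that passes. In the sketch below, the `evaluate` callback and the scores are placeholders; real scores would come from running your own eval set against each model.

```python
def pick_smallest_passing(tiers, evaluate, quality_slo):
    """Return (tier, score) for the first tier whose score meets the SLO.

    `tiers` must be ordered cheapest first. `evaluate` is a caller-supplied
    function that returns a quality score in [0, 1] for a tier name.
    Returns (None, None) if no tier passes.
    """
    for tier in tiers:
        score = evaluate(tier)
        if score >= quality_slo:
            return tier, score
    return None, None


# Placeholder scores for illustration only, not measured results.
fake_scores = {"haiku": 0.81, "sonnet": 0.90, "opus": 0.93}
choice, score = pick_smallest_passing(
    ["haiku", "sonnet", "opus"], fake_scores.get, quality_slo=0.85
)
```

Because evaluation stops at the first passing tier, the more expensive models are never run unless the cheaper ones fall short.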

Common fail

❌ Defaulting to Opus / GPT-4 when Haiku would work. Test on Haiku first.

Pillar 6 — Sustainability (env / energy)

Questions

  1. Workload efficiency: Is the code O(n log n) when it could be O(n)?
  2. Idle resources: Can dev environments scale to zero overnight?
  3. Data minimization: Do we collect / store data we never query?

Pass criteria

  • ✅ Hot loop complexity documented
  • ✅ Non-prod resources have shutdown schedules
  • ✅ Data lifecycle covers ingestion, retention, deletion
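
As a small example of the workload-efficiency question, the two functions below answer the same question (does a list contain a duplicate?) at different complexities. The task is chosen purely for illustration.

```python
def has_duplicates_sorted(items):
    """O(n log n): sort, then compare each element with its neighbour."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


def has_duplicates_set(items):
    """O(n) expected time: a single pass with a hash set.

    Requires hashable items and uses O(n) extra memory.
    """
    seen = set()
    for item in items:
        if item in seen:
            return True  # stop at the first repeat
        seen.add(item)
    return False
```

Documenting the hot loop as "O(n), hash set, early exit" gives reviewers something to check; whether the difference matters in practice depends on input size and should be confirmed by profiling.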

Common fail

❌ Logs at debug level in prod, never reviewed. Waste of storage + carbon.

Output format — add to ARCH

## Well-Architected review

### 1. Operational excellence
- Metrics: <list>
- Deploy path: <link to runbook>
- Verdict: PASS | RISKS LISTED

### 2. Security
- Trust boundaries: <list>
- Data classification: <PII / PHI / PCI / none>
- Verdict: PASS | RISKS LISTED

### 3. Reliability
- Failure modes: <link to pre-mortem>
- Idempotency: <yes/no per endpoint>
- Verdict: PASS | RISKS LISTED

### 4. Performance
- SLOs: p99=<ms>, error_rate=<%>, availability=<%>
- Verdict: PASS | RISKS LISTED

### 5. Cost
- Per-request cost: $<amount>
- Verdict: PASS | RISKS LISTED

### 6. Sustainability
- Hot-path complexity: O(<n>)
- Verdict: PASS | N/A | RISKS LISTED

## Open risks (rolled up)

<bullet list of all RISKS LISTED items + mitigation in plan>
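
A review written in the format above can be checked mechanically. The sketch below is one possible linter, not something the skill ships: it assumes the six `### N. Title` headings exactly as in the template and treats an unedited `PASS | RISKS LISTED` placeholder as unfilled.

```python
import re

PILLARS = [
    "Operational excellence",
    "Security",
    "Reliability",
    "Performance",
    "Cost",
    "Sustainability",
]
ALLOWED = {"PASS", "RISKS LISTED"}


def check_review(markdown: str) -> list:
    """Return a list of problems; an empty list means the review is complete."""
    sections = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^###\s+\d+\.\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif line.startswith("## "):
            current = None  # a new top-level section ends the pillar
        elif current is not None:
            sections[current].append(line)

    problems = []
    for pillar in PILLARS:
        if pillar not in sections:
            problems.append(f"missing section: {pillar}")
            continue
        verdict = None
        for line in sections[pillar]:
            match = re.match(r"^-\s*Verdict:\s*(.+?)\s*$", line)
            if match:
                verdict = match.group(1)
        # Only Sustainability may be marked N/A in the template.
        allowed = ALLOWED | ({"N/A"} if pillar == "Sustainability" else set())
        if verdict is None:
            problems.append(f"no verdict: {pillar}")
        elif verdict not in allowed:
            problems.append(f"unfilled or invalid verdict: {pillar}")
    return problems
```

Running this in CI against the ARCH document would catch a skipped pillar before a human reviewer has to.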

When PASS is acceptable with risks listed

Not every architecture is bulletproof. PASS-with-risks is OK if:

  • Each risk is explicit (not hand-waved)
  • Each risk has either a mitigation in the plan OR explicit acceptance by the user
  • The pre-mortem section addresses the top-3 risk-score items

Gate:plan can approve a PASS-with-risks; gate:ship needs the mitigations shipped.
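
The two gates can be read as simple predicates over the risk list. The sketch below is an interpretation, not the skill's implementation: the `Risk` fields are hypothetical, the pre-mortem condition is not modelled, and it assumes a risk accepted without a mitigation does not block shipping.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    description: str
    mitigation: str = ""              # planned mitigation; empty if none
    accepted_by_user: bool = False    # explicit acceptance by the user
    mitigation_shipped: bool = False  # set once the mitigation is live


def gate_plan(risks):
    """Plan gate: every risk has a planned mitigation or explicit acceptance."""
    return all(bool(r.mitigation) or r.accepted_by_user for r in risks)


def gate_ship(risks):
    """Ship gate: the plan gate holds and every planned mitigation has shipped."""
    return gate_plan(risks) and all(
        r.mitigation_shipped for r in risks if r.mitigation
    )
```

A plan with one unmitigated, unaccepted risk fails `gate_plan`; a plan whose mitigations exist only on paper passes `gate_plan` but fails `gate_ship`.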

GitHub repository

avelikiy/great_cto
Path: skills/well-architected
