evaluate-agent-framework
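About
This skill evaluates the investment readiness of open-source AI agent frameworks by analyzing community health, supersession risk, architecture, and governance. It outputs a four-tier classification (INVEST, EVALUATE-FURTHER, CONTRIBUTE-CAUTIOUSLY, AVOID) to guide the allocation of engineering resources. Use it to assess a framework's long-term sustainability before committing significant development resources.
Quick Install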
Claude Code
Recommended: `npx skills add pjt222/agent-almanac -a claude-code`
`/plugin add https://github.com/pjt222/agent-almanac`
`git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework`
Copy and paste this command into Claude Code to install the skill.
Documentation
Evaluate Agent Framework
Score OSS agent framework → invest? Steps 2-3 novel: survival rate + supersession. Tier → INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID. Calibrate effort pre-commit.
Use When
- Adopt framework prod? → check
- Dep risk on framework → assess
- Send eng effort to ext proj? → decide
- Build-vs-adopt cmp → rank
- Post-release / post-gov-change / post-acq re-eval
In
- Req:
- `framework_url`: GitHub repo URL
- Opt:
- `comparison_frameworks`: alt framework URLs, bench
- `use_case`: intended use (e.g., "multi-agent orchestration", "tool-use pipelines") → arch fit
- `contribution_budget`: planned eng hrs → tier calib
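The inputs above could be carried through the steps as one small record. This is only a sketch: the class name `EvaluationInput`, the `owner_repo` helper, and the restriction to `https://github.com/` URLs are assumptions made for illustration, not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Optional

GITHUB_PREFIX = "https://github.com/"


@dataclass
class EvaluationInput:
    """Hypothetical container for the skill inputs listed above."""

    framework_url: str                                   # required: GitHub repo URL
    comparison_frameworks: list[str] = field(default_factory=list)  # optional alternatives
    use_case: Optional[str] = None                       # optional intended use
    contribution_budget: Optional[float] = None          # optional planned eng hours

    def __post_init__(self) -> None:
        # Assumption: only github.com URLs are accepted here; other hosts are
        # handled by the "If err" fallbacks of the individual steps.
        if not self.framework_url.startswith(GITHUB_PREFIX):
            raise ValueError("framework_url must be a GitHub repo URL")

    def owner_repo(self) -> str:
        """Return the 'owner/repo' slug that gh commands take."""
        parts = self.framework_url[len(GITHUB_PREFIX):].strip("/").split("/")
        if len(parts) < 2 or not all(parts[:2]):
            raise ValueError("expected https://github.com/<owner>/<repo>")
        repo = parts[1][:-4] if parts[1].endswith(".git") else parts[1]
        return f"{parts[0]}/{repo}"
```

For example, `EvaluationInput("https://github.com/pjt222/agent-almanac").owner_repo()` yields the slug to pass to `gh repo view`.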
Do
Step 1: Census
Size, activity, landscape → before deeper probe.
- Read `README.md`, `CONTRIBUTING.md`, `LICENSE`, arch docs (`docs/`, `ARCHITECTURE.md`)
- Quant metrics:
- Stars/forks/issues/PRs → `gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests`
- Dependents → GitHub "Used by" page; `gh api repos/<owner>/<repo>/dependents` is not a documented REST route, so treat it as unverified
- Release cadence → `gh release list --limit 10`: freq + semver?
- Bus factor → top 5 contribs last 12mo by commit. Top >60% → crit low
- Landscape:
- Pioneer: first mover → defines cat (high infl, high supersession risk to followers)
- Fast-follower: <6mo post-pioneer → iterate
- Late entrant: post-stabilization → cmp on feat/gov
- `comparison_frameworks` given → same metrics each alt
→ Census tbl: stars, forks, deps, cadence, bus factor, landscape (+cmps).
If err: private/rate-limited → manual README. No metrics (self-hosted GitLab) → note gap, qual only.
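Two of the census numbers reduce to simple arithmetic once the raw data is collected. The sketch below assumes "Top >60%" means the single most active committer holds more than 60% of commits in the window, and that release dates arrive as ISO 8601 strings; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import median


def top_contributor_share(commit_authors: list[str]) -> float:
    """Share of commits (0..1) made by the most active author in the window."""
    if not commit_authors:
        raise ValueError("no commits in the window")
    counts = Counter(commit_authors)
    return counts.most_common(1)[0][1] / len(commit_authors)


def bus_factor_critical(commit_authors: list[str], threshold: float = 0.60) -> bool:
    """True when the top author exceeds the threshold (assumed reading of 'Top >60%')."""
    return top_contributor_share(commit_authors) > threshold


def median_release_gap_days(published: list[str]) -> float:
    """Median number of days between consecutive releases."""
    # Normalize a trailing 'Z' so datetime.fromisoformat accepts it on older Pythons.
    dates = sorted(datetime.fromisoformat(d.replace("Z", "+00:00")) for d in published)
    if len(dates) < 2:
        raise ValueError("need at least two releases to measure cadence")
    gaps = [(later - earlier).total_seconds() / 86400
            for earlier, later in zip(dates, dates[1:])]
    return median(gaps)
```

One author string per commit is enough as input, for instance the author column of a 12-month commit log.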
Step 2: Community Health
Welcome/support/retain externals?
- External survival rate:
- Last 50 closed PRs → `gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels`
- Author internal (org) vs external
- `survival_rate = merged_external_PRs / total_external_PRs`
- Healthy >50%; concern <30%
- Responsiveness:
- Issue first-response: median issue-open → first maintainer comment
- PR merge latency: median ext PR open → merge
- Healthy <7d resp, <30d merge; concern >30d resp
- Contributor diversity:
- Ext/int ratio last 6mo
- Unique externals w/ >=2 merged PRs (repeat → healthy eco)
- Gov artifacts:
- `CONTRIBUTING.md` exists + actionable (not just "submit a PR")
- `CODE_OF_CONDUCT.md` exists
- Gov docs → decision process
- Issue/PR templates guide contribs
→ Scorecard: survival, resp times, diversity, gov checklist.
If err: PR data thin (<20 closed) → note sample, weight others. Non-GitHub → adapt queries to platform API.
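The survival rate can be computed directly from the JSON that the `gh pr list` call in this step returns. A minimal sketch, assuming the caller supplies the set of internal (org member) logins, since the PR JSON alone does not say who is internal; the "borderline" label for the 30-50% band is also an assumption, as the text only names the two ends.

```python
from typing import Optional


def external_survival_rate(prs: list[dict], internal_logins: set[str]) -> Optional[float]:
    """merged_external_PRs / total_external_PRs, or None if there are no external PRs."""
    # PRs whose author is missing (e.g. deleted account) are counted as external.
    external = [
        pr for pr in prs
        if (pr.get("author") or {}).get("login") not in internal_logins
    ]
    if not external:
        return None
    # gh reports mergedAt as null for PRs closed without merging.
    merged = sum(1 for pr in external if pr.get("mergedAt"))
    return merged / len(external)


def rate_survival(rate: float) -> str:
    """Map a rate to the thresholds in this step."""
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concern"
    return "borderline"  # assumed label for the unnamed 30-50% band
```

With fewer than 20 closed PRs the result is a thin sample, so it should be reported alongside the sample size rather than on its own.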
Step 3: Supersession Risk
Ext contribs → obsoleted by internal dev? Biggest risk.
- Sample last 50-100 merged ext PRs (or all if fewer)
- Each merged ext PR, later:
- Reverted: explicit revert ref PR
- Rewritten: same file/module changed <90d by internal
- Obsoleted: feat removed/replaced next release
- `supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external`
- Roadmap vs ext-active areas:
- High overlap → high supersession (int builds over ext)
- Low overlap → lower risk (ext fill gaps int won't)
- "Contrib traps": look friendly, scheduled for int rewrite
- Bench: NemoClaw → 71% ext PRs superseded <6mo. Calib pt.
→ Supersession % + breakdown (reverted/rewritten/obsoleted). Roadmap overlap.
If err: shallow/squash-merged (attrib lost) → est by ext PR paths vs files changed next releases. Lower confidence.
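Once each sampled PR has been labeled by hand, the rate and its breakdown follow from the formula in this step. A sketch under the assumption that every merged external PR gets exactly one label; the `"intact"` label for PRs that survived is introduced here for illustration.

```python
from collections import Counter

SUPERSEDED = ("reverted", "rewritten", "obsoleted")
ALLOWED = SUPERSEDED + ("intact",)  # "intact" is a hypothetical label for surviving PRs


def supersession_report(outcomes: list[str]) -> dict:
    """Return the supersession rate plus the per-category breakdown."""
    if not outcomes:
        raise ValueError("no merged external PRs in the sample")
    counts = Counter(outcomes)
    unknown = set(counts) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    superseded = sum(counts[label] for label in SUPERSEDED)
    return {
        "total_merged_external": len(outcomes),
        "breakdown": {label: counts[label] for label in SUPERSEDED},
        "supersession_rate": superseded / len(outcomes),
    }
```

Keeping the breakdown next to the rate matters because a sample dominated by reverts tells a different story from one dominated by removals in a later release.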
Step 4: Architecture Alignment
Arch supports use case w/o lock-in?
- Extension pts:
- Plugin API → documented?
- Config surface → customize no-fork?
- Hook/callback → intercept behavior?
- Lock-in:
- Rewrite cost: migrate-away est (d/wk/mo)
- Data portability: export std fmt?
- Std compliance: agentskills.io, MCP, A2A vs proprietary?
- API stability:
- Breaking changes/major (CHANGELOG, migration guides)
- Deprecation policy (advance warn)
- Semver compliance (breaking → major only)
- Use case fit:
- `use_case` given → arch natural fit?
- Arch mismatches → workarounds req?
- Interop:
- agentskills.io compat (skill model)
- MCP (tool integration)
- A2A (agent-to-agent)
→ Arch report: ext pts, lock-in (low/med/high), API stability, use-case fit.
If err: sparse docs → derive from code + public API. Too young for stability hist → note, weight gov more.
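The "breaking → major only" check can be mechanized once the changelog has been read. The sketch below assumes a hand-built, oldest-first list of `(tag, has_breaking_change)` pairs; the function names are hypothetical, and exempting `0.x` releases by default is an assumption based on SemVer treating major version zero as unstable.

```python
import re

_VERSION = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> tuple[int, int, int]:
    """Parse tags like 'v1.2.3' or '1.2.3' into (major, minor, patch)."""
    match = _VERSION.match(tag)
    if match is None:
        raise ValueError(f"not a semver-style tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def semver_violations(releases: list[tuple[str, bool]],
                      allow_zero_major: bool = True) -> list[str]:
    """Tags that shipped a breaking change without bumping the major version."""
    violations = []
    previous = None
    for tag, breaking in releases:  # releases must be ordered oldest first
        current = parse_version(tag)
        same_major = previous is not None and current[0] == previous[0]
        exempt = allow_zero_major and current[0] == 0
        if breaking and same_major and not exempt:
            violations.append(tag)
        previous = current
    return violations
```

A non-empty result is evidence for the API stability part of the arch report; an empty one only shows that the labeled history is consistent.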
Step 5: Governance + Sustainability
Gov model → long-term viable? Fair to externals?
- Gov model:
- BDFL: single decider → fast, bus factor risk
- Committee/Core team: distributed → slower, resilient
- Foundation-backed: Apache, Linux Foundation, CNCF → most sustainable
- Corporate-controlled: one co → rug-pull risk
- Funding:
- VC, corp, grants, community, unfunded
- Full-time maintainers >=2 healthy; 0 red flag
- Revenue → how sustain?
- Contributor protections:
- License: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
- CLA → rights transfer that disadvantage?
- Recog → credited in releases/changelogs/docs?
- Security:
- `SECURITY.md` or equiv
- Median CVE → patch time
- Dep update (Dependabot, Renovate, manual)
- Trajectory:
- Gov evolving (→ foundation)?
- Recent leadership/acq/relicense?
- Public maintainer-contributor conflicts?
→ Gov assess: model, sustainability (sustainable/at-risk/critical), protections, security.
If err: gov undocumented → absence = yellow flag. Check implicit: who merges, who closes, who releases.
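Two of the sustainability signals in this step are numeric and can be sketched as follows. The input shape (pairs of disclosure and patch dates collected by hand) and the "thin" label for exactly one full-time maintainer are assumptions; the text only defines the `>=2` and `0` cases.

```python
from datetime import date
from statistics import median


def median_patch_days(advisories: list[tuple[date, date]]) -> float:
    """Median days from CVE disclosure to patch release."""
    if not advisories:
        raise ValueError("no advisories to measure")
    deltas = []
    for disclosed, patched in advisories:
        if patched < disclosed:
            raise ValueError("patch date precedes disclosure date")
        deltas.append((patched - disclosed).days)
    return median(deltas)


def maintainer_signal(full_time_maintainers: int) -> str:
    """Map the full-time maintainer count to the flags used in this step."""
    if full_time_maintainers < 0:
        raise ValueError("count cannot be negative")
    if full_time_maintainers >= 2:
        return "healthy"
    if full_time_maintainers == 0:
        return "red flag"
    return "thin"  # assumed label: exactly one maintainer is not covered by the text
```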
Step 6: Classify
Synth → 4-tier + justifications + recs.
- Score each (1-5):
- Community health: survival, resp, diversity
- Supersession risk: rate, roadmap, traps (invert: low better)
- Arch alignment: ext pts, lock-in, stability, fit
- Gov sustainability: model, funding, protections, sec
- Thresholds:
- INVEST (all >=4): healthy, low supersession (<20%), aligned, sustainable gov → safe adopt + contrib
- EVALUATE-FURTHER (mixed, none <2): mixed signals → specific follow-ups, re-eval date
- CONTRIBUTE-CAUTIOUSLY (any 2, none <2): high supersession (>40%) or gov concerns → limit to requested work, maintainer-approved scope, plugin/ext decoupled from core
- AVOID (any 1): crit red flags — abandoned, hostile (<15% survival), bad license, rug-pull → no eng effort
- Write report:
- Tier + 1-sentence rationale up front
- Each dim score + evidence
- `contribution_budget` given → how alloc hrs per tier
- EVALUATE-FURTHER → specific Qs + timeline
- CONTRIBUTE-CAUTIOUSLY → safe (plugins, docs, tests) vs risky (core)
- `comparison_frameworks` evaluated → cmp matrix, rank all
→ Classification report: tier, scores, evidence, actionable recs.
If err: data gaps block confident call → default EVALUATE-FURTHER, doc missing data + how to get. Never default INVEST when unsure.
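The thresholds can be read as a precedence order. The text leaves the overlap between EVALUATE-FURTHER and CONTRIBUTE-CAUTIOUSLY open, so this sketch assumes one ordering: any 1 wins, then data gaps, then any 2, then all >=4, with everything else falling to EVALUATE-FURTHER. Dimension keys are hypothetical names, and the supersession score is taken as already inverted (5 = low risk).

```python
from typing import Optional

DIMENSIONS = (
    "community_health",
    "supersession_risk",   # inverted: 5 means low supersession risk
    "arch_alignment",
    "gov_sustainability",
)


def classify(scores: dict[str, Optional[int]]) -> str:
    """Map four 1-5 scores (None = data gap) to a tier, under the assumed precedence."""
    known = []
    missing = False
    for dim in DIMENSIONS:
        value = scores.get(dim)
        if value is None:
            missing = True
            continue
        if not 1 <= value <= 5:
            raise ValueError(f"{dim} must be between 1 and 5, got {value}")
        known.append(value)
    if any(v == 1 for v in known):
        return "AVOID"                  # a critical red flag decides on its own
    if missing:
        return "EVALUATE-FURTHER"       # never default to INVEST when unsure
    if any(v == 2 for v in known):
        return "CONTRIBUTE-CAUTIOUSLY"
    if all(v >= 4 for v in known):
        return "INVEST"
    return "EVALUATE-FURTHER"           # all scores >= 3 with at least one 3
```

The tier is only the headline; the report still needs the per-dimension evidence and recommendations listed above.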
Chk
- Census: stars, forks, deps, cadence, bus factor, landscape
- Community: survival, resp times, diversity, gov artifacts
- Supersession: rate + breakdown (reverted/rewritten/obsoleted)
- Arch: ext pts, lock-in, API stability, fit
- Gov: model, funding, protections, security
- Tier: INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
- Each score → specific evidence
- Recs actionable + calib to budget (if given)
- Data gaps + confidence limits doc'd
Traps
- Popularity ≠ health: 50k stars + 1 maintainer < 2k stars + 15 active contribs. SPoF.
- Skip supersession: most common ext-contrib failure. Welcoming community worthless if int overwrites ext.
- Arch-only, ignore gov: pretty design fails w/ unsustainable or hostile gov.
- EVALUATE-FURTHER ≠ AVOID: mixed = investigate, not reject. Set re-eval date + specific Qs.
- Snapshot bias: metrics point-in-time. Declining proj w/ great current numbers is worse than improving proj w/ mediocre ones. Check 6-12mo trend.
- CLA complacency: some CLAs transfer copyright → your work = their asset. Read text, not checkbox.
- Single-framework anchor: no cmp → anything looks great/terrible. Bench at least 1 alt, even informal.
See
- polish-claw-project — contrib workflow this informs
- review-software-architecture — Step 4 arch eval
- forage-solutions — alt framework discovery for cmp
- search-prior-art — landscape + prior work
- security-audit-codebase — Step 5 sec posture
- assess-ip-landscape — license + IP risk
