evaluate-agent-framework
About
This skill evaluates open-source AI agent frameworks for investment readiness by analyzing community health, supersession (obsolescence) risk, architecture, and project governance. It outputs a four-tier classification (INVEST, EVALUATE-FURTHER, CONTRIBUTE-CAUTIOUSLY, AVOID) to guide how engineering resources are allocated. Use it to judge a framework's long-term viability before committing significant development effort.
Quick install
Claude Code
Recommended:
npx skills add pjt222/agent-almanac -a claude-code
Alternatives:
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework
Copy and paste a command into Claude Code to install this skill.
Documentation
Evaluate Agent Framework
Score OSS agent framework → invest? Steps 2-3 novel: survival rate + supersession. Tier → INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID. Calibrate effort pre-commit.
Use When
- Adopt framework prod? → check
- Dep risk on framework → assess
- Send eng effort to ext proj? → decide
- Build-vs-adopt cmp → rank
- Post-release / post-gov-change / post-acq re-eval
In
- Req: framework_url: GitHub repo URL
- Opt: comparison_frameworks: alt framework URLs → bench
- Opt: use_case: intended use (e.g., "multi-agent orchestration", "tool-use pipelines") → arch fit
- Opt: contribution_budget: planned eng hrs → tier calib
Do
Step 1: Census
Size, activity, landscape → before deeper probe.
- Read README.md, CONTRIBUTING.md, LICENSE, arch docs (docs/, ARCHITECTURE.md)
- Quant metrics:
- Stars/forks/issues/PRs → gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
- Dependents → GitHub "Used by" count or dependents page (https://github.com/<owner>/<repo>/network/dependents). REST API has no dependents endpoint → read page / count manually
- Release cadence → gh release list --limit 10: freq + semver?
- Bus factor → top 5 contribs last 12mo by commit. Top contrib >60% of commits → crit low
- Landscape:
- Pioneer: first mover → defines cat (high infl, high supersession risk to followers)
- Fast-follower: <6mo post-pioneer → iterate
- Late entrant: post-stabilization → cmp on feat/gov
- comparison_frameworks given → same metrics for each alt
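Once the raw data is exported, the bus-factor and release-cadence checks in this step are plain arithmetic. The sketch below is a minimal Python version. It assumes you have already collected per-author commit counts for the last 12 months and the release dates (for example, transcribed from gh release list). bus_factor_report and release_cadence_days are hypothetical helpers, and "critically low" applies this step's >60% top-contributor threshold.

```python
from __future__ import annotations

from datetime import date
from statistics import median


def bus_factor_report(commits_by_author: dict[str, int], top_n: int = 5,
                      critical_share: float = 0.60) -> dict:
    """Rank the top_n committers and flag a critically low bus factor."""
    total = sum(commits_by_author.values())
    if total == 0:
        return {"top_contributors": [], "top_share": 0.0, "critically_low": False}
    ranked = sorted(commits_by_author.items(), key=lambda kv: kv[1], reverse=True)
    # Share of ALL commits made by the single most active author.
    top_share = ranked[0][1] / total
    return {
        "top_contributors": ranked[:top_n],
        "top_share": round(top_share, 3),
        "critically_low": top_share > critical_share,
    }


def release_cadence_days(release_dates: list[date]) -> float | None:
    """Median gap in days between consecutive releases; None if < 2 releases."""
    ordered = sorted(release_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return median(gaps) if gaps else None


if __name__ == "__main__":
    print(bus_factor_report({"alice": 70, "bob": 20, "carol": 10}))
    print(release_cadence_days([date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]))
```

Measuring the top contributor's share, rather than counting heads, stops a long tail of drive-by committers from making a one-person project look safe.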
→ Census tbl: stars, forks, deps, cadence, bus factor, landscape (+cmps).
If err: private/rate-limited → manual README. No metrics (self-hosted GitLab) → note gap, qual only.
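The landscape categories in Step 1 depend on timing relative to the category's pioneer. The rough Python sketch below rests on three assumptions. First, a framework's first public release date is an acceptable proxy for when it entered the market. Second, the pioneer has already been identified. Third, 183 days stands in for "6 months". Labeling every later entrant as late-entrant is a simplification, because "post-stabilization" still needs a human call. The function name is hypothetical.

```python
from datetime import date

# Assumption: "<6mo post-pioneer" expressed as ~183 days.
FAST_FOLLOWER_WINDOW_DAYS = 183


def landscape_position(first_release: date, pioneer_first_release: date) -> str:
    """Classify a framework as pioneer, fast-follower, or late-entrant."""
    if first_release <= pioneer_first_release:
        return "pioneer"
    lag_days = (first_release - pioneer_first_release).days
    if lag_days < FAST_FOLLOWER_WINDOW_DAYS:
        return "fast-follower"
    # Simplification: the skill says "post-stabilization"; this sketch treats
    # any later entry as late and leaves the stabilization check to a human.
    return "late-entrant"


if __name__ == "__main__":
    pioneer = date(2023, 1, 1)
    for name, released in {"alt-a": date(2023, 3, 1), "alt-b": date(2024, 2, 1)}.items():
        print(name, landscape_position(released, pioneer))
```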
Step 2: Community Health
Welcome/support/retain externals?
- External survival rate:
- Last 50 closed PRs → gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
- Author internal (org) vs external
- survival_rate = merged_external_PRs / total_external_PRs
- Healthy >50%; concern <30%
- Responsiveness:
- Issue first-response: median issue-open → first maintainer comment
- PR merge latency: median ext PR open → merge
- Healthy <7d resp, <30d merge; concern >30d resp
- Contributor diversity:
- Ext/int ratio last 6mo
- Unique externals w/ >=2 merged PRs (repeat → healthy eco)
- Gov artifacts:
- CONTRIBUTING.md exists + actionable (not just "submit a PR")
- CODE_OF_CONDUCT.md exists
- Gov docs → decision process
- Issue/PR templates guide contribs
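The survival-rate and contributor-diversity checks can be scripted over the JSON returned by the gh pr list command in this step. The Python sketch below assumes two things. First, internal_logins is a set of org member logins that you supply yourself, because that command does not report membership. Second, unmerged PRs carry an empty or null mergedAt. The function names are hypothetical. The thresholds are the ones this step states.

```python
from __future__ import annotations

from collections import Counter


def _login(pr: dict) -> str:
    # gh reports author as {"login": ...}; guard against a missing author.
    return (pr.get("author") or {}).get("login", "")


def _merged(pr: dict) -> bool:
    # Assumption: unmerged PRs carry an empty or null mergedAt value.
    return bool(pr.get("mergedAt"))


def external_survival(prs: list[dict], internal_logins: set[str]) -> dict:
    """Share of closed external PRs that were merged, with a verdict."""
    external = [pr for pr in prs if _login(pr) not in internal_logins]
    merged = sum(1 for pr in external if _merged(pr))
    rate = merged / len(external) if external else None
    if rate is None:
        verdict = "no external PRs"
    elif rate > 0.50:
        verdict = "healthy"
    elif rate < 0.30:
        verdict = "concern"
    else:
        verdict = "between thresholds"
    return {"external_prs": len(external), "merged_external": merged,
            "survival_rate": rate, "verdict": verdict}


def repeat_external_contributors(prs: list[dict], internal_logins: set[str],
                                 min_merged: int = 2) -> list[str]:
    """External authors with at least min_merged merged PRs."""
    counts = Counter(_login(pr) for pr in prs
                     if _login(pr) not in internal_logins and _merged(pr))
    return sorted(login for login, n in counts.items() if n >= min_merged)
```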
→ Scorecard: survival, resp times, diversity, gov checklist.
If err: PR data thin (<20 closed) → note sample, weight others. Non-GitHub → adapt queries to platform API.
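Median first-response and merge latency reduce to date arithmetic over (opened, answered) or (opened, merged) pairs. The Python sketch below assumes those pairs have already been extracted as GitHub-style ISO 8601 timestamps. Finding the first maintainer comment is left to the caller, and the helper names are hypothetical. Unanswered items (end is None) are skipped, which flatters the median, so count them separately.

```python
from __future__ import annotations

from datetime import datetime
from statistics import median


def _parse(ts: str) -> datetime:
    # GitHub timestamps end in "Z"; older fromisoformat() versions reject it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def median_latency_days(pairs: list[tuple[str, str | None]]) -> float | None:
    """Median (end - start) in days over (start, end) pairs; None ends skipped."""
    durations = [(_parse(end) - _parse(start)).total_seconds() / 86400
                 for start, end in pairs if end]
    return median(durations) if durations else None


def responsiveness_verdict(resp_days: float | None, merge_days: float | None) -> str:
    """Healthy: <7d first response and <30d merge. Concern: >30d response."""
    if resp_days is None or merge_days is None:
        return "insufficient data"
    if resp_days > 30:
        return "concern"
    if resp_days < 7 and merge_days < 30:
        return "healthy"
    return "between thresholds"
```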
Step 3: Supersession Risk
Ext contribs → obsoleted by internal dev? Biggest risk.
- Sample last 50-100 merged ext PRs (or all if fewer)
- Each merged ext PR, later:
- Reverted: explicit revert ref PR
- Rewritten: same file/module changed <90d by internal
- Obsoleted: feat removed/replaced next release
- supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
- Roadmap vs ext-active areas:
- High overlap → high supersession (int builds over ext)
- Low overlap → lower risk (ext fill gaps int won't)
- "Contrib traps": look friendly, scheduled for int rewrite
- Bench: NemoClaw → 71% ext PRs superseded <6mo. Calib pt.
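Once each merged external PR has been labeled, the supersession rate and its breakdown can be tallied. The minimal Python sketch below assumes the evaluator assigns those labels. looks_rewritten is a hypothetical helper for the "same file changed <90d by internal" rule, and its internal_commits input (commit time plus touched files, internal authors only) is an assumption. Which label wins when several apply is left open, because this step does not define a precedence.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta

OUTCOMES = ("reverted", "rewritten", "obsoleted")
REWRITE_WINDOW = timedelta(days=90)  # "same file/module changed <90d by internal"


def looks_rewritten(pr_files: list[str], pr_merged_at: datetime,
                    internal_commits: list[tuple[datetime, list[str]]]) -> bool:
    """True if an internal commit touched any of the PR's files within 90 days."""
    touched = set(pr_files)
    return any(
        pr_merged_at < committed_at <= pr_merged_at + REWRITE_WINDOW
        and touched.intersection(files)
        for committed_at, files in internal_commits
    )


def supersession_report(outcomes: dict[int, str | None]) -> dict:
    """outcomes: merged external PR number -> one of OUTCOMES, or None if it survived."""
    labels = [o for o in outcomes.values() if o is not None]
    unknown = set(labels) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    counts = Counter(labels)
    total = len(outcomes)
    return {
        "total_merged_external": total,
        "breakdown": {o: counts[o] for o in OUTCOMES},
        "supersession_rate": sum(counts.values()) / total if total else None,
    }
```

A rate near 0.71 would put a framework at the NemoClaw calibration point mentioned above.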
→ Supersession % + breakdown (reverted/rewritten/obsoleted). Roadmap overlap.
If err: shallow/squash-merged (attrib lost) → est by ext PR paths vs files changed next releases. Lower confidence.
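When squash merges or shallow history erase attribution, the fallback in Step 3 compares file paths instead of commits. The same set arithmetic also gives a rough roadmap-overlap figure. The Python sketch below makes two assumptions: "module" means a file's parent directory, and roadmap_modules is transcribed by hand from the public roadmap. Neither function name comes from the skill. Treat both outputs as lower-confidence estimates, since Step 3 gives no numeric cut-off for "high" overlap.

```python
from __future__ import annotations

import posixpath


def path_overlap_estimate(ext_pr_paths: dict[int, set[str]],
                          later_release_paths: set[str]) -> float | None:
    """Share of external PRs whose files were changed again in later releases."""
    if not ext_pr_paths:
        return None
    hit = sum(1 for paths in ext_pr_paths.values() if paths & later_release_paths)
    return hit / len(ext_pr_paths)


def roadmap_overlap(ext_paths: list[str], roadmap_modules: set[str]) -> float | None:
    """Share of externally active modules (parent dirs) that are on the roadmap."""
    ext_modules = {posixpath.dirname(p) for p in ext_paths}
    if not ext_modules:
        return None
    return len(ext_modules & roadmap_modules) / len(ext_modules)
```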
Step 4: Architecture Alignment
Arch supports use case w/o lock-in?
- Extension pts:
- Plugin API → documented?
- Config surface → customize no-fork?
- Hook/callback → intercept behavior?
- Lock-in:
- Rewrite cost: migrate-away est (d/wk/mo)
- Data portability: export std fmt?
- Std compliance: agentskills.io, MCP, A2A vs proprietary?
- API stability:
- Breaking changes/major (CHANGELOG, migration guides)
- Deprecation policy (advance warn)
- Semver compliance (breaking → major only)
- Use case fit:
- use_case given → arch natural fit?
- Arch mismatches → workarounds req?
- Interop:
- agentskills.io compat (skill model)
- MCP (tool integration)
- A2A (agent-to-agent)
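The semver-compliance check (breaking changes only in major releases) can be run over the release list. The Python sketch below assumes tags follow MAJOR.MINOR.PATCH, with an optional leading v. It also assumes the per-release breaking flag comes from reading the CHANGELOG or migration guides by hand. Semantic Versioning 2.0.0 lets anything change under 0.y.z, so accepting a minor bump there is a common convention, not a spec requirement.

```python
from __future__ import annotations


def parse_version(tag: str) -> tuple[int, int, int]:
    """'v1.4.0-rc.1' -> (1, 4, 0); pre-release and build suffixes are ignored."""
    core = tag.lstrip("v").split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def semver_violations(releases: list[tuple[str, bool]]) -> list[str]:
    """Tags that shipped a breaking change without the required version bump.

    releases: chronological (tag, has_breaking_change) pairs.
    """
    violations = []
    for (prev_tag, _), (tag, breaking) in zip(releases, releases[1:]):
        if not breaking:
            continue
        prev, cur = parse_version(prev_tag), parse_version(tag)
        major_bump = cur[0] > prev[0]
        # Convention for 0.y.z: a minor bump may carry breaking changes.
        zero_minor_bump = cur[0] == prev[0] == 0 and cur[1] > prev[1]
        if not (major_bump or zero_minor_bump):
            violations.append(tag)
    return violations
```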
→ Arch report: ext pts, lock-in (low/med/high), API stability, use-case fit.
If err: sparse docs → derive from code + public API. Too young for stability hist → note, weight gov more.
Step 5: Governance + Sustainability
Gov model → long-term viable? Fair to externals?
- Gov model:
- BDFL: single decider → fast, bus factor risk
- Committee/Core team: distributed → slower, resilient
- Foundation-backed: Apache Software Foundation, Linux Foundation, CNCF → most sustainable
- Corporate-controlled: one co → rug-pull risk
- Funding:
- VC, corp, grants, community, unfunded
- Full-time maintainers >=2 healthy; 0 red flag
- Revenue → how sustain?
- Contributor protections:
- License: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
- CLA → transfers rights in ways that disadvantage contributors?
- Recog → credited in releases/changelogs/docs?
- Security:
- SECURITY.md or equiv
- Median CVE → patch time
- Dep update (Dependabot, Renovate, manual)
- Trajectory:
- Gov evolving (→ foundation)?
- Recent leadership/acq/relicense?
- Public maintainer-contributor conflicts?
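Two governance checks in this step are numeric: the full-time maintainer count and the median CVE-to-patch time. The Python sketch below assumes advisory disclosure and fix dates were collected by hand, for example from SECURITY.md, GitHub security advisories, or release notes. The helper names are hypothetical. One maintainer is reported as "between thresholds" because the skill only defines >=2 and 0.

```python
from __future__ import annotations

from datetime import date
from statistics import median


def maintainer_flag(full_time_maintainers: int) -> str:
    """>=2 full-time maintainers is healthy, 0 is a red flag (per this step)."""
    if full_time_maintainers >= 2:
        return "healthy"
    if full_time_maintainers == 0:
        return "red flag"
    return "between thresholds"  # one maintainer: not classified by the skill


def cve_patch_stats(advisories: list[tuple[date, date | None]]) -> dict:
    """advisories: (disclosed, patched) pairs; patched is None if still open."""
    patched = [(fixed - disclosed).days for disclosed, fixed in advisories
               if fixed is not None]
    return {
        "median_days": median(patched) if patched else None,
        "unpatched": sum(1 for _, fixed in advisories if fixed is None),
    }
```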
→ Gov assess: model, sustainability (sustainable/at-risk/critical), protections, security.
If err: gov undocumented → absence = yellow flag. Check implicit: who merges, who closes, who releases.
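When governance is undocumented, "who merges" can be read from PR metadata: gh pr list --state merged --json mergedBy reports the merging account for each PR. The Python sketch below finds the smallest set of accounts that covers most merges. The 80% coverage cut-off and the function name are assumptions, not part of the skill.

```python
from __future__ import annotations

from collections import Counter


def implicit_gatekeepers(merged_prs: list[dict],
                         coverage: float = 0.8) -> tuple[list[str], Counter]:
    """Smallest set of mergers who together merged >= coverage of the PRs."""
    counts = Counter((pr.get("mergedBy") or {}).get("login", "unknown")
                     for pr in merged_prs)
    total = sum(counts.values())
    gatekeepers, covered = [], 0
    for login, n in counts.most_common():
        if covered / total >= coverage:  # total > 0 whenever the loop runs
            break
        gatekeepers.append(login)
        covered += n
    return gatekeepers, counts
```

One or two gatekeepers covering nearly every merge suggests a de facto BDFL model, even if no document says so.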
Step 6: Classify
Synth → 4-tier + justifications + recs.
- Score each (1-5):
- Community health: survival, resp, diversity
- Supersession risk: rate, roadmap, traps (inverted: lower risk → higher score)
- Arch alignment: ext pts, lock-in, stability, fit
- Gov sustainability: model, funding, protections, sec
- Thresholds:
- INVEST (all >=4): healthy, low supersession (<20%), aligned, sustainable gov → safe adopt + contrib
- EVALUATE-FURTHER (mixed, none <3): mixed signals → specific follow-ups, re-eval date
- CONTRIBUTE-CAUTIOUSLY (any 2, none <2): high supersession (>40%) or gov concerns → limit to requested work, maintainer-approved scope, plugin/ext decoupled from core
- AVOID (any 1): crit red flags — abandoned, hostile (<15% survival), bad license, rug-pull → no eng effort
- Write report:
- Tier + 1-sentence rationale up front
- Each dim score + evidence
- contribution_budget given → how to alloc hrs per tier
- EVALUATE-FURTHER → specific Qs + timeline
- CONTRIBUTE-CAUTIOUSLY → safe (plugins, docs, tests) vs risky (core)
- comparison_frameworks evaluated → cmp matrix, rank all
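When comparison_frameworks are evaluated, ranking them needs an ordering rule, which the skill leaves open. The Python sketch below sorts by tier first, then by total score, then by name, and prints a plain-text matrix. The framework names in the test data are placeholders, and both function names are hypothetical.

```python
from __future__ import annotations

TIER_ORDER = {"INVEST": 0, "EVALUATE-FURTHER": 1, "CONTRIBUTE-CAUTIOUSLY": 2, "AVOID": 3}


def rank_frameworks(results: dict[str, tuple[str, dict[str, int]]]) -> list[str]:
    """Best first: by tier, then by total score (higher first), then by name."""
    def key(name: str):
        tier, scores = results[name]
        return (TIER_ORDER[tier], -sum(scores.values()), name)
    return sorted(results, key=key)


def comparison_matrix(results: dict[str, tuple[str, dict[str, int]]],
                      dimensions: tuple[str, ...]) -> str:
    """Plain-text matrix with one row per framework, in ranked order."""
    rows = [["framework", "tier", *dimensions]]
    for name in rank_frameworks(results):
        tier, scores = results[name]
        rows.append([name, tier, *(str(scores.get(d, "?")) for d in dimensions)])
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join("  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
                     for row in rows)
```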
→ Classification report: tier, scores, evidence, actionable recs.
If err: data gaps block confident call → default EVALUATE-FURTHER, doc missing data + how to get. Never default INVEST when unsure.
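The thresholds in Step 6 can be encoded as a small classifier. The Python sketch below makes these assumptions:
- Each dimension is scored 1-5, with supersession already inverted (5 = low risk).
- None marks a dimension with missing data.
- Tiers are checked worst-first (AVOID, then CONTRIBUTE-CAUTIOUSLY), so every score set maps to exactly one tier.
- Known red flags are applied before the data-gap default. The skill does not state that order.

The optional supersession_rate and survival_rate arguments apply the <20%, >40%, and <15% cut-offs stated in the thresholds. The function name is hypothetical.

```python
from __future__ import annotations

DIMENSIONS = ("community", "supersession", "architecture", "governance")


def classify(scores: dict[str, int | None],
             supersession_rate: float | None = None,
             survival_rate: float | None = None) -> str:
    """Map 1-5 dimension scores (None = missing) to a tier, worst tiers first."""
    for dim in DIMENSIONS:
        value = scores.get(dim)
        if value is not None and not 1 <= value <= 5:
            raise ValueError(f"{dim} score must be 1-5, got {value}")
    known = [scores[d] for d in DIMENSIONS if scores.get(d) is not None]
    missing = len(known) < len(DIMENSIONS)

    # AVOID: any 1, or hostile community (<15% external survival).
    if 1 in known or (survival_rate is not None and survival_rate < 0.15):
        return "AVOID"
    # CONTRIBUTE-CAUTIOUSLY: any 2, or high supersession (>40%).
    if 2 in known or (supersession_rate is not None and supersession_rate > 0.40):
        return "CONTRIBUTE-CAUTIOUSLY"
    # Data gaps block a confident call: never default to INVEST.
    if missing:
        return "EVALUATE-FURTHER"
    # INVEST: all dimensions >= 4 and, if measured, supersession < 20%.
    if all(v >= 4 for v in known) and (supersession_rate is None or supersession_rate < 0.20):
        return "INVEST"
    return "EVALUATE-FURTHER"
```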
Chk
- Census: stars, forks, deps, cadence, bus factor, landscape
- Community: survival, resp times, diversity, gov artifacts
- Supersession: rate + breakdown (reverted/rewritten/obsoleted)
- Arch: ext pts, lock-in, API stability, fit
- Gov: model, funding, protections, security
- Tier: INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
- Each score → specific evidence
- Recs actionable + calib to budget (if given)
- Data gaps + confidence limits doc'd
Traps
- Popularity ≠ health: 50k stars + 1 maintainer is riskier than 2k stars + 15 active contribs (SPoF).
- Skip supersession: most common ext-contrib failure. Welcoming community worthless if int overwrites ext.
- Arch-only, ignore gov: pretty design fails w/ unsustainable or hostile gov.
- EVALUATE-FURTHER ≠ AVOID: mixed = investigate, not reject. Set re-eval date + specific Qs.
- Snapshot bias: metrics are point-in-time. A declining proj w/ great current numbers can outrank an improving proj w/ mediocre ones. Check 6-12mo trend.
- CLA complacency: some CLAs transfer copyright → your work = their asset. Read text, not checkbox.
- Single-framework anchor: no cmp → anything looks great/terrible. Bench at least 1 alt, even informal.
See
- polish-claw-project — contrib workflow this informs
- review-software-architecture — Step 4 arch eval
- forage-solutions — alt framework discovery for cmp
- search-prior-art — landscape + prior work
- security-audit-codebase — Step 5 sec posture
- assess-ip-landscape — license + IP risk
GitHub repository
Related skills
executing-plans
Design: Use the executing-plans skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads the plan and reviews it critically, then executes tasks in small batches (3 tasks by default), reporting progress after each batch for review by the architect. This gives systematic implementation with built-in quality checkpoints.
requesting-code-review
Design: This skill launches a code-review subagent that checks code changes against the requirements before work continues. Use it after completing tasks, after implementing major features, or before merging into the main branch. The review catches problems early by comparing the current implementation with the original plan.
connect-mcp-server
Design: This skill gives developers a detailed guide to connecting MCP servers to Claude Code over HTTP, stdio, or SSE transports. It covers installation, configuration, authentication, and security for integrating external services such as GitHub, Notion, and custom APIs. Use it when setting up MCP integrations, configuring external tools, or working with the Model Context Protocol in Claude.
web-cli-teleport
Design: This skill helps developers choose between the Claude Code web interface and the CLI based on an analysis of the task, and moves sessions seamlessly between the two environments. It streamlines the workflow by managing session state and context when switching between the web interface, the CLI, or the mobile app. Use it for complex projects that need different tools at different stages.
