evaluate-agent-framework
About
This skill evaluates open-source AI agent frameworks for investment readiness by analyzing community health, supersession risk, architecture, and governance. It outputs a four-tier classification (INVEST, EVALUATE-FURTHER, CONTRIBUTE-CAUTIOUSLY, AVOID) to guide the allocation of engineering resources. Use it to assess a framework's long-term viability before committing significant development resources.
Quick install
Claude Code
Recommended: `npx skills add pjt222/agent-almanac -a claude-code`
Alternatives: `/plugin add https://github.com/pjt222/agent-almanac` or `git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework`
Documentation
Evaluate Agent Framework
Score OSS agent framework → invest? Steps 2-3 novel: survival rate + supersession. Tier → INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID. Calibrate effort pre-commit.
Use When
- Adopt framework prod? → check
- Dep risk on framework → assess
- Send eng effort to ext proj? → decide
- Build-vs-adopt cmp → rank
- Post-release / post-gov-change / post-acq re-eval
In
- Req:
- `framework_url`: GitHub repo URL
- Opt:
- `comparison_frameworks`: alt framework URLs, bench
- `use_case`: intended use (e.g., "multi-agent orchestration", "tool-use pipelines") → arch fit
- `contribution_budget`: planned eng hrs → tier calib
Do
Step 1: Census
Size, activity, landscape → before deeper probe.
- Read `README.md`, `CONTRIBUTING.md`, `LICENSE`, arch docs (`docs/`, `ARCHITECTURE.md`)
- Quant metrics:
- Stars/forks/issues/PRs → `gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests`
- Dependents → GitHub "Used by" count (web UI). `gh api repos/<owner>/<repo>/dependents` does not appear to be a documented REST endpoint → may 404, fall back to web UI
- Release cadence → `gh release list --limit 10` (freq + semver?)
- Bus factor → top 5 contribs last 12mo by commit count. Top contrib >60% of commits → crit low
- Landscape:
- Pioneer: first mover → defines cat (high infl, high supersession risk to followers)
- Fast-follower: <6mo post-pioneer → iterate
- Late entrant: post-stabilization → cmp on feat/gov
- `comparison_frameworks` given → same metrics each alt
→ Census tbl: stars, forks, deps, cadence, bus factor, landscape (+cmps).
If err: private/rate-limited → manual README. No metrics (self-hosted GitLab) → note gap, qual only.
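The bus-factor check above can be sketched as follows. This is a minimal sketch, assuming the 60% threshold applies to the single top contributor's share of commits in the 12-month window; the function names and the sample author list are hypothetical, and in practice the list would come from the repo's commit history.

```python
from collections import Counter

def top_contributor_share(commit_authors):
    """Return (author, share) for the most frequent author in a commit list."""
    if not commit_authors:
        raise ValueError("no commits in the window")
    counts = Counter(commit_authors)
    author, n = counts.most_common(1)[0]
    return author, n / len(commit_authors)

def bus_factor_critical(commit_authors, threshold=0.60):
    """True when the top contributor's share exceeds the threshold (assumed 60%)."""
    _, share = top_contributor_share(commit_authors)
    return share > threshold

# Hypothetical 12-month log: one entry per commit.
SAMPLE_AUTHORS = ["alice"] * 7 + ["bob"] * 2 + ["carol"]

if __name__ == "__main__":
    author, share = top_contributor_share(SAMPLE_AUTHORS)
    print(author, share, bus_factor_critical(SAMPLE_AUTHORS))
```

A strict `>` is used so a share of exactly 60% is not flagged; adjust if the intended reading is "60% or more".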
Step 2: Community Health
Welcome/support/retain externals?
- External survival rate:
- Last 50 closed PRs → `gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels`
- Author internal (org) vs external
- `survival_rate = merged_external_PRs / total_external_PRs`
- Healthy >50%; concern <30%
- Responsiveness:
- Issue first-response: median issue-open → first maintainer comment
- PR merge latency: median ext PR open → merge
- Healthy <7d resp, <30d merge; concern >30d resp
- Contributor diversity:
- Ext/int ratio last 6mo
- Unique externals w/ >=2 merged PRs (repeat → healthy eco)
- Gov artifacts:
- `CONTRIBUTING.md` exists + actionable (not just "submit a PR")
- `CODE_OF_CONDUCT.md` exists
- Gov docs → decision process
- Issue/PR templates guide contribs
→ Scorecard: survival, resp times, diversity, gov checklist.
If err: PR data thin (<20 closed) → note sample, weight others. Non-GitHub → adapt queries to platform API.
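The survival-rate and merge-latency metrics above can be sketched over the JSON that `gh pr list` prints. This is a sketch under assumptions: `createdAt` has been added to the `--json` field list (it is not in the command shown above), the set of internal logins is supplied by hand, the sample records are invented, and the "mixed" band for 30-50% is a hypothetical label for the gap between the two stated thresholds.

```python
import json
from datetime import datetime
from statistics import median

def parse_ts(ts):
    """Parse ISO 8601 timestamps with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def external_prs(prs, internal_logins):
    """Keep PRs whose author is not in the hand-maintained internal set."""
    return [p for p in prs if p["author"]["login"] not in internal_logins]

def survival_rate(prs, internal_logins):
    """merged_external_PRs / total_external_PRs, or None without external PRs."""
    ext = external_prs(prs, internal_logins)
    if not ext:
        return None
    merged = [p for p in ext if p.get("mergedAt")]
    return len(merged) / len(ext)

def survival_band(rate):
    """Map a rate onto the Step 2 thresholds (healthy >50%, concern <30%)."""
    if rate is None:
        return "no-data"
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concern"
    return "mixed"  # hypothetical label for the 30-50% gap

def median_merge_latency_days(prs, internal_logins):
    """Median days from open to merge across merged external PRs."""
    days = [
        (parse_ts(p["mergedAt"]) - parse_ts(p["createdAt"])).total_seconds() / 86400
        for p in external_prs(prs, internal_logins)
        if p.get("mergedAt")
    ]
    return median(days) if days else None

# Invented sample in the shape of `gh pr list --json author,createdAt,mergedAt,closedAt`.
SAMPLE_PRS = json.loads("""[
  {"author": {"login": "alice"}, "createdAt": "2024-01-01T00:00:00Z",
   "mergedAt": "2024-01-03T00:00:00Z", "closedAt": "2024-01-03T00:00:00Z"},
  {"author": {"login": "dana"}, "createdAt": "2024-01-01T00:00:00Z",
   "mergedAt": "2024-01-11T00:00:00Z", "closedAt": "2024-01-11T00:00:00Z"},
  {"author": {"login": "erin"}, "createdAt": "2024-01-05T00:00:00Z",
   "mergedAt": null, "closedAt": "2024-01-20T00:00:00Z"},
  {"author": {"login": "femi"}, "createdAt": "2024-02-01T00:00:00Z",
   "mergedAt": "2024-02-21T00:00:00Z", "closedAt": "2024-02-21T00:00:00Z"}
]""")
INTERNAL = {"alice"}

if __name__ == "__main__":
    rate = survival_rate(SAMPLE_PRS, INTERNAL)
    print(rate, survival_band(rate), median_merge_latency_days(SAMPLE_PRS, INTERNAL))
```

With fewer than 20 closed PRs the rate is noisy, so report the sample size next to it.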
Step 3: Supersession Risk
Ext contribs → obsoleted by internal dev? Biggest risk.
- Sample last 50-100 merged ext PRs (or all if fewer)
- Each merged ext PR, later:
- Reverted: explicit revert ref PR
- Rewritten: same file/module changed <90d by internal
- Obsoleted: feat removed/replaced next release
- `supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external`
- Roadmap vs ext-active areas:
- High overlap → high supersession (int builds over ext)
- Low overlap → lower risk (ext fill gaps int won't)
- "Contrib traps": look friendly, scheduled for int rewrite
- Bench: NemoClaw → 71% ext PRs superseded <6mo. Calib pt.
→ Supersession % + breakdown (reverted/rewritten/obsoleted). Roadmap overlap.
If err: shallow/squash-merged (attrib lost) → est by ext PR paths vs files changed next releases. Lower confidence.
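The supersession formula and breakdown above can be sketched as below. It assumes each sampled merged external PR has already been hand-labeled with one outcome; the `intact` label, the function names, and the sample data are hypothetical. The 90-day helper mirrors the "rewritten" rule for a single file or module.

```python
from collections import Counter
from datetime import date, timedelta

SUPERSEDED = ("reverted", "rewritten", "obsoleted")

def rewritten_within(merged_on, internal_change_dates, window_days=90):
    """True if any internal change landed after the merge and inside the window."""
    window = timedelta(days=window_days)
    return any(timedelta(0) < (d - merged_on) < window for d in internal_change_dates)

def supersession_report(outcomes):
    """outcomes: one label per merged external PR, e.g. 'reverted' or 'intact'."""
    if not outcomes:
        raise ValueError("no merged external PRs sampled")
    counts = Counter(outcomes)
    breakdown = {label: counts[label] for label in SUPERSEDED}
    return {
        "total": len(outcomes),
        "breakdown": breakdown,
        "rate": sum(breakdown.values()) / len(outcomes),
    }

# Hypothetical labels for 10 sampled PRs.
SAMPLE_OUTCOMES = ["reverted"] + ["rewritten"] * 2 + ["obsoleted"] + ["intact"] * 6

if __name__ == "__main__":
    print(supersession_report(SAMPLE_OUTCOMES))
    print(rewritten_within(date(2024, 1, 1), [date(2024, 2, 15)]))
```

Keeping the per-category counts alongside the rate makes it easier to see whether supersession comes from reverts (conflict) or rewrites (roadmap overlap).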
Step 4: Architecture Alignment
Arch supports use case w/o lock-in?
- Extension pts:
- Plugin API → documented?
- Config surface → customize no-fork?
- Hook/callback → intercept behavior?
- Lock-in:
- Rewrite cost: migrate-away est (d/wk/mo)
- Data portability: export std fmt?
- Std compliance: agentskills.io, MCP, A2A vs proprietary?
- API stability:
- Breaking changes/major (CHANGELOG, migration guides)
- Deprecation policy (advance warn)
- Semver compliance (breaking → major only)
- Use case fit:
- `use_case` given → arch natural fit?
- Arch mismatches → workarounds req?
- Interop:
- agentskills.io compat (skill model)
- MCP (tool integration)
- A2A (agent-to-agent)
→ Arch report: ext pts, lock-in (low/med/high), API stability, use-case fit.
If err: sparse docs → derive from code + public API. Too young for stability hist → note, weight gov more.
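The semver-compliance item above (breaking → major only) can be checked mechanically once each release has been marked as breaking or not. A minimal sketch, assuming that flag is set by hand from the CHANGELOG; the function names and release list are hypothetical. Releases with major version 0 are skipped, since SemVer treats 0.y.z as initial development where anything may change.

```python
def parse_semver(tag):
    """Extract (major, minor, patch) from tags like 'v1.2.3' or '1.2.3-rc.1'."""
    core = tag.lstrip("v").split("-")[0].split("+")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch

def semver_violations(releases):
    """releases: (tag, has_breaking_change) pairs, oldest first.

    Returns tags that shipped a breaking change without bumping the major
    version relative to the previous release.
    """
    violations = []
    for (prev_tag, _), (tag, breaking) in zip(releases, releases[1:]):
        major = parse_semver(tag)[0]
        if major == 0:
            continue  # 0.y.z: breaking changes are allowed by SemVer
        if breaking and major == parse_semver(prev_tag)[0]:
            violations.append(tag)
    return violations

# Hypothetical release history with hand-set breaking flags.
SAMPLE_RELEASES = [
    ("v0.9.0", True),
    ("v1.0.0", True),
    ("v1.1.0", True),
    ("v2.0.0", True),
    ("v2.0.1", False),
]

if __name__ == "__main__":
    print(semver_violations(SAMPLE_RELEASES))
```

A non-empty result is evidence for the API stability score, not a verdict on its own; one violation years ago weighs less than a recent pattern.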
Step 5: Governance + Sustainability
Gov model → long-term viable? Fair to externals?
- Gov model:
- BDFL: single decider → fast, bus factor risk
- Committee/Core team: distributed → slower, resilient
- Foundation-backed: Apache, Linux Foundation, CNCF → most sustainable
- Corporate-controlled: one co → rug-pull risk
- Funding:
- VC, corp, grants, community, unfunded
- Full-time maintainers >=2 healthy; 0 red flag
- Revenue → how sustain?
- Contributor protections:
- License: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
- CLA → rights transfer that disadvantage?
- Recog → credited in releases/changelogs/docs?
- Security:
- `SECURITY.md` or equiv
- Median CVE → patch time
- Dep update (Dependabot, Renovate, manual)
- Trajectory:
- Gov evolving (→ foundation)?
- Recent leadership/acq/relicense?
- Public maintainer-contributor conflicts?
→ Gov assess: model, sustainability (sustainable/at-risk/critical), protections, security.
If err: gov undocumented → absence = yellow flag. Check implicit: who merges, who closes, who releases.
Step 6: Classify
Synth → 4-tier + justifications + recs.
- Score each (1-5):
- Community health: survival, resp, diversity
- Supersession risk: rate, roadmap, traps (invert: low better)
- Arch alignment: ext pts, lock-in, stability, fit
- Gov sustainability: model, funding, protections, sec
- Thresholds:
- INVEST (all >=4): healthy, low supersession (<20%), aligned, sustainable gov → safe adopt + contrib
- EVALUATE-FURTHER (mixed, none <2): mixed signals → specific follow-ups, re-eval date
- CONTRIBUTE-CAUTIOUSLY (any 2, none <2): high supersession (>40%) or gov concerns → limit to requested work, maintainer-approved scope, plugin/ext decoupled from core
- AVOID (any 1): crit red flags — abandoned, hostile (<15% survival), bad license, rug-pull → no eng effort
- Write report:
- Tier + 1-sentence rationale up front
- Each dim score + evidence
- `contribution_budget` given → how alloc hrs per tier
- EVALUATE-FURTHER → specific Qs + timeline
- CONTRIBUTE-CAUTIOUSLY → safe (plugins, docs, tests) vs risky (core)
- `comparison_frameworks` evaluated → cmp matrix, rank all
→ Classification report: tier, scores, evidence, actionable recs.
If err: data gaps block confident call → default EVALUATE-FURTHER, doc missing data + how to get. Never default INVEST when unsure.
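The tier thresholds above can be sketched as a small function. Two assumptions are made here and are not stated in the text: scores are integers 1-5, and tiers are checked from most to least severe, so a dimension at exactly 2 yields CONTRIBUTE-CAUTIOUSLY even though it also fits "mixed, none <2". The dimension keys are hypothetical names, and the supersession score is already inverted (5 = low risk).

```python
DIMENSIONS = ("community", "supersession", "architecture", "governance")

def classify(scores):
    """Map four 1-5 dimension scores onto a tier.

    A missing or None score counts as a data gap and falls back to
    EVALUATE-FURTHER, never to INVEST.
    """
    if any(scores.get(dim) is None for dim in DIMENSIONS):
        return "EVALUATE-FURTHER"
    values = [scores[dim] for dim in DIMENSIONS]
    if any(not isinstance(v, int) or not 1 <= v <= 5 for v in values):
        raise ValueError("scores must be integers from 1 to 5")
    lowest = min(values)
    if lowest == 1:
        return "AVOID"
    if lowest == 2:
        return "CONTRIBUTE-CAUTIOUSLY"  # assumed to take precedence over "mixed"
    if lowest >= 4:
        return "INVEST"
    return "EVALUATE-FURTHER"

if __name__ == "__main__":
    example = {"community": 4, "supersession": 3, "architecture": 5, "governance": 4}
    print(classify(example))
```

The numeric tier is a starting point; the percentage cues in the thresholds (supersession <20% or >40%, survival <15%) still need to be cited as evidence in the report.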
Chk
- Census: stars, forks, deps, cadence, bus factor, landscape
- Community: survival, resp times, diversity, gov artifacts
- Supersession: rate + breakdown (reverted/rewritten/obsoleted)
- Arch: ext pts, lock-in, API stability, fit
- Gov: model, funding, protections, security
- Tier: INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
- Each score → specific evidence
- Recs actionable + calib to budget (if given)
- Data gaps + confidence limits doc'd
Traps
- Popularity ≠ health: 50k stars + 1 maintainer < 2k stars + 15 active contribs. SPoF.
- Skip supersession: most common ext-contrib failure. Welcoming community worthless if int overwrites ext.
- Arch-only, ignore gov: pretty design fails w/ unsustainable or hostile gov.
- EVALUATE-FURTHER ≠ AVOID: mixed = investigate, not reject. Set re-eval date + specific Qs.
- Snapshot bias: metrics point-in-time. Declining proj w/ great current metrics < improving proj w/ mediocre ones. Check 6-12mo trend.
- CLA complacency: some CLAs transfer copyright → your work = their asset. Read text, not checkbox.
- Single-framework anchor: no cmp → anything looks great/terrible. Bench at least 1 alt, even informal.
See
- polish-claw-project — contrib workflow this informs
- review-software-architecture — Step 4 arch eval
- forage-solutions — alt framework discovery for cmp
- search-prior-art — landscape + prior work
- security-audit-codebase — Step 5 sec posture
- assess-ip-landscape — license + IP risk
