SKILL·B0BFEC

evaluate-agent-framework

Name: evaluate-agent-framework
Author: pjt222

pjt222

Mis à jour 1 month ago

9 vues

Designaidesign

À propos

Cette compétence évalue les frameworks d'agents IA open-source pour déterminer s'ils méritent l'investissement de votre équipe. Elle analyse la santé de la communauté, l'architecture et les risques de gouvernance, en fournissant une recommandation claire : INVESTIR / ÉVALUER PLUS AVANT / CONTRIBUER AVEC PRUDENCE / ÉVITER. Utilisez-la avant d'engager des ressources d'ingénierie sur un nouveau framework pour prendre des décisions d'adoption fondées sur des données.

Installation rapide

Claude Code

Recommandé

Principal

npx skills add pjt222/agent-almanac -a claude-code

Commande PluginAlternatif

/plugin add https://github.com/pjt222/agent-almanac

Git CloneAlternatif

git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework

Copiez et collez cette commande dans Claude Code pour installer cette compétence

Documentation

Evaluate Agent Framework

Structured check of open-source agent framework invest-readiness. New value sits in Steps 2-3: count community health by contribution survival rate; measure supersession risk — biggest reason external engineering effort wastes. Final tier (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) sets resource spend before commit dev cycles.

When Use

Picking whether to adopt agent framework for prod
Measuring dep risk on framework project leans on
Deciding whether to give engineering effort to external project
Compare competing frameworks for build-vs-adopt pick
Re-check framework after big release, governance shift, or buyout

Inputs

Required: framework_url — GitHub URL of framework repo
Optional:
- comparison_frameworks — list of other framework URLs to benchmark
- use_case — planned use case for arch alignment check (e.g., "multi-agent orchestration", "tool-use pipelines")
- contribution_budget — planned engineering hours, for tier calibration

Steps

Step 1: Gather Framework Census

Grab base data on project size, activity, landscape place before deep dig.

Fetch and read README.md, CONTRIBUTING.md, LICENSE, and any arch docs (docs/, ARCHITECTURE.md)
Grab counts:
- Stars, forks, open issues, open PRs: gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
- Dependent repos: check GitHub's "Used by" count or gh api repos/<owner>/<repo>/dependents
- Release cadence: gh release list --limit 10 — note how often and if releases follow semver
Count bus factor: find top 5 contributors by commit count over last 12 months. Top contributor do >60% of commits? Bus factor critically low
Map landscape place:
- Pioneer: first mover, defines category (high sway, high supersession risk to followers)
- Fast-follower: launched within 6 months of pioneer, iterating on concept
- Late entrant: arrived after category stable, competing on features or governance
If comparison_frameworks given, grab same counts for each

Got: Census table with stars, forks, dependents, release cadence, bus factor, landscape place for target (and compares if given).

If fail: Repo private or API-rate-limited? Fall back to manual README read. Counts not there (e.g., self-hosted GitLab)? Note gap and go with qualitative check.

Step 2: Assess Community Health

Count whether project welcomes, supports, keeps external contributors.

Count external contribution survival rate:
- Pull last 50 closed PRs: gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
- Sort each PR author as internal (org member) or external
- Compute: survival_rate = merged_external_PRs / total_external_PRs
- Healthy threshold: >50% survival rate; concerning: <30%
Measure response:
- Issue first-response time: median from issue open to first maintainer comment
- PR merge lag: median from PR open to merge for external PRs
- Healthy: <7 days first-response, <30 days merge; concerning: >30 days first-response
Check contributor spread:
- External/internal contributor ratio over last 6 months
- Count unique external contributors with >=2 merged PRs (repeat contributors signal healthy ecosystem)
Check governance artifacts:
- CONTRIBUTING.md exists and is actionable (not just "submit a PR")
- CODE_OF_CONDUCT.md exists
- Governance docs describe decision process
- Issue/PR templates guide contributors

Got: Community health scorecard with survival rate, response times, spread ratio, governance artifact checklist.

If fail: PR data thin (new project with <20 closed PRs)? Note sample-size limit and weight other signals more. Project uses non-GitHub platform? Adapt queries to that platform API.

Step 3: Calculate Supersession Risk

Figure how likely external contributions get wiped by internal dev — single biggest risk for framework adopters and contributors.

Sample last 50-100 merged external PRs (or all if fewer)
For each merged external PR, check if contributed code was later:
- Reverted: explicit revert commit ref-ing PR
- Rewritten: same file/module big change within 90 days by internal contributor
- Obsoleted: feature removed or replaced in later release
Count: supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
Map published roadmap (if out) against areas where external contributors active:
- High overlap = high supersession risk (internals will build over external work)
- Low overlap = lower supersession risk (externals fill gaps internals won't)
Check for "contribution traps": areas look contribution-friendly but scheduled for internal rewrite
Benchmark: NemoClaw study showed 71% external PRs superseded within 6 months — use as calibration point

Got: Supersession rate as percent, with breakdown by type (reverted/rewritten/obsoleted). Roadmap overlap check.

If fail: Commit history shallow or squash-merged (losing author info)? Estimate supersession by compare external PR file paths vs files changed in later releases. Note lower confidence.

Step 4: Evaluate Architecture Alignment

Check whether framework arch supports your use case with no heavy lock-in.

Map extension points:
- Plugin/extension API: does framework expose documented plugin interface?
- Config surface: can behavior be tuned without fork?
- Hook/callback system: can intercept and change framework behavior at key points?
Check lock-in risk:
- Rewrite cost: estimate engineering effort to move away (days/weeks/months)
- Data portability: can data/state export in standard formats?
- Standard compliance: does framework use open standards (agentskills.io, MCP, A2A) or custom protocols?
Check API stability:
- Count breaking changes per major release (CHANGELOG, migration guides)
- Check deprecation policy (heads-up before removal)
- Check semver (breaking changes only in major versions)
Check fit with your specific use case:
- If use_case given, check whether framework arch naturally supports it
- Spot any arch mismatch that would need workarounds
Check interop:
- agentskills.io compat (skill model fit)
- MCP support (tool integration)
- A2A protocol support (agent-to-agent talk)

Got: Architecture fit report with extension point list, lock-in risk rate (low/medium/high), API stability score, use-case fit check.

If fail: Arch docs thin? Derive check from code shape and public API surface. Framework too young for stability history? Note this and weight governance signals more.

Step 5: Assess Governance and Sustainability

Check whether project governance model supports long-term life and fair treat of external contributors.

Sort governance model:
- BDFL (Benevolent Dictator for Life): one decider — fast calls, bus factor risk
- Committee/Core team: spread decision — slower but tougher
- Foundation-backed: formal governance (Apache, Linux Foundation, CNCF) — most durable
- Corporate-controlled: one company drives dev — watch for rug-pull risk
Check funding and sustainability:
- Funding sources: VC-backed, corporate-sponsored, grants, community-funded, unfunded
- Full-time maintainer count: >=2 is healthy; 0 is red flag
- Revenue model (if any): how does project keep going?
Check contributor protections:
- License type: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
- CLA rules: does signing CLA shift rights in way that hurt contributors?
- Contributor credit: external contributors credited in releases, changelogs, docs?
Check security stance:
- Security disclosure policy (SECURITY.md or same)
- Median time from CVE disclose to patch release
- Dep update patterns (Dependabot, Renovate, manual)
Check trajectory:
- Governance model shifting (e.g., moving toward foundation)?
- Recent leadership change, buyout, or relicense?
- Public conflicts between maintainers and contributors?

Got: Governance check with model class, durability rate (durable/at-risk/critical), contributor protection check, security stance summary.

If fail: Governance info not logged? Take the absence itself as yellow flag. Check for hidden governance by who merges PRs, who closes issues, who makes release picks.

Step 6: Classify Investment Readiness

Fold all finds into four-tier sort with specific reasons and actionable advice.

Score each dimension (1-5 scale):
- Community health: survival rate, response, spread
- Supersession risk: rate, roadmap overlap, contribution traps (invert: lower is better)
- Architecture fit: extension points, lock-in, stability, use-case fit
- Governance durability: model, funding, protections, security
Apply tier thresholds:
- INVEST (all dimensions >=4): Healthy community, low supersession (<20%), fit arch, durable governance. Safe to adopt and give engineering effort.
- EVALUATE-FURTHER (mixed, no dimension <2): Mixed signals need specific follow-ups. Log what needs clarify and set re-eval date.
- CONTRIBUTE-CAUTIOUSLY (any dimension 2, none <2): High supersession (>40%) or governance worries. Limit contributions to explicit-requested work, maintainer-approved scope, or plugin/extension dev decoupled from core.
- AVOID (any dimension 1): Critical red flags — abandoned project, hostile to externals (survival rate <15%), incompatible license, or soon rug-pull signs. Do not give engineering effort.
Write tier report:
- Lead with tier and one-line reason
- Sum each dimension score with key evidence
- If contribution_budget given, advise how to split those hours given tier
- For EVALUATE-FURTHER, list specific questions that need answers and set timeline
- For CONTRIBUTE-CAUTIOUSLY, say which contribution types safe (plugins, docs, tests) vs risky (core features)
If comparison_frameworks checked, make compare matrix ranking all frameworks

Got: Tier report with label, dimension scores, evidence sum, actionable advice tuned to invest context.

If fail: Data gaps block confident sort? Default to EVALUATE-FURTHER with clear log of what data missing and how to get it. Never default to INVEST when unsure.

Validation

Census data grabbed: stars, forks, dependents, release cadence, bus factor, landscape place
Community health counted: survival rate, response times, contributor spread, governance artifacts
Supersession risk counted with breakdown by type (reverted/rewritten/obsoleted)
Architecture fit checked: extension points, lock-in risk, API stability, use-case fit
Governance checked: model, funding, contributor protections, security stance
Tier made: one of INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
Each dimension score backed with specific evidence from analysis
Advice actionable and tuned to contribution budget (if given)
Data gaps and confidence limits clearly logged

Pitfalls

Mix popularity with health: High stars but low contributor spread mean single fail point. 50k-star project with one maintainer is less healthy than 2k-star project with 15 active contributors.
Ignore supersession risk: Most common reason external contributions fail. Welcoming community means nothing if internal dev keep overwriting external work.
Over-weight arch, skip governance: Pretty-designed framework can still fail if governance model is not durable or hostile to externals.
Treat EVALUATE-FURTHER as AVOID: Mixed signals need dig, not reject. Set concrete re-eval date and list specific questions to answer.
Snapshot bias: All counts are point-in-time. Declining project with great current counts is worse than improving project with meh current counts. Always check trend over 6-12 months.
CLA complacency: Some CLAs shift copyright to project owner, meaning your contributions become their property. Read CLA text, not just checkbox.
Anchor on single framework: With no compare frameworks, any project looks either great or awful. Always benchmark vs at least one alternative, even informal.

Dépôt GitHub

pjt222/agent-almanac

Chemin: i18n/caveman/skills/evaluate-agent-framework

agentsagentskillsai-assisted-developmentclaude-codeskillsteams

FAQ

Frequently asked questions

What is the evaluate-agent-framework skill?

evaluate-agent-framework is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform evaluate-agent-framework-related tasks without extra prompting.

How do I install evaluate-agent-framework?

Use the install commands on this page: add evaluate-agent-framework to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does evaluate-agent-framework belong to?

evaluate-agent-framework is in the Design category, tagged ai and design.

Is evaluate-agent-framework free to use?

Yes. evaluate-agent-framework is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Compétences associées

executing-plans

Design

Utilisez la compétence executing-plans lorsque vous disposez d'un plan de mise en œuvre complet à exécuter par lots contrôlés avec des points de contrôle de revue. Elle charge et examine le plan de manière critique, puis exécute les tâches par petits lots (3 tâches par défaut) tout en rapportant la progression entre chaque lot pour une revue par l'architecte. Cela garantit une mise en œuvre systématique avec des points de contrôle de qualité intégrés.

Voir la compétence

requesting-code-review

Design

Cette compétence délègue un sous-agent réviseur de code pour analyser les modifications apportées au code par rapport aux exigences avant de poursuivre. Elle doit être utilisée après avoir terminé des tâches, implémenté des fonctionnalités majeures, ou avant une fusion vers la branche principale. La revue aide à détecter précocement les problèmes en comparant l'implémentation actuelle avec le plan initial.

Voir la compétence

connect-mcp-server

Design

Cette compétence fournit un guide complet permettant aux développeurs de connecter des serveurs MCP à Claude Code via les transports HTTP, stdio ou SSE. Elle couvre l'installation, la configuration, l'authentification et la sécurité pour intégrer des services externes tels que GitHub, Notion et des API personnalisées. Utilisez-la lors de la configuration d'intégrations MCP, de la configuration d'outils externes ou du travail avec le Protocole de Contexte de Modèle de Claude.

Voir la compétence

web-cli-teleport

Design

Cette compétence aide les développeurs à choisir entre les interfaces Web et CLI de Claude Code en fonction de l'analyse des tâches, puis permet une téléportation transparente des sessions entre ces environnements. Elle optimise le flux de travail en gérant l'état et le contexte de la session lors du passage entre le web, la CLI ou le mobile. Utilisez-la pour des projets complexes nécessitant différents outils à diverses étapes.

Voir la compétence

evaluate-agent-framework

À propos

Installation rapide

Claude Code

Documentation

Evaluate Agent Framework

When Use

Inputs

Steps

Step 1: Gather Framework Census

Step 2: Assess Community Health

Step 3: Calculate Supersession Risk

Step 4: Evaluate Architecture Alignment

Step 5: Assess Governance and Sustainability

Step 6: Classify Investment Readiness

Validation

Pitfalls

See Also

Dépôt GitHub

Frequently asked questions

What is the evaluate-agent-framework skill?

How do I install evaluate-agent-framework?

What category does evaluate-agent-framework belong to?

Is evaluate-agent-framework free to use?

Compétences associées