
evaluate-agent-framework

pjt222
Updated yesterday
Category: Design (tags: ai, design)

About

This skill evaluates open-source AI agent frameworks to determine if they are worth your team's investment. It analyzes community health, architecture, and governance risks, outputting a clear INVEST/EVALUATE-FURTHER/CONTRIBUTE-CAUTIOUSLY/AVOID recommendation. Use it before committing engineering resources to a new framework to make data-driven adoption decisions.

Quick Install

Claude Code

Primary method (recommended)
npx skills add pjt222/agent-almanac -a claude-code
Alternative: plugin command
/plugin add https://github.com/pjt222/agent-almanac
Alternative: Git clone
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework

Copy and paste this command into Claude Code to install the skill.

Skill Documentation

Evaluate Agent Framework

A structured assessment of whether an open-source agent framework is ready for investment. The new value sits in Steps 2 and 3: quantifying community health through the external contribution survival rate, and measuring supersession risk, the biggest reason external engineering effort is wasted. The final tier (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) determines how much to spend before you commit development cycles.

When to Use

  • Deciding whether to adopt an agent framework for production
  • Assessing dependency risk in a framework your project relies on
  • Deciding whether to contribute engineering effort to an external project
  • Comparing competing frameworks for a build-vs-adopt decision
  • Re-evaluating a framework after a major release, governance change, or acquisition

Inputs

  • Required: framework_url — GitHub URL of framework repo
  • Optional:
    • comparison_frameworks — list of other framework URLs to benchmark
    • use_case — planned use case for arch alignment check (e.g., "multi-agent orchestration", "tool-use pipelines")
    • contribution_budget — planned engineering hours, for tier calibration

Steps

Step 1: Gather Framework Census

Gather baseline data on project size, activity, and position in the landscape before digging deeper.

  1. Fetch and read README.md, CONTRIBUTING.md, LICENSE, and any architecture docs (docs/, ARCHITECTURE.md)
  2. Collect metrics:
    • Stars, forks, open issues, open PRs: gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
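    • Dependent repos: check the "Used by" count on GitHub (the repository's /network/dependents page); GitHub's REST API does not document a dependents endpoint, so gh api repos/<owner>/<repo>/dependents may fail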
    • Release cadence: gh release list --limit 10; note the release frequency and whether versions follow semver
  3. Estimate bus factor: identify the top 5 contributors by commit count over the last 12 months. If the top contributor authored >60% of commits, the bus factor is critically low
  4. Classify landscape position:
    • Pioneer: first mover that defines the category (high influence; followers face high supersession risk)
    • Fast-follower: launched within 6 months of the pioneer, iterating on the concept
    • Late entrant: arrived after the category stabilized, competing on features or governance
  5. If comparison_frameworks is given, collect the same metrics for each
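The bus-factor check in item 3 reduces to a share-of-commits calculation. A minimal Python sketch, assuming you have already collected one author name per commit for the last 12 months (for example with git log --since="12 months ago" --format='%an'); the function name and the sample data are hypothetical:

```python
from collections import Counter


def bus_factor_report(commit_authors, threshold=0.60):
    """Summarise contributor concentration for one repository.

    commit_authors: one author name per commit in the 12-month window.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no commits in the window")
    top5 = counts.most_common(5)
    top_share = top5[0][1] / total
    return {
        "top5": top5,
        "top_share": top_share,
        # Step 1 heuristic: one author with >60% of commits is critical.
        "critically_low": top_share > threshold,
    }


# Hypothetical 12-month log: "alice" authored 7 of 10 commits.
authors = ["alice"] * 7 + ["bob"] * 2 + ["carol"]
report = bus_factor_report(authors)
print(report["top_share"], report["critically_low"])  # 0.7 True
```

Running the same function for the target and each comparison framework keeps the census table comparing like with like.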

Expected: A census table with stars, forks, dependents, release cadence, bus factor, and landscape position for the target (and for each comparison framework, if given).

On failure: If the repository is private or the API is rate-limited, fall back to reading the README manually. If metrics are unavailable (e.g., a self-hosted GitLab instance), note the gap and proceed with a qualitative assessment.

Step 2: Assess Community Health

Quantify whether the project welcomes, supports, and retains external contributors.

  1. Calculate the external contribution survival rate:
    • Pull the last 50 closed PRs: gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
    • Classify each PR author as internal (org member) or external
    • Compute: survival_rate = merged_external_PRs / total_external_PRs
    • Healthy threshold: >50% survival rate; concerning: <30%
  2. Measure responsiveness:
    • Issue first-response time: median time from issue opening to first maintainer comment
    • PR merge lag: median time from PR opening to merge, for external PRs
    • Healthy: <7 days first-response, <30 days merge; concerning: >30 days first-response
  3. Check contributor spread:
    • External/internal contributor ratio over the last 6 months
    • Count unique external contributors with >=2 merged PRs (repeat contributors signal a healthy ecosystem)
  4. Check governance artifacts:
    • CONTRIBUTING.md exists and is actionable (not just "submit a PR")
    • CODE_OF_CONDUCT.md exists
    • Governance docs describe the decision-making process
    • Issue/PR templates guide contributors
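The survival-rate and merge-lag metrics from items 1 and 2 can be computed directly from gh pr list JSON. A minimal sketch, assuming createdAt is added to the --json field list and that you supply the set of internal logins yourself (for example from the org's member list); community_stats, survival_label, and the sample PRs are hypothetical:

```python
import json
from datetime import datetime
from statistics import median


def _ts(value):
    """Parse an ISO 8601 timestamp such as '2024-01-01T00:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def _is_merged(pr):
    # Defensive: treat a missing, null, or zero-value timestamp as unmerged.
    ts = pr.get("mergedAt")
    return bool(ts) and not ts.startswith("0001-")


def community_stats(prs, internal_logins):
    """External PR survival rate and median merge lag in days."""
    external = [p for p in prs if p["author"]["login"] not in internal_logins]
    merged = [p for p in external if _is_merged(p)]
    lags = [
        (_ts(p["mergedAt"]) - _ts(p["createdAt"])).total_seconds() / 86400
        for p in merged
    ]
    return {
        "external_prs": len(external),
        "survival_rate": len(merged) / len(external) if external else None,
        "median_merge_lag_days": median(lags) if lags else None,
    }


def survival_label(rate):
    # Step 2 thresholds: >50% healthy, <30% concerning; "watch" is an assumed
    # label for the band in between.
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concerning"
    return "watch"


# Hypothetical sample shaped like gh pr list --json author,createdAt,mergedAt
sample = json.loads("""[
  {"author": {"login": "core-dev"}, "createdAt": "2024-01-01T00:00:00Z", "mergedAt": "2024-01-02T00:00:00Z"},
  {"author": {"login": "ext-a"}, "createdAt": "2024-01-01T00:00:00Z", "mergedAt": "2024-01-11T00:00:00Z"},
  {"author": {"login": "ext-b"}, "createdAt": "2024-01-05T00:00:00Z", "mergedAt": null},
  {"author": {"login": "ext-c"}, "createdAt": "2024-02-01T00:00:00Z", "mergedAt": "2024-02-05T00:00:00Z"}
]""")
stats = community_stats(sample, internal_logins={"core-dev"})
print(stats["survival_rate"], stats["median_merge_lag_days"])
print(survival_label(stats["survival_rate"]))  # healthy
```

Merge lag is kept in fractional days so same-day merges are not rounded to zero. The same median pattern applies to issue first-response time once you have the first maintainer comment timestamp for each issue.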

Expected: A community health scorecard with survival rate, response times, contributor spread ratio, and a governance artifact checklist.

On failure: If PR data is thin (a new project with <20 closed PRs), note the sample-size limitation and weight other signals more heavily. If the project uses a non-GitHub platform, adapt the queries to that platform's API.

Step 3: Calculate Supersession Risk

Estimate how likely external contributions are to be overwritten by internal development, the single biggest risk for framework adopters and contributors.

  1. Sample the last 50-100 merged external PRs (or all of them, if fewer)
  2. For each merged external PR, check whether the contributed code was later:
    • Reverted: an explicit revert commit references the PR
    • Rewritten: the same file or module was substantially changed by an internal contributor within 90 days
    • Obsoleted: the feature was removed or replaced in a later release
  3. Calculate: supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
  4. Map the published roadmap (if one exists) against the areas where external contributors are active:
    • High overlap = high supersession risk (internal developers are likely to build over external work)
    • Low overlap = lower supersession risk (external contributors fill gaps the core team won't)
  5. Check for "contribution traps": areas that look contribution-friendly but are scheduled for an internal rewrite
  6. Benchmark: a NemoClaw study found 71% of external PRs were superseded within 6 months; use this as a calibration point
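Once each merged external PR has been checked against the three outcomes, the rate and its breakdown are simple counts. A minimal sketch under stated assumptions: the per-PR fields (merged_on, reverted, later_internal_changes, feature_removed) are a hypothetical record you would fill in from the commit history, and a PR matching several outcomes is counted once, with reverted taking precedence over rewritten and rewritten over obsoleted:

```python
from collections import Counter
from datetime import date, timedelta

REWRITE_WINDOW = timedelta(days=90)  # "within 90 days" from Step 3


def supersession_type(pr):
    """Return 'reverted', 'rewritten', 'obsoleted', or None for one merged PR."""
    if pr.get("reverted"):
        return "reverted"
    merged = pr["merged_on"]
    # Dates when an internal contributor substantially changed the same files.
    for changed in pr.get("later_internal_changes", []):
        if merged < changed <= merged + REWRITE_WINDOW:
            return "rewritten"
    if pr.get("feature_removed"):
        return "obsoleted"
    return None


def supersession_rate(prs):
    """Return (rate, breakdown) over all merged external PRs."""
    breakdown = Counter(t for t in map(supersession_type, prs) if t)
    rate = sum(breakdown.values()) / len(prs) if prs else None
    return rate, dict(breakdown)


# Hypothetical sample of four merged external PRs.
sample = [
    {"merged_on": date(2024, 1, 10), "reverted": True},
    {"merged_on": date(2024, 1, 10), "later_internal_changes": [date(2024, 3, 1)]},
    {"merged_on": date(2024, 1, 10), "later_internal_changes": [date(2024, 9, 1)]},
    {"merged_on": date(2024, 2, 1), "feature_removed": True},
]
rate, breakdown = supersession_rate(sample)
print(f"{rate:.0%}", breakdown)
# 75% {'reverted': 1, 'rewritten': 1, 'obsoleted': 1}
```

Counting each PR once keeps the rate between 0 and 1; a 75% result in this toy sample would sit above the 71% calibration point.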

Expected: Supersession rate as a percentage, broken down by type (reverted/rewritten/obsoleted), plus a roadmap overlap assessment.

On failure: If the commit history is shallow or PRs were squash-merged (losing author information), estimate supersession by comparing external PR file paths against files changed in later releases, and note the lower confidence.
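The squash-merge fallback can be sketched as a file-path overlap check. This assumes you have each external PR's changed files (e.g. gh pr view <number> --json files) and the files changed in later releases (e.g. git diff --name-only <old-tag> <new-tag>); overlap_estimate and the sample paths are hypothetical. Because touching a file does not necessarily supersede it, treat the result as an upper bound:

```python
def overlap_estimate(external_pr_files, later_changed_files):
    """Upper-bound supersession estimate from file-path overlap alone.

    external_pr_files: {pr_number: [paths changed by that external PR]}
    later_changed_files: paths changed in later releases.
    """
    later = set(later_changed_files)
    flagged = [pr for pr, paths in external_pr_files.items() if later & set(paths)]
    if not external_pr_files:
        return None, flagged
    return len(flagged) / len(external_pr_files), flagged


# Hypothetical inputs.
prs = {
    101: ["src/tools/search.py"],
    102: ["docs/guide.md"],
    103: ["src/core/loop.py", "tests/test_loop.py"],
}
later_files = ["src/core/loop.py", "src/tools/search.py", "README.md"]
estimate, flagged = overlap_estimate(prs, later_files)
print(flagged)  # [101, 103]
```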

Step 4: Evaluate Architecture Alignment

Assess whether the framework's architecture supports your use case without heavy lock-in.

  1. Map extension points:
    • Plugin/extension API: does the framework expose a documented plugin interface?
    • Configuration surface: can behavior be tuned without forking?
    • Hook/callback system: can you intercept and modify framework behavior at key points?
  2. Assess lock-in risk:
    • Rewrite cost: estimate the engineering effort to migrate away (days/weeks/months)
    • Data portability: can data and state be exported in standard formats?
    • Standards compliance: does the framework use open standards (agentskills.io, MCP, A2A) or custom protocols?
  3. Check API stability:
    • Count breaking changes per major release (CHANGELOG, migration guides)
    • Check the deprecation policy (advance notice before removal)
    • Check semver adherence (breaking changes only in major versions)
  4. Check fit with your specific use case:
    • If use_case is given, assess whether the architecture supports it naturally
    • Identify any architectural mismatches that would require workarounds
  5. Check interoperability:
    • agentskills.io compatibility (skill model fit)
    • MCP support (tool integration)
    • A2A protocol support (agent-to-agent communication)
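The semver check in item 3 becomes mechanical once you have marked which releases contained breaking changes (from the CHANGELOG or migration guides). A minimal sketch, assuming tags of the form vMAJOR.MINOR.PATCH and an ordered list of (tag, has_breaking_change) pairs assembled by hand; semver_violations is hypothetical. Releases in the 0.y.z range are skipped because semver treats major version zero as initial development, where anything may change:

```python
import re

TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)")


def parse_tag(tag):
    match = TAG.match(tag)
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag}")
    return tuple(int(part) for part in match.groups())


def semver_violations(releases):
    """releases: [(tag, has_breaking_change)] ordered oldest to newest."""
    violations = []
    for (prev_tag, _), (tag, breaking) in zip(releases, releases[1:]):
        prev_major, cur_major = parse_tag(prev_tag)[0], parse_tag(tag)[0]
        if not breaking or cur_major == 0:
            continue  # nothing to flag, or 0.y.z initial development
        if cur_major == prev_major:
            violations.append(tag)  # breaking change without a major bump
    return violations


history = [
    ("v0.9.0", True),
    ("v1.0.0", False),
    ("v1.1.0", True),
    ("v2.0.0", True),
    ("v2.0.1", False),
]
print(semver_violations(history))  # ['v1.1.0']
```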

Expected: An architecture fit report with an extension point inventory, lock-in risk rating (low/medium/high), API stability score, and use-case fit assessment.

On failure: If architecture docs are thin, infer the assessment from the code structure and public API surface. If the framework is too young to have a stability history, note this and weight governance signals more heavily.

Step 5: Assess Governance and Sustainability

Assess whether the project's governance model supports long-term sustainability and fair treatment of external contributors.

  1. Classify the governance model:
    • BDFL (Benevolent Dictator for Life): a single decision-maker; fast decisions, bus-factor risk
    • Committee/Core team: distributed decision-making; slower but more resilient
    • Foundation-backed: formal governance (Apache, Linux Foundation, CNCF); most durable
    • Corporate-controlled: one company drives development; watch for rug-pull risk
  2. Check funding and sustainability:
    • Funding sources: VC-backed, corporate-sponsored, grants, community-funded, unfunded
    • Full-time maintainer count: >=2 is healthy; 0 is a red flag
    • Revenue model (if any): how does the project sustain itself?
  3. Check contributor protections:
    • License type: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
    • CLA terms: does signing the CLA transfer rights in ways that hurt contributors?
    • Contributor credit: are external contributors credited in releases, changelogs, and docs?
  4. Check security posture:
    • Security disclosure policy (SECURITY.md or equivalent)
    • Median time from CVE disclosure to patch release
    • Dependency update practices (Dependabot, Renovate, manual)
  5. Check trajectory:
    • Is the governance model shifting (e.g., toward a foundation)?
    • Any recent leadership change, acquisition, or relicensing?
    • Any public conflicts between maintainers and contributors?
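For the security item, the median CVE-to-patch time is a short calculation once you have disclosure and fix dates (for example from the project's security advisories and release notes). A minimal sketch with hypothetical dates; median_days_to_patch is illustrative, and advisories that are still unpatched are counted separately rather than silently dropped:

```python
from datetime import date
from statistics import median


def median_days_to_patch(advisories):
    """advisories: [(disclosed, patched_or_None)] as datetime.date values."""
    days = [(patched - disclosed).days for disclosed, patched in advisories if patched]
    unpatched = sum(1 for _, patched in advisories if patched is None)
    return (median(days) if days else None), unpatched


advisories = [
    (date(2024, 3, 1), date(2024, 3, 4)),    # fixed in 3 days
    (date(2024, 5, 10), date(2024, 6, 9)),   # fixed in 30 days
    (date(2024, 8, 1), None),                # still open
]
print(median_days_to_patch(advisories))  # (16.5, 1)
```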

Expected: A governance assessment with model classification, durability rating (durable/at-risk/critical), contributor protection review, and security posture summary.

On failure: If governance is not documented, treat the absence itself as a yellow flag. Infer the de facto governance from who merges PRs, who closes issues, and who makes release decisions.
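One quick way to surface de facto governance is to count who merges PRs. A minimal sketch, assuming JSON from gh pr list --state merged --limit 100 --json mergedBy; merge_concentration and the sample logins are hypothetical. If one login merges nearly everything, that person is the effective decision-maker whatever the governance docs say:

```python
import json
from collections import Counter


def merge_concentration(prs):
    """Share of merged PRs per merger login, most active first."""
    mergers = Counter(
        (pr.get("mergedBy") or {}).get("login") for pr in prs
    )
    mergers.pop(None, None)  # skip entries without merger data
    total = sum(mergers.values())
    return [(login, n, n / total) for login, n in mergers.most_common()]


# Hypothetical output of: gh pr list --state merged --limit 100 --json mergedBy
raw = """[
  {"mergedBy": {"login": "founder"}},
  {"mergedBy": {"login": "founder"}},
  {"mergedBy": {"login": "maintainer-2"}},
  {"mergedBy": {"login": "founder"}}
]"""
for login, count, share in merge_concentration(json.loads(raw)):
    print(f"{login}: {share:.0%} of merges ({count})")
```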

Step 6: Classify Investment Readiness

Synthesize all findings into a four-tier classification with specific rationale and actionable recommendations.

  1. Score each dimension (1-5 scale):
    • Community health: survival rate, responsiveness, contributor spread
    • Supersession risk: rate, roadmap overlap, contribution traps (scored inversely: lower risk earns a higher score)
    • Architecture fit: extension points, lock-in, stability, use-case fit
    • Governance durability: model, funding, protections, security
  2. Apply tier thresholds:
    • INVEST (all dimensions >=4): Healthy community, low supersession (<20%), well-fitting architecture, durable governance. Safe to adopt and commit engineering effort.
    • EVALUATE-FURTHER (mixed, no dimension <2): Mixed signals that need specific follow-up. Document what needs clarification and set a re-evaluation date.
    • CONTRIBUTE-CAUTIOUSLY (any dimension 2, none <2): High supersession (>40%) or governance concerns. Limit contributions to explicitly requested work, maintainer-approved scope, or plugin/extension development decoupled from the core.
    • AVOID (any dimension 1): Critical red flags such as an abandoned project, hostility to external contributors (survival rate <15%), an incompatible license, or signs of an imminent rug-pull. Do not commit engineering effort.
  3. Write the classification report:
    • Lead with the tier and a one-line rationale
    • Summarize each dimension score with key evidence
    • If contribution_budget is given, recommend how to allocate those hours given the tier
    • For EVALUATE-FURTHER, list the specific questions that need answers and set a timeline
    • For CONTRIBUTE-CAUTIOUSLY, identify which contribution types are safe (plugins, docs, tests) and which are risky (core features)
  4. If comparison_frameworks were assessed, produce a comparison matrix ranking all frameworks
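The thresholds in item 2 overlap as written (a lowest score of 2 satisfies both EVALUATE-FURTHER and CONTRIBUTE-CAUTIOUSLY), so this minimal sketch assumes the most severe matching tier wins: AVOID, then CONTRIBUTE-CAUTIOUSLY, then INVEST, with EVALUATE-FURTHER covering everything else. It also follows the rule of never defaulting to INVEST under uncertainty by returning EVALUATE-FURTHER when any dimension is unscored. The dimension keys and classify_tier are hypothetical, and scores are assumed to be integers with supersession risk already inverted:

```python
DIMENSIONS = (
    "community_health",
    "supersession_risk",      # already inverted: 5 = lowest risk
    "architecture_fit",
    "governance_durability",
)


def classify_tier(scores):
    """Map 1-5 dimension scores to a tier plus a one-line rationale."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        # Data gaps never default to INVEST.
        return "EVALUATE-FURTHER", "missing data: " + ", ".join(missing)
    if any(not 1 <= scores[d] <= 5 for d in DIMENSIONS):
        raise ValueError("scores must be on the 1-5 scale")
    lowest = min(scores[d] for d in DIMENSIONS)
    weakest = ", ".join(d for d in DIMENSIONS if scores[d] == lowest)
    if lowest == 1:
        return "AVOID", f"critical red flag in {weakest}"
    if lowest == 2:
        return "CONTRIBUTE-CAUTIOUSLY", f"weak score in {weakest}"
    if lowest >= 4:
        return "INVEST", "all dimensions score 4 or higher"
    return "EVALUATE-FURTHER", f"mixed signals; follow up on {weakest}"


print(classify_tier({"community_health": 4, "supersession_risk": 2,
                     "architecture_fit": 5, "governance_durability": 4}))
# ('CONTRIBUTE-CAUTIOUSLY', 'weak score in supersession_risk')
```

The rationale string is only the one-line lead of the report; the per-dimension evidence still has to be written up by hand.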

Expected: A classification report with the tier label, dimension scores, evidence summary, and actionable recommendations tailored to the investment context.

On failure: If data gaps prevent a confident classification, default to EVALUATE-FURTHER and clearly document which data is missing and how to obtain it. Never default to INVEST under uncertainty.

Validation

  • Census data collected: stars, forks, dependents, release cadence, bus factor, landscape position
  • Community health quantified: survival rate, response times, contributor spread, governance artifacts
  • Supersession risk calculated with a breakdown by type (reverted/rewritten/obsoleted)
  • Architecture fit assessed: extension points, lock-in risk, API stability, use-case fit
  • Governance assessed: model, funding, contributor protections, security posture
  • Tier assigned: one of INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
  • Each dimension score backed by specific evidence from the analysis
  • Recommendations are actionable and calibrated to the contribution budget (if given)
  • Data gaps and confidence limits clearly documented

Pitfalls

  • Confusing popularity with health: High star counts with low contributor spread indicate a single point of failure. A 50k-star project with one maintainer is less healthy than a 2k-star project with 15 active contributors.
  • Ignoring supersession risk: It is the most common reason external contributions fail. A welcoming community means nothing if internal developers keep overwriting external work.
  • Overweighting architecture, ignoring governance: A well-designed framework can still fail if its governance model is not durable or is hostile to external contributors.
  • Treating EVALUATE-FURTHER as AVOID: Mixed signals call for investigation, not rejection. Set a concrete re-evaluation date and list the specific questions to answer.
  • Snapshot bias: All metrics are point-in-time. A declining project with strong current metrics is a worse bet than an improving project with mediocre ones. Always check the trend over 6-12 months.
  • CLA complacency: Some CLAs assign copyright to the project owner, making your contributions their property. Read the CLA text, not just the checkbox.
  • Anchoring on a single framework: Without comparison frameworks, any project looks either great or terrible. Always benchmark against at least one alternative, even informally.
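The snapshot-bias pitfall can be checked by fitting a trend line to a monthly metric instead of reading only its latest value. A minimal sketch using an ordinary least-squares slope; the metric (merged external PRs per month) and the numbers are hypothetical:

```python
def trend_slope(monthly_values):
    """Ordinary least-squares slope of a monthly metric (units per month)."""
    n = len(monthly_values)
    if n < 2:
        raise ValueError("need at least two months of data")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Hypothetical merged external PRs per month, oldest first.
declining = [12, 10, 9, 7, 6, 4]
print(round(trend_slope(declining), 2))  # -1.54
```

A negative slope flags a declining project even when its current numbers still look strong; a slope fit over the full 6-12 months is less sensitive to one noisy month than comparing first and last values.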

See Also

GitHub Repository

pjt222/agent-almanac
Path: i18n/caveman/skills/evaluate-agent-framework
Topics: agents, agentskills, ai-assisted-development, claude-code, skills, teams

Related Skills

executing-plans

Design

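This skill carries out code implementation in controlled batches when a developer provides a complete implementation plan. It first reviews the plan and raises questions, then executes the tasks in batches (3 tasks per batch by default), pausing between batches for review. Key features include batched execution, built-in checkpoints, and an architect review mechanism, keeping the implementation of complex systems under control.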


requesting-code-review

Design

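This skill automatically dispatches a code-review subagent after you complete a task or implement a major feature, or before you merge code, to confirm that the implementation matches the requirements and the plan. It supports targeted review of code changes by specifying a git SHA range, helping developers catch potential problems at key points. Its core principle is "review early, review often," and it applies to every key stage of the development process.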


connect-mcp-server

Design

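This skill guides developers through connecting MCP servers to Claude Code and supports three transport protocols: HTTP, stdio, and SSE. It covers the full process from installation and configuration to authentication and security, and applies to integrating external services such as GitHub, Notion, and databases. When developers need to add integrations, configure external tools, or mention MCP-related features, this skill provides a practical how-to guide.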


web-cli-teleport

Design

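This skill helps developers choose between Claude Code's Web and CLI interfaces based on the nature of the task, and guides them in migrating sessions seamlessly between the two environments. It analyzes factors such as task complexity and iteration needs to recommend the best interface and workflow. Key features include session state management, environment-switching guidance, and context optimization suggestions.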
