evaluate-agent-framework

Name: evaluate-agent-framework
Author: pjt222

Updated 1 month ago

Category: Design. Tags: ai, design

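About

This skill evaluates whether an open-source AI agent framework is worth your team's investment. It analyzes community activity, architecture, and governance risk, then gives a clear recommendation: INVEST, EVALUATE-FURTHER, CONTRIBUTE-CAUTIOUSLY, or AVOID. Use it before allocating engineering resources to a new framework so that the adoption decision is data-driven.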

Documentation

Evaluate Agent Framework

Structured evaluation of an open-source agent framework's investment readiness. The distinctive value lies in Steps 2-3: measuring community health by contribution survival rate, and quantifying supersession risk, the biggest reason external engineering effort goes to waste. The final tier (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) guides resource allocation before you commit development cycles.

When to Use

Deciding whether to adopt an agent framework for production
Assessing dependency risk for a framework your project relies on
Deciding whether to commit engineering effort to an external project
Comparing competing frameworks for a build-vs-adopt decision
Re-evaluating a framework after a major release, governance shift, or acquisition

Inputs

Required: framework_url (GitHub URL of the framework repo)
Optional:
- comparison_frameworks: list of other framework URLs to benchmark against
- use_case: planned use case for the architecture alignment check (e.g., "multi-agent orchestration", "tool-use pipelines")
- contribution_budget: planned engineering hours, used for tier calibration

Steps

Step 1: Gather Framework Census

Collect baseline data on the project's size, activity, and landscape position before digging deeper.

Fetch and read README.md, CONTRIBUTING.md, LICENSE, and any architecture docs (docs/, ARCHITECTURE.md)
Collect counts:
- Stars, forks, open issues, open PRs: gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
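- Dependent repos: check GitHub's "Used by" count on the repository page (Insights > Dependency graph > Dependents). GitHub's REST API documentation does not list a dependents endpoint, so gh api repos/<owner>/<repo>/dependents may return 404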
- Release cadence: gh release list --limit 10; note how often releases ship and whether they follow semver
Estimate bus factor: find the top 5 contributors by commit count over the last 12 months. If the top contributor authors >60% of commits, the bus factor is critically low
Map landscape position:
- Pioneer: first mover that defines the category (high influence, high supersession risk for followers)
- Fast-follower: launched within 6 months of the pioneer and iterates on the concept
- Late entrant: arrived after the category stabilized and competes on features or governance
If comparison_frameworks is given, collect the same counts for each

Got: Census table with stars, forks, dependents, release cadence, bus factor, and landscape position for the target (and for the comparison frameworks, if given).

If fail: If the repo is private or the API is rate-limited, fall back to reading the README manually. If counts are unavailable (e.g., self-hosted GitLab), note the gap and proceed with a qualitative assessment.
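
The bus factor check above can be sketched in a few lines. This is a minimal illustration, assuming you have already extracted one author login per commit for the last 12 months (for example from git log); the function names and sample logins are hypothetical, and only the 60% threshold comes from the step itself.

```python
from collections import Counter


def top_contributor_share(commit_authors):
    """Return (login, share_of_commits) for the most active author.

    commit_authors: one author login per commit in the sampling window.
    """
    counts = Counter(commit_authors)
    if not counts:
        return None, 0.0
    login, commits = counts.most_common(1)[0]
    return login, commits / sum(counts.values())


def bus_factor_is_critical(commit_authors, threshold=0.60):
    """True when a single author wrote more than `threshold` of the commits."""
    _, share = top_contributor_share(commit_authors)
    return share > threshold


# Hypothetical sample: 100 commits from three authors.
authors = ["alice"] * 70 + ["bob"] * 20 + ["carol"] * 10
print(top_contributor_share(authors))   # ('alice', 0.7)
print(bus_factor_is_critical(authors))  # True
```

A share of exactly 60% is not flagged, because the step says ">60%".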

Step 2: Assess Community Health

Measure whether the project welcomes, supports, and retains external contributors.

Compute the external contribution survival rate:
- Pull the last 50 closed PRs: gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
- Classify each PR author as internal (org member) or external
- Compute: survival_rate = merged_external_PRs / total_external_PRs
- Healthy threshold: >50% survival rate; concerning: <30%
Measure responsiveness:
- Issue first-response time: median time from issue opening to the first maintainer comment
- PR merge lag: median time from PR opening to merge, for external PRs
- Healthy: <7 days to first response, <30 days to merge; concerning: >30 days to first response
Check contributor diversity:
- External/internal contributor ratio over the last 6 months
- Count unique external contributors with >=2 merged PRs (repeat contributors signal a healthy ecosystem)
Check governance artifacts:
- CONTRIBUTING.md exists and is actionable (not just "submit a PR")
- CODE_OF_CONDUCT.md exists
- Governance docs describe decision process
- Issue/PR templates guide contributors

Got: Community health scorecard with survival rate, response times, contributor diversity ratio, and governance artifact checklist.

If fail: If PR data is thin (a new project with <20 closed PRs), note the sample-size limit and weight the other signals more heavily. If the project uses a non-GitHub platform, adapt the queries to that platform's API.
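
The survival-rate formula above can be sketched as follows. The record shape is an assumption modeled on the author and mergedAt fields requested from gh pr list, with mergedAt taken to be empty for unmerged PRs. The internal_logins set, the function names, the sample data, and the "mixed" label for the 30-50% band are hypothetical additions for illustration.

```python
def survival_rate(prs, internal_logins):
    """merged_external_PRs / total_external_PRs, or None if no external PRs."""
    external = [pr for pr in prs if pr["author"]["login"] not in internal_logins]
    if not external:
        return None
    merged = sum(1 for pr in external if pr.get("mergedAt"))
    return merged / len(external)


def classify_survival(rate):
    """Apply the step's thresholds: >50% healthy, <30% concerning."""
    if rate is None:
        return "insufficient data"
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concerning"
    return "mixed"  # assumed label for the band the step leaves unnamed


# Hypothetical sample: one internal PR and four external PRs (three merged).
sample = [
    {"author": {"login": "maintainer"}, "mergedAt": "2024-05-01T10:00:00Z"},
    {"author": {"login": "ext-a"}, "mergedAt": "2024-05-02T10:00:00Z"},
    {"author": {"login": "ext-b"}, "mergedAt": None},
    {"author": {"login": "ext-c"}, "mergedAt": "2024-05-03T10:00:00Z"},
    {"author": {"login": "ext-a"}, "mergedAt": "2024-05-04T10:00:00Z"},
]
rate = survival_rate(sample, {"maintainer"})
print(rate, classify_survival(rate))  # 0.75 healthy
```

Returning None when there are no external PRs keeps the "data too thin" case distinct from a genuine 0% survival rate.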

Step 3: Calculate Supersession Risk

Estimate how likely external contributions are to be wiped out by internal development. This is the single biggest risk for framework adopters and contributors.

Sample the last 50-100 merged external PRs (or all of them if there are fewer)
For each merged external PR, check whether the contributed code was later:
- Reverted: an explicit revert commit references the PR
- Rewritten: an internal contributor made a major change to the same file/module within 90 days
- Obsoleted: the feature was removed or replaced in a later release
Compute: supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
Map the published roadmap (if available) against the areas where external contributors are active:
- High overlap = high supersession risk (internal developers will build over external work)
- Low overlap = lower supersession risk (external contributors fill gaps internal developers won't)
Check for "contribution traps": areas that look contribution-friendly but are scheduled for an internal rewrite
Benchmark: the NemoClaw study reported 71% of external PRs superseded within 6 months; use it as a calibration point

Got: Supersession rate as a percentage, with a breakdown by type (reverted/rewritten/obsoleted), plus a roadmap overlap assessment.

If fail: If commit history is shallow or squash-merged (losing author info), estimate supersession by comparing external PR file paths against the files changed in later releases. Note the lower confidence.
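
Once each sampled PR has been labeled by hand, the supersession formula and its breakdown reduce to a tally. This is a sketch, assuming one outcome label per merged external PR; the "intact" label, the function name, and the sample counts are hypothetical.

```python
from collections import Counter

SUPERSEDED = ("reverted", "rewritten", "obsoleted")


def supersession_report(outcomes):
    """Summarize outcome labels for merged external PRs.

    outcomes: one label per PR, either a value in SUPERSEDED or "intact".
    Returns None for an empty sample.
    """
    total = len(outcomes)
    if total == 0:
        return None
    counts = Counter(outcomes)
    superseded = sum(counts[kind] for kind in SUPERSEDED)
    return {
        "rate": superseded / total,
        "breakdown": {kind: counts[kind] / total for kind in SUPERSEDED},
    }


# Hypothetical sample of 50 merged external PRs.
labels = ["reverted"] * 2 + ["rewritten"] * 8 + ["obsoleted"] * 5 + ["intact"] * 35
report = supersession_report(labels)
print(report["rate"])       # 0.3
print(report["breakdown"])  # {'reverted': 0.04, 'rewritten': 0.16, 'obsoleted': 0.1}
```

A 30% rate falls between the <20% and >40% reference points that Step 6 uses, so a result like this one would call for closer inspection of the roadmap overlap.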

Step 4: Evaluate Architecture Alignment

Check whether the framework's architecture supports your use case without heavy lock-in.

Map extension points:
- Plugin/extension API: does the framework expose a documented plugin interface?
- Config surface: can behavior be tuned without forking?
- Hook/callback system: can you intercept and change framework behavior at key points?
Check lock-in risk:
- Rewrite cost: estimate the engineering effort to migrate away (days/weeks/months)
- Data portability: can data/state be exported in standard formats?
- Standards compliance: does the framework use open standards (agentskills.io, MCP, A2A) or custom protocols?
Check API stability:
- Count breaking changes per major release (CHANGELOG, migration guides)
- Check the deprecation policy (advance notice before removal)
- Check semver adherence (breaking changes only in major versions)
Check fit with your specific use case:
- If use_case is given, check whether the framework's architecture naturally supports it
- Identify any architectural mismatch that would require workarounds
Check interoperability:
- agentskills.io compatibility (skill model fit)
- MCP support (tool integration)
- A2A protocol support (agent-to-agent communication)

Got: Architecture fit report with an extension point list, lock-in risk rating (low/medium/high), API stability score, and use-case fit assessment.

If fail: If architecture docs are thin, derive the assessment from the code structure and public API surface. If the framework is too young to have a stability history, note this and weight governance signals more heavily.
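
The semver check in the API stability list can be approximated from a hand-built release log. This sketch assumes you have read the CHANGELOG and recorded, for each release in chronological order, whether it contained a breaking change. The function names and sample versions are hypothetical. Releases with major version 0 are exempted because Semantic Versioning allows breaking changes during initial development.

```python
def major_of(version):
    """Extract the major number from a tag such as 'v2.1.0' or '2.1.0'."""
    return int(version.lstrip("v").split(".")[0])


def semver_violations(releases):
    """Return versions that shipped a breaking change without a major bump.

    releases: chronological list of (version, has_breaking_change) pairs.
    """
    violations = []
    previous_major = None
    for version, breaking in releases:
        major = major_of(version)
        # A breaking change is acceptable only when the major number changed
        # (or while the project is still on 0.y.z).
        if breaking and major != 0 and major == previous_major:
            violations.append(version)
        previous_major = major
    return violations


# Hypothetical release log.
log = [
    ("v1.0.0", False),
    ("v1.1.0", True),   # breaking change in a minor release
    ("v2.0.0", True),   # fine: major bump
    ("v2.0.1", False),
    ("v2.1.0", True),   # breaking change in a minor release
]
print(semver_violations(log))  # ['v1.1.0', 'v2.1.0']
```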

Step 5: Assess Governance and Sustainability

Check whether the project's governance model supports long-term viability and fair treatment of external contributors.

Classify the governance model:
- BDFL (Benevolent Dictator for Life): one decision-maker; fast decisions, but bus factor risk
- Committee/Core team: distributed decision-making; slower but more resilient
- Foundation-backed: formal governance (Apache, Linux Foundation, CNCF); most durable
- Corporate-controlled: one company drives development; watch for rug-pull risk
Check funding and sustainability:
- Funding sources: VC-backed, corporate-sponsored, grants, community-funded, unfunded
- Full-time maintainer count: >=2 is healthy; 0 is a red flag
- Revenue model (if any): how does the project sustain itself?
Check contributor protections:
- License type: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
- CLA terms: does signing the CLA transfer rights in a way that hurts contributors?
- Contributor credit: are external contributors credited in releases, changelogs, and docs?
Check security posture:
- Security disclosure policy (SECURITY.md or equivalent)
- Median time from CVE disclosure to patch release
- Dependency update practices (Dependabot, Renovate, manual)
Check trajectory:
- Is the governance model shifting (e.g., moving toward a foundation)?
- Any recent leadership change, acquisition, or relicensing?
- Any public conflicts between maintainers and contributors?

Got: Governance assessment with model classification, durability rating (durable/at-risk/critical), contributor protection check, and security posture summary.

If fail: If governance is not documented, treat the absence itself as a yellow flag. Infer the de facto governance from who merges PRs, who closes issues, and who makes release decisions.

Step 6: Classify Investment Readiness

Combine all findings into a four-tier classification with specific reasons and actionable advice.

Score each dimension (1-5 scale):
- Community health: survival rate, responsiveness, contributor diversity
- Supersession risk: rate, roadmap overlap, contribution traps (inverted: lower risk earns a higher score)
- Architecture fit: extension points, lock-in, stability, use-case fit
- Governance durability: model, funding, protections, security
Apply tier thresholds:
- INVEST (all dimensions >=4): Healthy community, low supersession (<20%), good architecture fit, durable governance. Safe to adopt and to commit engineering effort.
- EVALUATE-FURTHER (mixed, no dimension <2): Mixed signals that need specific follow-ups. Record what needs clarification and set a re-evaluation date.
- CONTRIBUTE-CAUTIOUSLY (any dimension 2, none <2): High supersession (>40%) or governance concerns. Limit contributions to explicitly requested work, maintainer-approved scope, or plugin/extension development decoupled from core.
- AVOID (any dimension 1): Critical red flags such as an abandoned project, hostility to external contributors (survival rate <15%), an incompatible license, or signs of an imminent rug pull. Do not commit engineering effort.
Write the tier report:
- Lead with the tier and a one-line reason
- Summarize each dimension score with its key evidence
- If contribution_budget is given, advise how to allocate those hours given the tier
- For EVALUATE-FURTHER, list the specific questions that need answers and set a timeline
- For CONTRIBUTE-CAUTIOUSLY, state which contribution types are safe (plugins, docs, tests) and which are risky (core features)
If comparison_frameworks were evaluated, produce a comparison matrix ranking all frameworks

Got: Tier report with label, dimension scores, evidence summary, and actionable advice tailored to the investment context.

If fail: If data gaps block a confident classification, default to EVALUATE-FURTHER and record clearly what data is missing and how to get it. Never default to INVEST when unsure.
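
The tier thresholds can be expressed as a small decision function. This is one possible reading, not a definitive rule set: as written, the EVALUATE-FURTHER and CONTRIBUTE-CAUTIOUSLY conditions overlap when the lowest score is exactly 2, and the sketch assumes CONTRIBUTE-CAUTIOUSLY wins in that case. It also assumes the supersession score is already inverted, so a higher score means lower risk. The dimension keys and function name are hypothetical.

```python
DIMENSIONS = (
    "community_health",
    "supersession_risk",      # assumed already inverted: 5 = lowest risk
    "architecture_fit",
    "governance_durability",
)


def classify_tier(scores):
    """Map 1-5 dimension scores to an investment tier.

    scores: dict keyed by DIMENSIONS; a missing or None score is a data gap.
    """
    # Data gaps never default to INVEST.
    if any(scores.get(dim) is None for dim in DIMENSIONS):
        return "EVALUATE-FURTHER"
    values = [scores[dim] for dim in DIMENSIONS]
    if not all(1 <= value <= 5 for value in values):
        raise ValueError("dimension scores must be between 1 and 5")
    lowest = min(values)
    if lowest == 1:
        return "AVOID"
    if lowest == 2:
        return "CONTRIBUTE-CAUTIOUSLY"  # assumed precedence over EVALUATE-FURTHER
    if lowest >= 4:
        return "INVEST"
    return "EVALUATE-FURTHER"


# Hypothetical scorecard.
scorecard = {
    "community_health": 4,
    "supersession_risk": 2,
    "architecture_fit": 5,
    "governance_durability": 3,
}
print(classify_tier(scorecard))  # CONTRIBUTE-CAUTIOUSLY
```

Checking the lowest score first means one weak dimension cannot be averaged away by strong ones, which matches the "any dimension" wording of the thresholds.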

Validation

Census data collected: stars, forks, dependents, release cadence, bus factor, landscape position
Community health measured: survival rate, response times, contributor diversity, governance artifacts
Supersession risk computed, with a breakdown by type (reverted/rewritten/obsoleted)
Architecture fit assessed: extension points, lock-in risk, API stability, use-case fit
Governance assessed: model, funding, contributor protections, security posture
Tier assigned: one of INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
Each dimension score is backed by specific evidence from the analysis
Advice is actionable and tailored to the contribution budget (if given)
Data gaps and confidence limits are clearly recorded

Pitfalls

Confusing popularity with health: High star counts with low contributor diversity mean a single point of failure. A 50k-star project with one maintainer is less healthy than a 2k-star project with 15 active contributors.
Ignoring supersession risk: This is the most common reason external contributions fail. A welcoming community means nothing if internal development keeps overwriting external work.
Over-weighting architecture and skipping governance: A well-designed framework can still fail if its governance model is not durable or is hostile to external contributors.
Treating EVALUATE-FURTHER as AVOID: Mixed signals call for investigation, not rejection. Set a concrete re-evaluation date and list the specific questions to answer.
Snapshot bias: All counts are point-in-time. A declining project with great current numbers is worse than an improving project with mediocre ones. Always check the trend over 6-12 months.
CLA complacency: Some CLAs transfer copyright to the project owner, meaning your contributions become their property. Read the CLA text, not just the checkbox.
Anchoring on a single framework: Without comparison frameworks, any project looks either great or awful. Always benchmark against at least one alternative, even informally.

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman/skills/evaluate-agent-framework

Tags: agents, agentskills, ai-assisted-development, claude-code, skills, teams

FAQ

What is the evaluate-agent-framework skill?

evaluate-agent-framework is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform evaluate-agent-framework-related tasks without extra prompting.

How do I install evaluate-agent-framework?

Use the install commands on this page: add evaluate-agent-framework to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does evaluate-agent-framework belong to?

evaluate-agent-framework is in the Design category, tagged ai and design.

Is evaluate-agent-framework free to use?

Yes. evaluate-agent-framework is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
