evaluate-agent-framework
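Info
This skill evaluates whether an open-source AI agent framework is worth your team's investment. It analyzes community activity, architecture, and governance risk, and produces a clear INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID recommendation. Use it before allocating engineering resources to a new framework, so that the adoption decision is data-driven.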
Quick Install
Claude Code
npx skills add pjt222/agent-almanac -a claude-code (recommended)
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework
Copy one of these commands and paste it into Claude Code to install the skill.
Documentation
Evaluate Agent Framework
A structured assessment of whether an open-source agent framework is ready for investment. The distinctive value is in Steps 2-3: measuring community health by contribution survival rate, and measuring supersession risk, the biggest reason external engineering effort goes to waste. The final tier (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) guides how much to spend before you commit dev cycles.
When to Use
- Deciding whether to adopt an agent framework for production
- Measuring dependency risk on a framework your project relies on
- Deciding whether to contribute engineering effort to an external project
- Comparing competing frameworks for a build-vs-adopt decision
- Re-evaluating a framework after a major release, governance change, or acquisition
Inputs
- Required:
  - framework_url: GitHub URL of the framework repository
- Optional:
  - comparison_frameworks: list of other framework URLs to benchmark against
  - use_case: planned use case, for the architecture alignment check (e.g., "multi-agent orchestration", "tool-use pipelines")
  - contribution_budget: planned engineering hours, for tier calibration
Steps
Step 1: Gather Framework Census
Collect baseline data on the project's size, activity, and position in the landscape before digging deeper.
- Fetch and read README.md, CONTRIBUTING.md, LICENSE, and any architecture docs (docs/, ARCHITECTURE.md)
- Collect metrics:
  - Stars, forks, open issues, open PRs: gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
  - Dependent repos: check the "Used by" count on the GitHub repository page; gh api repos/<owner>/<repo>/dependents is listed as an alternative, but GitHub's REST API does not document a dependents endpoint, so verify it works before relying on it
  - Release cadence: gh release list --limit 10; note how often releases ship and whether they follow semver
- Estimate bus factor: find the top 5 contributors by commit count over the last 12 months. If the top contributor accounts for more than 60% of commits, the bus factor is critically low
- Map landscape position:
  - Pioneer: first mover that defines the category (high influence, high supersession risk for followers)
  - Fast-follower: launched within 6 months of the pioneer, iterating on the concept
  - Late entrant: arrived after the category stabilized, competing on features or governance
- If comparison_frameworks is given, collect the same metrics for each
Expected: Census table with stars, forks, dependents, release cadence, bus factor, and landscape position for the target (and for the comparison frameworks, if given).
On failure: If the repo is private or the API is rate-limited, fall back to reading the README manually. If metrics are unavailable (e.g., self-hosted GitLab), note the gap and proceed with a qualitative assessment.
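The bus-factor rule in this step (top contributor above 60% of commits) can be sketched as below. This is a minimal illustration, assuming you have already exported one author name per commit for the last 12 months; the function names and the sample data are hypothetical and not part of the skill.

```python
from collections import Counter

def top_contributor_share(commit_authors):
    """Return (author, share) for the most frequent author in a list of commit authors."""
    if not commit_authors:
        raise ValueError("no commits in the sample")
    author, count = Counter(commit_authors).most_common(1)[0]
    return author, count / len(commit_authors)

def bus_factor_is_critical(commit_authors, threshold=0.60):
    """Apply the >60% rule from Step 1; the threshold is exclusive."""
    _, share = top_contributor_share(commit_authors)
    return share > threshold

# Hypothetical sample: 10 commits over the last 12 months.
authors = ["alice"] * 7 + ["bob"] * 2 + ["carol"]
print(top_contributor_share(authors))   # ('alice', 0.7)
print(bus_factor_is_critical(authors))  # True
```

How the author list is produced (for example from a local git log) is left open here, since it depends on how the repository is hosted.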
Step 2: Assess Community Health
Measure whether the project welcomes, supports, and retains external contributors.
- Compute the external contribution survival rate:
  - Pull the last 50 closed PRs: gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
  - Classify each PR author as internal (org member) or external
  - Compute: survival_rate = merged_external_PRs / total_external_PRs
  - Healthy threshold: >50% survival rate; concerning: <30%
- Measure responsiveness:
  - Issue first-response time: median time from issue opened to first maintainer comment
  - PR merge lag: median time from PR opened to merge, for external PRs
  - Healthy: <7 days first-response, <30 days merge; concerning: >30 days first-response
- Check contributor spread:
  - External/internal contributor ratio over the last 6 months
  - Number of unique external contributors with >=2 merged PRs (repeat contributors signal a healthy ecosystem)
- Check governance artifacts:
  - CONTRIBUTING.md exists and is actionable (not just "submit a PR")
  - CODE_OF_CONDUCT.md exists
  - Governance docs describe the decision process
  - Issue/PR templates guide contributors
Expected: Community health scorecard with survival rate, response times, spread ratio, and a governance artifact checklist.
On failure: If PR data is thin (a new project with <20 closed PRs), note the sample-size limit and weight the other signals more heavily. If the project uses a non-GitHub platform, adapt the queries to that platform's API.
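The survival-rate formula and the merge-lag median from this step can be sketched as follows. It assumes the PR list was exported as JSON in the shape gh pr list produces, with createdAt added to the --json fields (an assumption; the command above does not request it), and that you supply the set of internal logins yourself. The "borderline" label for the 30-50% band is hypothetical, since the step only names the healthy and concerning thresholds.

```python
import json
from datetime import datetime
from statistics import median

def survival_rate(closed_prs, internal_logins):
    """merged_external_PRs / total_external_PRs, or None if no external PRs."""
    external = [pr for pr in closed_prs if pr["author"]["login"] not in internal_logins]
    if not external:
        return None
    merged = [pr for pr in external if pr.get("mergedAt")]
    return len(merged) / len(external)

def classify_survival(rate):
    """Map a survival rate onto the Step 2 thresholds."""
    if rate is None:
        return "insufficient data"
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concerning"
    return "borderline"  # hypothetical label for the unnamed 30-50% band

def median_merge_lag_days(closed_prs, internal_logins):
    """Median days from PR opened to merge, for merged external PRs only."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    lags = []
    for pr in closed_prs:
        if pr["author"]["login"] in internal_logins or not pr.get("mergedAt"):
            continue
        opened = datetime.strptime(pr["createdAt"], fmt)
        merged = datetime.strptime(pr["mergedAt"], fmt)
        lags.append((merged - opened).total_seconds() / 86400)
    return median(lags) if lags else None

# Hypothetical export; real data would come from the gh command in this step.
sample = json.loads("""[
  {"author": {"login": "maintainer"}, "createdAt": "2024-01-01T00:00:00Z", "mergedAt": "2024-01-02T00:00:00Z"},
  {"author": {"login": "ext1"}, "createdAt": "2024-01-01T00:00:00Z", "mergedAt": "2024-01-11T00:00:00Z"},
  {"author": {"login": "ext2"}, "createdAt": "2024-01-05T00:00:00Z", "mergedAt": null},
  {"author": {"login": "ext3"}, "createdAt": "2024-02-01T00:00:00Z", "mergedAt": "2024-02-21T00:00:00Z"}
]""")
internal = {"maintainer"}

rate = survival_rate(sample, internal)
print(classify_survival(rate))                  # healthy (2 of 3 external PRs merged)
print(median_merge_lag_days(sample, internal))  # 15.0
```

Deciding who counts as internal is the judgment-heavy part; passing the set in explicitly keeps that decision visible rather than burying it in the calculation.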
Step 3: Calculate Supersession Risk
Estimate how likely external contributions are to be wiped out by internal development, the single biggest risk for framework adopters and contributors.
- Sample the last 50-100 merged external PRs (or all of them if there are fewer)
- For each merged external PR, check whether the contributed code was later:
  - Reverted: an explicit revert commit referencing the PR
  - Rewritten: the same file/module substantially changed within 90 days by an internal contributor
  - Obsoleted: the feature was removed or replaced in a later release
- Compute: supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
- Map the published roadmap (if one exists) against the areas where external contributors are active:
  - High overlap = high supersession risk (internal developers will build over external work)
  - Low overlap = lower supersession risk (external contributors fill gaps that internal developers won't)
- Check for "contribution traps": areas that look contribution-friendly but are scheduled for an internal rewrite
- Benchmark: the NemoClaw study found that 71% of external PRs were superseded within 6 months; use this as a calibration point
Expected: Supersession rate as a percentage, with a breakdown by type (reverted/rewritten/obsoleted), plus a roadmap overlap assessment.
On failure: If commit history is shallow or squash-merged (losing author information), estimate supersession by comparing the file paths touched by external PRs against the files changed in later releases. Note the lower confidence.
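Once each sampled PR has been labeled by hand, the supersession formula and its breakdown reduce to a tally. The sketch below assumes one outcome label per merged external PR; the "intact" label for PRs that survived, and the function name, are hypothetical additions for illustration.

```python
from collections import Counter

SUPERSEDED = ("reverted", "rewritten", "obsoleted")
ALLOWED = set(SUPERSEDED) | {"intact"}  # "intact" is a hypothetical label

def supersession_summary(outcomes):
    """Compute supersession_rate and the per-type breakdown from PR outcome labels."""
    if not outcomes:
        raise ValueError("no merged external PRs in the sample")
    unknown = set(outcomes) - ALLOWED
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    counts = Counter(outcomes)
    superseded = sum(counts[label] for label in SUPERSEDED)
    return {
        "rate": superseded / len(outcomes),
        "breakdown": {label: counts[label] for label in SUPERSEDED},
        "sample_size": len(outcomes),
    }

# Hypothetical labels for 10 merged external PRs.
labels = ["reverted"] + ["rewritten"] * 2 + ["obsoleted"] + ["intact"] * 6
summary = supersession_summary(labels)
print(summary["rate"])       # 0.4
print(summary["breakdown"])  # {'reverted': 1, 'rewritten': 2, 'obsoleted': 1}
```

Reporting sample_size alongside the rate makes it easier to flag the low-confidence case described above.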
Step 4: Evaluate Architecture Alignment
Check whether the framework's architecture supports your use case without heavy lock-in.
- Map extension points:
  - Plugin/extension API: does the framework expose a documented plugin interface?
  - Config surface: can behavior be tuned without forking?
  - Hook/callback system: can you intercept and change framework behavior at key points?
- Assess lock-in risk:
  - Rewrite cost: estimate the engineering effort to migrate away (days/weeks/months)
  - Data portability: can data/state be exported in standard formats?
  - Standards compliance: does the framework use open standards (agentskills.io, MCP, A2A) or custom protocols?
- Assess API stability:
  - Count breaking changes per major release (CHANGELOG, migration guides)
  - Check the deprecation policy (advance notice before removal)
  - Check semver adherence (breaking changes only in major versions)
- Check fit with your specific use case:
  - If use_case is given, check whether the framework's architecture naturally supports it
  - Identify any architectural mismatch that would require workarounds
- Check interoperability:
  - agentskills.io compatibility (skill model fit)
  - MCP support (tool integration)
  - A2A protocol support (agent-to-agent communication)
Expected: Architecture fit report with a list of extension points, a lock-in risk rating (low/medium/high), an API stability score, and a use-case fit assessment.
On failure: If architecture docs are thin, derive the assessment from the code structure and public API surface. If the framework is too young to have a stability history, note this and weight governance signals more heavily.
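The semver check in this step (breaking changes only in major versions) can be sketched as a pass over the release history. This assumes you have already read the CHANGELOG and flagged each release as breaking or not; the tuple format and function names are hypothetical. Tags that are not plain MAJOR.MINOR.PATCH (for example pre-releases) are rejected rather than guessed at.

```python
import re

SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")

def parse_version(tag):
    """Parse 'v1.2.3' or '1.2.3' into (1, 2, 3); reject anything else."""
    match = SEMVER.match(tag)
    if not match:
        raise ValueError(f"not a plain semver tag: {tag}")
    return tuple(int(part) for part in match.groups())

def semver_violations(releases):
    """Return tags that shipped a breaking change without a major version bump.

    releases: list of (tag, has_breaking_change), oldest first.
    Simplification: 0.y.z releases are held to the same rule, although semver
    itself allows breaking changes before 1.0.0.
    """
    violations = []
    for (prev_tag, _), (tag, breaking) in zip(releases, releases[1:]):
        major_bump = parse_version(tag)[0] > parse_version(prev_tag)[0]
        if breaking and not major_bump:
            violations.append(tag)
    return violations

# Hypothetical release history, flagged by hand from a CHANGELOG.
history = [("v1.0.0", False), ("v1.1.0", True), ("v2.0.0", True), ("v2.0.1", False)]
print(semver_violations(history))  # ['v1.1.0']
```

The first release in the list only serves as a baseline, so it is never reported as a violation.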
Step 5: Assess Governance and Sustainability
Check whether the project's governance model supports long-term survival and fair treatment of external contributors.
- Classify the governance model:
  - BDFL (Benevolent Dictator for Life): one decision-maker; fast decisions, bus factor risk
  - Committee/Core team: distributed decision-making; slower but more resilient
  - Foundation-backed: formal governance (Apache, Linux Foundation, CNCF); most durable
  - Corporate-controlled: one company drives development; watch for rug-pull risk
- Assess funding and sustainability:
  - Funding sources: VC-backed, corporate-sponsored, grants, community-funded, unfunded
  - Full-time maintainer count: >=2 is healthy; 0 is a red flag
  - Revenue model (if any): how does the project sustain itself?
- Check contributor protections:
  - License type: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
  - CLA terms: does signing the CLA transfer rights in a way that harms contributors?
  - Contributor credit: are external contributors credited in releases, changelogs, and docs?
- Assess security posture:
  - Security disclosure policy (SECURITY.md or equivalent)
  - Median time from CVE disclosure to patch release
  - Dependency update practices (Dependabot, Renovate, manual)
- Check trajectory:
  - Is the governance model shifting (e.g., moving toward a foundation)?
  - Has there been a recent leadership change, acquisition, or relicensing?
  - Are there public conflicts between maintainers and contributors?
Expected: Governance assessment with model classification, durability rating (durable/at-risk/critical), contributor protection assessment, and security posture summary.
On failure: If governance is undocumented, treat the absence itself as a yellow flag. Look for de facto governance by checking who merges PRs, who closes issues, and who makes release decisions.
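When governance is undocumented, "who merges PRs" can be tallied from PR metadata. A minimal sketch follows, assuming a JSON export of merged PRs that includes a mergedBy object with a login (for example by adding mergedBy to the --json fields of gh pr list, which is an assumption about your export, not something this skill prescribes). The function name and sample data are hypothetical.

```python
from collections import Counter

def merge_concentration(merged_prs):
    """Return [(login, share_of_merges), ...] sorted by share, largest first."""
    mergers = [pr["mergedBy"]["login"] for pr in merged_prs if pr.get("mergedBy")]
    if not mergers:
        return []
    total = len(mergers)
    return [(login, count / total) for login, count in Counter(mergers).most_common()]

# Hypothetical export of four merged PRs.
merged = [
    {"number": 1, "mergedBy": {"login": "alice"}},
    {"number": 2, "mergedBy": {"login": "alice"}},
    {"number": 3, "mergedBy": {"login": "bob"}},
    {"number": 4, "mergedBy": {"login": "alice"}},
]
print(merge_concentration(merged))  # [('alice', 0.75), ('bob', 0.25)]
```

A single login holding most of the merges suggests a de facto BDFL model even when no governance document says so.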
Step 6: Classify Investment Readiness
Combine all findings into a four-tier classification with specific reasons and actionable advice.
- Score each dimension (1-5 scale):
  - Community health: survival rate, responsiveness, spread
  - Supersession risk: rate, roadmap overlap, contribution traps (inverted: lower risk scores higher)
  - Architecture fit: extension points, lock-in, stability, use-case fit
  - Governance durability: model, funding, protections, security
- Apply tier thresholds:
  - INVEST (all dimensions >=4): Healthy community, low supersession (<20%), well-fitting architecture, durable governance. Safe to adopt and to contribute engineering effort.
  - EVALUATE-FURTHER (mixed, no dimension <2): Mixed signals that need specific follow-ups. Record what needs clarification and set a re-evaluation date.
  - CONTRIBUTE-CAUTIOUSLY (any dimension 2, none <2): High supersession (>40%) or governance concerns. Limit contributions to explicitly requested work, maintainer-approved scope, or plugin/extension development decoupled from core.
  - AVOID (any dimension 1): Critical red flags: an abandoned project, hostility to external contributors (survival rate <15%), an incompatible license, or signs of an imminent rug-pull. Do not commit engineering effort.
- Write the tier report:
  - Lead with the tier and a one-line reason
  - Summarize each dimension score with key evidence
  - If contribution_budget is given, advise how to allocate those hours given the tier
  - For EVALUATE-FURTHER, list the specific questions that need answers and set a timeline
  - For CONTRIBUTE-CAUTIOUSLY, state which contribution types are safe (plugins, docs, tests) and which are risky (core features)
  - If comparison_frameworks were assessed, produce a comparison matrix ranking all frameworks
Expected: Tier report with label, dimension scores, evidence summary, and actionable advice tuned to the investment context.
On failure: If data gaps prevent a confident classification, default to EVALUATE-FURTHER and clearly record what data is missing and how to obtain it. Never default to INVEST when unsure.
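The tier thresholds can be sketched as a small function over the four dimension scores. As written, the EVALUATE-FURTHER and CONTRIBUTE-CAUTIOUSLY conditions overlap (a lowest score of 2 satisfies both), so this sketch assumes the more cautious tier is checked first; that ordering is an interpretation, not something the skill states. Scores are assumed to be integers from 1 to 5 with supersession risk already inverted, and the dimension keys are hypothetical names.

```python
DIMENSIONS = (
    "community_health",
    "supersession_risk",      # already inverted: higher score = lower risk
    "architecture_fit",
    "governance_durability",
)

def classify_tier(scores):
    """Map 1-5 dimension scores onto the four tiers from Step 6."""
    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        # Data gaps: fall back to EVALUATE-FURTHER, never to INVEST.
        return "EVALUATE-FURTHER"
    values = [scores[name] for name in DIMENSIONS]
    if any(not 1 <= value <= 5 for value in values):
        raise ValueError("scores must be on the 1-5 scale")
    lowest = min(values)
    if lowest < 2:
        return "AVOID"
    if lowest < 3:
        return "CONTRIBUTE-CAUTIOUSLY"  # assumed precedence over EVALUATE-FURTHER
    if lowest >= 4:
        return "INVEST"
    return "EVALUATE-FURTHER"

example = {
    "community_health": 4,
    "supersession_risk": 2,
    "architecture_fit": 5,
    "governance_durability": 4,
}
print(classify_tier(example))  # CONTRIBUTE-CAUTIOUSLY
```

Driving the decision from the lowest score mirrors how the thresholds are phrased: one weak dimension is enough to move the whole framework down a tier.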
Validation
- Census data collected: stars, forks, dependents, release cadence, bus factor, landscape position
- Community health measured: survival rate, response times, contributor spread, governance artifacts
- Supersession risk measured, with a breakdown by type (reverted/rewritten/obsoleted)
- Architecture fit assessed: extension points, lock-in risk, API stability, use-case fit
- Governance assessed: model, funding, contributor protections, security posture
- Tier assigned: one of INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
- Each dimension score is backed by specific evidence from the analysis
- Advice is actionable and tuned to the contribution budget (if given)
- Data gaps and confidence limits are clearly recorded
Pitfalls
- Confusing popularity with health: A high star count with low contributor spread means a single point of failure. A 50k-star project with one maintainer is less healthy than a 2k-star project with 15 active contributors.
- Ignoring supersession risk: This is the most common reason external contributions fail. A welcoming community means nothing if internal development keeps overwriting external work.
- Over-weighting architecture and skipping governance: A beautifully designed framework can still fail if its governance model is not durable or is hostile to external contributors.
- Treating EVALUATE-FURTHER as AVOID: Mixed signals call for investigation, not rejection. Set a concrete re-evaluation date and list the specific questions to answer.
- Snapshot bias: All metrics are point-in-time. A declining project with great current metrics is worse than an improving project with mediocre ones. Always check the trend over 6-12 months.
- CLA complacency: Some CLAs transfer copyright to the project owner, meaning your contributions become their property. Read the CLA text, not just the checkbox.
- Anchoring on a single framework: Without comparison frameworks, any project looks either great or awful. Always benchmark against at least one alternative, even informally.
See Also
- polish-claw-project: the contribution workflow this assessment feeds into
- review-software-architecture: used in Step 4 for the architecture check
- forage-solutions: finding alternative frameworks for comparison
- search-prior-art: landscape mapping and prior-work check
- security-audit-codebase: the security posture check from Step 5
- assess-ip-landscape: license and IP risk check
