
evaluate-agent-framework

pjt222
Updated yesterday

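About

This skill evaluates whether an open-source AI agent framework is worth your team's investment. It analyzes community activity, architecture, and governance risk to deliver a clear INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID recommendation. Use it before allocating engineering resources to a new framework, so that the adoption decision is data-driven.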

Quick Install

Claude Code

Recommended (default):
npx skills add pjt222/agent-almanac -a claude-code
Plugin command (alternative):
/plugin add https://github.com/pjt222/agent-almanac
Git clone (alternative):
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/evaluate-agent-framework

Copy this command and paste it into Claude Code to install the skill.

Documentation

Evaluate Agent Framework

Structured assessment of an open-source agent framework's investment readiness. The distinctive value lies in Steps 2-3: measuring community health through the contribution survival rate, and measuring supersession risk, the biggest reason external engineering effort goes to waste. The final tier (INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID) sets how much to spend before committing development cycles.

When to Use

  • Deciding whether to adopt an agent framework for production
  • Measuring dependency risk on a framework your project relies on
  • Deciding whether to contribute engineering effort to an external project
  • Comparing competing frameworks for a build-vs-adopt decision
  • Re-evaluating a framework after a major release, governance shift, or acquisition

Inputs

  • Required: framework_url (GitHub URL of the framework repo)
  • Optional:
    • comparison_frameworks: list of other framework URLs to benchmark against
    • use_case: planned use case for the architecture alignment check (e.g., "multi-agent orchestration", "tool-use pipelines")
    • contribution_budget: planned engineering hours, used for tier calibration

Steps

Step 1: Gather Framework Census

Gather baseline data on the project's size, activity, and position in the landscape before digging deeper.

  1. Fetch and read README.md, CONTRIBUTING.md, LICENSE, and any architecture docs (docs/, ARCHITECTURE.md)
  2. Collect counts:
    • Stars, forks, open issues, open PRs: gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
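    • Dependent repos: check GitHub's "Used by" count on the repo page or the dependency graph's dependents view (GitHub's REST API does not appear to offer a dependents endpoint, so expect to read this from the web UI)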
    • Release cadence: gh release list --limit 10, noting how often releases ship and whether they follow semver
  3. Estimate the bus factor: find the top 5 contributors by commit count over the last 12 months. If the top contributor accounts for >60% of commits, the bus factor is critically low
  4. Map the landscape position:
    • Pioneer: first mover that defines the category (high influence, high supersession risk for followers)
    • Fast-follower: launched within 6 months of the pioneer, iterating on the concept
    • Late entrant: arrived after the category stabilized, competing on features or governance
  5. If comparison_frameworks is given, collect the same counts for each

Expected: A census table with stars, forks, dependents, release cadence, bus factor, and landscape position for the target (and for the comparison frameworks, if given).

On failure: If the repo is private or the API is rate-limited, fall back to reading the README manually. If counts are unavailable (e.g., self-hosted GitLab), note the gap and proceed with a qualitative check.
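
The bus-factor check in item 3 can be sketched in Python. This is a minimal sketch, assuming the commit author names for the last 12 months have already been collected (for example with git log --since="12 months ago" --format=%an); the function name bus_factor_report and the sample data are hypothetical and not part of the skill.

```python
from collections import Counter


def bus_factor_report(commit_authors, top_n=5, critical_share=0.60):
    """Summarize commit concentration among the top contributors.

    commit_authors: one author name per commit (hypothetical input shape).
    Returns the top_n contributors with their commit shares, plus a flag
    that is True when the top contributor's share exceeds critical_share.
    """
    total = len(commit_authors)
    if total == 0:
        return {"top": [], "top_share": 0.0, "critically_low": False}
    counts = Counter(commit_authors).most_common(top_n)
    top = [(name, n, n / total) for name, n in counts]
    top_share = top[0][2]
    return {
        "top": top,
        "top_share": top_share,
        "critically_low": top_share > critical_share,
    }


# Illustrative data only: 7 of 10 commits come from one author.
authors = ["alice"] * 7 + ["bob"] * 2 + ["carol"]
report = bus_factor_report(authors)
print(f"top contributor share: {report['top_share']:.0%}")
print("bus factor critically low:", report["critically_low"])
```

The 60% cutoff comes from item 3; everything else, including counting by author name rather than by email, is a simplification.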

Step 2: Assess Community Health

Measure whether the project welcomes, supports, and retains external contributors.

  1. Measure the external contribution survival rate:
    • Pull the last 50 closed PRs: gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
    • Classify each PR author as internal (org member) or external
    • Compute: survival_rate = merged_external_PRs / total_external_PRs
    • Healthy threshold: >50% survival rate; concerning: <30%
  2. Measure responsiveness:
    • Issue first-response time: median time from issue open to first maintainer comment
    • PR merge lag: median time from PR open to merge for external PRs
    • Healthy: <7 days first-response, <30 days merge; concerning: >30 days first-response
  3. Check contributor diversity:
    • External/internal contributor ratio over the last 6 months
    • Count unique external contributors with >=2 merged PRs (repeat contributors signal a healthy ecosystem)
  4. Check governance artifacts:
    • CONTRIBUTING.md exists and is actionable (not just "submit a PR")
    • CODE_OF_CONDUCT.md exists
    • Governance docs describe the decision process
    • Issue/PR templates guide contributors

Expected: A community health scorecard with survival rate, response times, diversity ratio, and governance artifact checklist.

On failure: If PR data is thin (a new project with <20 closed PRs), note the sample-size limit and weight other signals more heavily. If the project uses a non-GitHub platform, adapt the queries to that platform's API.
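
The survival-rate formula and thresholds from item 1 can be sketched in Python. This is a sketch under stated assumptions: the PR records are shaped like the JSON printed by the gh pr list command above (an author object with a login, and a mergedAt that is null for unmerged PRs), and the evaluator supplies the set of internal logins. The function names and the "borderline" label for rates between 30% and 50% are hypothetical.

```python
import json


def external_survival_rate(prs, internal_logins):
    """Return (survival_rate, merged_external, total_external).

    prs: list of dicts with "author" and "mergedAt" keys.
    internal_logins: logins treated as internal (assumed to be supplied by
    the evaluator). PRs with a missing author are counted as external.
    """
    external = [
        p for p in prs
        if (p.get("author") or {}).get("login") not in internal_logins
    ]
    merged = [p for p in external if p.get("mergedAt")]
    if not external:
        return None, 0, 0
    return len(merged) / len(external), len(merged), len(external)


def classify_survival(rate):
    """Apply the thresholds from item 1: >50% healthy, <30% concerning."""
    if rate is None:
        return "no data"
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concerning"
    return "borderline"  # hypothetical label for the range in between


# Illustrative records only, not real PR data.
sample = json.loads("""[
  {"author": {"login": "maintainer1"}, "mergedAt": "2024-05-02T10:00:00Z"},
  {"author": {"login": "ext-a"}, "mergedAt": "2024-05-03T09:00:00Z"},
  {"author": {"login": "ext-b"}, "mergedAt": null},
  {"author": {"login": "ext-c"}, "mergedAt": null},
  {"author": {"login": "ext-a"}, "mergedAt": "2024-05-06T08:00:00Z"}
]""")
rate, merged, total = external_survival_rate(sample, {"maintainer1"})
print(f"{merged}/{total} external PRs merged -> {classify_survival(rate)}")
```

With fewer than 20 closed PRs the rate is noisy, so the label is best read alongside the sample size.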

Step 3: Calculate Supersession Risk

Estimate how likely external contributions are to be wiped out by internal development, the single biggest risk for framework adopters and contributors.

  1. Sample the last 50-100 merged external PRs (or all of them if there are fewer)
  2. For each merged external PR, check whether the contributed code was later:
    • Reverted: an explicit revert commit referencing the PR
    • Rewritten: the same file or module substantially changed within 90 days by an internal contributor
    • Obsoleted: the feature removed or replaced in a later release
  3. Compute: supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
  4. Map the published roadmap (if available) against the areas where external contributors are active:
    • High overlap = high supersession risk (internal developers will build over external work)
    • Low overlap = lower supersession risk (external contributors fill gaps internal developers won't)
  5. Check for "contribution traps": areas that look contribution-friendly but are scheduled for an internal rewrite
  6. Benchmark: the NemoClaw study showed 71% of external PRs superseded within 6 months; use it as a calibration point

Expected: The supersession rate as a percentage, with a breakdown by type (reverted/rewritten/obsoleted), plus a roadmap overlap assessment.

On failure: If commit history is shallow or squash-merged (losing author info), estimate supersession by comparing external PR file paths against files changed in later releases, and note the lower confidence.
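
The supersession formula from item 3 can be sketched in Python. This is a minimal sketch, assuming each sampled PR has already been labeled by hand with one outcome; the "intact" label, the function name, and the PR numbers are hypothetical illustrations.

```python
from collections import Counter

# "intact" is a hypothetical label for PRs whose code was not superseded.
OUTCOMES = ("reverted", "rewritten", "obsoleted", "intact")


def supersession_breakdown(pr_outcomes):
    """Compute the supersession rate and its per-type breakdown.

    pr_outcomes: mapping of PR number -> one of OUTCOMES (hand-labeled).
    Each share is a fraction of all merged external PRs in the sample.
    """
    unknown = set(pr_outcomes.values()) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcome labels: {sorted(unknown)}")
    total = len(pr_outcomes)
    if total == 0:
        return {"rate": None, "by_type": {}}
    counts = Counter(pr_outcomes.values())
    superseded = total - counts["intact"]
    by_type = {k: counts[k] / total for k in OUTCOMES if k != "intact"}
    return {"rate": superseded / total, "by_type": by_type}


# Illustrative labels for ten merged external PRs (not real data).
labels = {
    101: "intact", 102: "rewritten", 103: "intact", 104: "reverted",
    105: "intact", 106: "obsoleted", 107: "intact", 108: "rewritten",
    109: "intact", 110: "intact",
}
result = supersession_breakdown(labels)
print(f"supersession rate: {result['rate']:.0%}")
for kind, share in result["by_type"].items():
    print(f"  {kind}: {share:.0%}")
```

Assigning exactly one label per PR keeps the three categories from double-counting a PR that was, say, rewritten and later removed.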

Step 4: Evaluate Architecture Alignment

Check whether the framework's architecture supports your use case without heavy lock-in.

  1. Map extension points:
    • Plugin/extension API: does the framework expose a documented plugin interface?
    • Config surface: can behavior be tuned without forking?
    • Hook/callback system: can you intercept and change framework behavior at key points?
  2. Assess lock-in risk:
    • Rewrite cost: estimate the engineering effort to migrate away (days/weeks/months)
    • Data portability: can data and state be exported in standard formats?
    • Standards compliance: does the framework use open standards (agentskills.io, MCP, A2A) or custom protocols?
  3. Assess API stability:
    • Count breaking changes per major release (CHANGELOG, migration guides)
    • Check the deprecation policy (advance notice before removal)
    • Check semver adherence (breaking changes only in major versions)
  4. Check fit with your specific use case:
    • If use_case is given, check whether the framework's architecture naturally supports it
    • Identify any architectural mismatch that would require workarounds
  5. Check interoperability:
    • agentskills.io compatibility (skill model fit)
    • MCP support (tool integration)
    • A2A protocol support (agent-to-agent communication)

Expected: An architecture fit report with an extension point list, lock-in risk rating (low/medium/high), API stability score, and use-case fit assessment.

On failure: If architecture docs are thin, derive the assessment from the code structure and public API surface. If the framework is too young to have a stability history, note this and weight governance signals more heavily.
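
The semver check in item 3 (breaking changes only in major versions) can be sketched in Python. This is a sketch under stated assumptions: each release has already been labeled by hand as breaking or not after reading the CHANGELOG, tags look like v1.2.3 with no pre-release suffix, and releases in the 0.x series are skipped because semver allows anything to change there. The function names and release history are hypothetical.

```python
def parse_version(tag):
    """Parse 'v1.2.3' or '1.2.3' into (major, minor, patch).

    Pre-release suffixes such as '-rc1' are not handled in this sketch.
    """
    major, minor, patch = tag.lstrip("v").split(".")[:3]
    return int(major), int(minor), int(patch)


def semver_violations(releases):
    """Return tags that shipped a breaking change without a major bump.

    releases: chronological list of (tag, has_breaking_change) tuples,
    where the flag is assigned by hand from the changelog.
    """
    violations = []
    previous = None
    for tag, breaking in releases:
        current = parse_version(tag)
        same_major = previous is not None and current[0] == previous[0]
        if breaking and same_major and current[0] > 0:
            violations.append(tag)
        previous = current
    return violations


# Illustrative release history only.
history = [
    ("v1.0.0", False), ("v1.1.0", False), ("v1.2.0", True),
    ("v2.0.0", True), ("v2.0.1", False), ("v2.1.0", True),
]
print(semver_violations(history))  # ['v1.2.0', 'v2.1.0']
```

A non-empty result is one input to the API stability score, not a verdict on its own; a single violation years ago weighs less than a recent pattern.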

Step 5: Assess Governance and Sustainability

Check whether the project's governance model supports long-term survival and fair treatment of external contributors.

  1. Classify the governance model:
    • BDFL (Benevolent Dictator for Life): one decision-maker; fast decisions, bus factor risk
    • Committee/Core team: distributed decisions; slower but more resilient
    • Foundation-backed: formal governance (Apache, Linux Foundation, CNCF); most durable
    • Corporate-controlled: one company drives development; watch for rug-pull risk
  2. Check funding and sustainability:
    • Funding sources: VC-backed, corporate-sponsored, grants, community-funded, unfunded
    • Full-time maintainer count: >=2 is healthy; 0 is a red flag
    • Revenue model (if any): how does the project sustain itself?
  3. Check contributor protections:
    • License type: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
    • CLA terms: does signing the CLA transfer rights in a way that hurts contributors?
    • Contributor credit: are external contributors credited in releases, changelogs, and docs?
  4. Check security posture:
    • Security disclosure policy (SECURITY.md or equivalent)
    • Median time from CVE disclosure to patch release
    • Dependency update patterns (Dependabot, Renovate, manual)
  5. Check trajectory:
    • Is the governance model shifting (e.g., moving toward a foundation)?
    • Any recent leadership change, acquisition, or relicensing?
    • Any public conflicts between maintainers and contributors?

Expected: A governance assessment with model classification, durability rating (durable/at-risk/critical), contributor protection assessment, and security posture summary.

On failure: If governance is undocumented, treat the absence itself as a yellow flag. Infer the de facto governance from who merges PRs, who closes issues, and who makes release decisions.

Step 6: Classify Investment Readiness

Fold all findings into a four-tier classification with specific reasons and actionable advice.

  1. Score each dimension (1-5 scale):
    • Community health: survival rate, responsiveness, contributor diversity
    • Supersession risk: rate, roadmap overlap, contribution traps (inverted: lower risk scores higher)
    • Architecture fit: extension points, lock-in, stability, use-case fit
    • Governance durability: model, funding, protections, security
  2. Apply tier thresholds:
    • INVEST (all dimensions >=4): Healthy community, low supersession (<20%), well-fitting architecture, durable governance. Safe to adopt and to contribute engineering effort.
    • EVALUATE-FURTHER (mixed, no dimension <2): Mixed signals that need specific follow-ups. Document what needs clarifying and set a re-evaluation date.
    • CONTRIBUTE-CAUTIOUSLY (any dimension 2, none <2): High supersession (>40%) or governance concerns. Limit contributions to explicitly requested work, maintainer-approved scope, or plugin/extension development decoupled from core.
    • AVOID (any dimension 1): Critical red flags such as an abandoned project, hostility to external contributors (survival rate <15%), an incompatible license, or signs of an imminent rug-pull. Do not contribute engineering effort.
  3. Write the tier report:
    • Lead with the tier and a one-line reason
    • Summarize each dimension score with key evidence
    • If contribution_budget is given, advise how to allocate those hours given the tier
    • For EVALUATE-FURTHER, list the specific questions that need answers and set a timeline
    • For CONTRIBUTE-CAUTIOUSLY, state which contribution types are safe (plugins, docs, tests) and which are risky (core features)
  4. If comparison_frameworks were evaluated, build a comparison matrix ranking all frameworks

Expected: A tier report with the label, dimension scores, evidence summary, and actionable advice tailored to the investment context.

On failure: If data gaps block a confident classification, default to EVALUATE-FURTHER and document clearly what data is missing and how to get it. Never default to INVEST when unsure.
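
The tier thresholds from item 2 can be sketched in Python. This is a sketch, not the skill's definitive rule set: as written, the EVALUATE-FURTHER and CONTRIBUTE-CAUTIOUSLY conditions can both match when the lowest score is 2, so the sketch assumes the more cautious tier wins by checking the lower tiers first. The dimension keys and the function name are hypothetical.

```python
DIMENSIONS = (
    "community_health",
    "supersession_risk",      # already inverted: 5 means low risk
    "architecture_fit",
    "governance_durability",
)


def classify_tier(scores):
    """Map four 1-5 dimension scores to an investment tier.

    Missing or out-of-range scores fall back to EVALUATE-FURTHER,
    following the rule of never defaulting to INVEST when unsure.
    """
    values = [scores.get(d) for d in DIMENSIONS]
    if any(v is None or not 1 <= v <= 5 for v in values):
        return "EVALUATE-FURTHER"
    lowest = min(values)
    if lowest < 2:        # any dimension scored 1
        return "AVOID"
    if lowest < 3:        # any dimension scored 2, none lower
        return "CONTRIBUTE-CAUTIOUSLY"
    if lowest >= 4:       # all dimensions >= 4
        return "INVEST"
    return "EVALUATE-FURTHER"  # mixed: lowest score is 3


# Illustrative scores only.
example = {
    "community_health": 4,
    "supersession_risk": 2,
    "architecture_fit": 5,
    "governance_durability": 4,
}
print(classify_tier(example))  # CONTRIBUTE-CAUTIOUSLY
```

The percentage cues in the tier descriptions (<20%, >40%, <15%) are inputs to the dimension scores rather than separate checks here; an evaluator who wants them enforced directly would need to add them.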

Validation

  • Census data gathered: stars, forks, dependents, release cadence, bus factor, landscape position
  • Community health measured: survival rate, response times, contributor diversity, governance artifacts
  • Supersession risk measured, with breakdown by type (reverted/rewritten/obsoleted)
  • Architecture fit assessed: extension points, lock-in risk, API stability, use-case fit
  • Governance assessed: model, funding, contributor protections, security posture
  • Tier assigned: one of INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
  • Each dimension score backed by specific evidence from the analysis
  • Advice is actionable and tailored to the contribution budget (if given)
  • Data gaps and confidence limits clearly documented

Pitfalls

  • Confusing popularity with health: A high star count with low contributor diversity means a single point of failure. A 50k-star project with one maintainer is less healthy than a 2k-star project with 15 active contributors.
  • Ignoring supersession risk: This is the most common reason external contributions fail. A welcoming community means nothing if internal developers keep overwriting external work.
  • Over-weighting architecture and skipping governance: A beautifully designed framework can still fail if its governance model is not durable or is hostile to external contributors.
  • Treating EVALUATE-FURTHER as AVOID: Mixed signals call for investigation, not rejection. Set a concrete re-evaluation date and list specific questions to answer.
  • Snapshot bias: All counts are point-in-time. A declining project with great current numbers is worse than an improving project with mediocre ones. Always check the trend over 6-12 months.
  • CLA complacency: Some CLAs transfer copyright to the project owner, meaning your contributions become their property. Read the CLA text, not just the checkbox.
  • Anchoring on a single framework: Without comparison frameworks, any project looks either great or awful. Always benchmark against at least one alternative, even informally.

See Also

GitHub repository

pjt222/agent-almanac
Path: i18n/caveman/skills/evaluate-agent-framework

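Related Skills

  • executing-plans: Use this skill when you have a complete implementation plan to execute in controlled batches with review checkpoints. It loads the plan, reviews it critically, then executes tasks in small batches (default: 3 tasks), reporting progress for architect review between batches. This ensures systematic implementation with built-in quality-control checkpoints.
  • requesting-code-review: This skill invokes a code-reviewer subagent to analyze code changes against requirements. Use it after completing a task, after implementing a major feature, or before merging into the main branch. The review compares the current implementation with the original plan to help catch problems early.
  • connect-mcp-server: This skill gives developers a comprehensive guide to connecting MCP servers to Claude Code over HTTP, stdio, or SSE transports. It covers installation, configuration, authentication, and security for integrating external services such as GitHub, Notion, and custom APIs. Use it when setting up MCP integrations, configuring external tools, or working with Claude's Model Context Protocol.
  • web-cli-teleport: This skill helps developers choose between the Claude Code web interface and the CLI based on task analysis, and enables seamless session teleporting between the two environments. It manages session state and context when switching among web, CLI, or mobile environments to optimize the workflow. Use it for complex projects that need different tools at different stages.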