evaluate-agent-framework

Name: evaluate-agent-framework
Author: pjt222

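Updated 1 month ago

8 views

Category: Design. Tags: ai, design

About

This skill evaluates whether an open-source AI agent framework is worth investing in by analyzing community health, supersession risk, architecture, and governance. It outputs a four-tier classification (INVEST, EVALUATE-FURTHER, CONTRIBUTE-CAUTIOUSLY, AVOID) to guide engineering resource allocation. Use it to assess a framework's long-term viability before committing significant development effort.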

Documentation

Evaluate Agent Framework

Score OSS agent framework → invest? Steps 2-3 novel: survival rate + supersession. Tier → INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID. Calibrate effort pre-commit.

Use When

Adopt framework prod? → check
Dep risk on framework → assess
Send eng effort to ext proj? → decide
Build-vs-adopt cmp → rank
Post-release / post-gov-change / post-acq re-eval

In

Req: framework_url — GitHub repo URL
Opt:
- comparison_frameworks — alt framework URLs, bench
- use_case — intended use (e.g., "multi-agent orchestration", "tool-use pipelines") → arch fit
- contribution_budget — planned eng hrs → tier calib

Do

Step 1: Census

Size, activity, landscape → before deeper probe.

Read README.md, CONTRIBUTING.md, LICENSE, arch docs (docs/, ARCHITECTURE.md)
Quant metrics:
- Stars/forks/issues/PRs → gh repo view <repo> --json stargazerCount,forkCount,issues,pullRequests
- Dependents → GitHub "Used by" count or the repo's /network/dependents page (web UI only; no documented REST endpoint lists dependents)
- Release cadence → gh release list --limit 10 — freq + semver?
Bus factor → top 5 contribs last 12mo by commit. Top contrib >60% of commits → crit low
Landscape:
- Pioneer: first mover → defines cat (high infl, high supersession risk to followers)
- Fast-follower: <6mo post-pioneer → iterate
- Late entrant: post-stabilization → cmp on feat/gov
comparison_frameworks given → same metrics each alt

→ Census tbl: stars, forks, deps, cadence, bus factor, landscape (+cmps).

If err: private/rate-limited → manual README. No metrics (self-hosted GitLab) → note gap, qual only.
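
The bus factor check above can be sketched in a few lines. This is a minimal illustration, not part of the skill: it assumes you already have one author login per commit for the last 12 months (for example from git log), and it reads "top >60%" as the single most active contributor's share of commits. The function names and the sample logins are hypothetical.

```python
from collections import Counter


def top_contributor_share(commit_authors):
    """Return (login, share) for the most active author.

    commit_authors: list with one author login per commit (assumed input).
    """
    counts = Counter(commit_authors)
    if not counts:
        return None, 0.0
    login, n = counts.most_common(1)[0]
    return login, n / len(commit_authors)


def bus_factor_flag(commit_authors, threshold=0.60):
    """Flag 'crit low' when one contributor exceeds the threshold share."""
    _, share = top_contributor_share(commit_authors)
    return "crit low" if share > threshold else "ok"


# Hypothetical sample: 10 commits, 7 of them by one person.
authors = ["alice"] * 7 + ["bob"] * 2 + ["carol"]
login, share = top_contributor_share(authors)
print(login, round(share, 2), bus_factor_flag(authors))  # alice 0.7 crit low
```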

Step 2: Community Health

Welcome/support/retain externals?

External survival rate:
- Last 50 closed PRs → gh pr list --state closed --limit 50 --json author,mergedAt,closedAt,labels
- Author internal (org) vs external
- survival_rate = merged_external_PRs / total_external_PRs
- Healthy >50%; concern <30%
Responsiveness:
- Issue first-response: median issue-open → first maintainer comment
- PR merge latency: median ext PR open → merge
- Healthy <7d resp, <30d merge; concern >30d resp
Contributor diversity:
- Ext/int ratio last 6mo
- Unique externals w/ >=2 merged PRs (repeat → healthy eco)
Gov artifacts:
- CONTRIBUTING.md exists + actionable (not just "submit a PR")
- CODE_OF_CONDUCT.md exists
- Gov docs → decision process
- Issue/PR templates guide contribs

→ Scorecard: survival, resp times, diversity, gov checklist.
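
A rough sketch of the survival rate calculation, assuming the JSON shape that gh pr list emits for the author and mergedAt fields (mergedAt is null for unmerged PRs). The set of internal logins must be supplied by you (org membership is not in the PR data), and the "borderline" label for the 30-50% band is an assumption of this sketch, since the skill only names the healthy and concern thresholds.

```python
import json


def survival_rate(prs, internal_logins):
    """merged_external_PRs / total_external_PRs, or None if no external PRs."""
    external = [
        p for p in prs
        if (p.get("author") or {}).get("login") not in internal_logins
    ]
    if not external:
        return None
    merged = [p for p in external if p.get("mergedAt")]
    return len(merged) / len(external)


def survival_label(rate):
    """Healthy >50%, concern <30%; the middle band label is an assumption."""
    if rate is None:
        return "no data"
    if rate > 0.50:
        return "healthy"
    if rate < 0.30:
        return "concern"
    return "borderline"


# Hypothetical sample in the shape of `gh pr list --json author,mergedAt`.
raw = """[
  {"author": {"login": "alice"}, "mergedAt": "2024-05-02T10:00:00Z"},
  {"author": {"login": "ext1"},  "mergedAt": "2024-05-03T10:00:00Z"},
  {"author": {"login": "ext2"},  "mergedAt": null},
  {"author": {"login": "ext3"},  "mergedAt": null},
  {"author": {"login": "ext1"},  "mergedAt": "2024-05-09T10:00:00Z"}
]"""
prs = json.loads(raw)
rate = survival_rate(prs, internal_logins={"alice"})
print(rate, survival_label(rate))  # 0.5 borderline
```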

If err: PR data thin (<20 closed) → note sample, weight others. Non-GitHub → adapt queries to platform API.
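
The responsiveness medians can be computed the same way. The sketch below assumes you have already collected (start, end) timestamp pairs in the ISO 8601 UTC form GitHub returns, such as issue opened and first maintainer comment, or PR opened and merged. Collecting those pairs is not shown, and the "watch" label for the 7-30 day band is an assumption of this sketch.

```python
from datetime import datetime
from statistics import median


def parse_ts(ts):
    """Parse a timestamp like '2024-05-01T12:00:00Z' (UTC, no fraction)."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


def median_days(pairs):
    """Median elapsed days over (start, end) pairs; skips incomplete pairs."""
    deltas = [
        (parse_ts(end) - parse_ts(start)).total_seconds() / 86400
        for start, end in pairs
        if start and end
    ]
    return median(deltas) if deltas else None


def response_label(days):
    """Healthy <7d, concern >30d; the middle band label is an assumption."""
    if days is None:
        return "no data"
    if days < 7:
        return "healthy"
    if days > 30:
        return "concern"
    return "watch"


# Hypothetical issue-open -> first-maintainer-comment pairs.
pairs = [
    ("2024-05-01T00:00:00Z", "2024-05-03T00:00:00Z"),  # 2 days
    ("2024-05-01T00:00:00Z", "2024-05-11T00:00:00Z"),  # 10 days
    ("2024-05-01T00:00:00Z", "2024-05-05T00:00:00Z"),  # 4 days
    ("2024-05-01T00:00:00Z", None),                    # never answered, skipped
]
days = median_days(pairs)
print(days, response_label(days))  # 4.0 healthy
```

Note that skipping unanswered issues flatters the median; reporting the unanswered count alongside it is a reasonable safeguard.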

Step 3: Supersession Risk

Ext contribs → obsoleted by internal dev? Biggest risk.

Sample last 50-100 merged ext PRs (or all if fewer)
Each merged ext PR, later:
- Reverted: explicit revert ref PR
- Rewritten: same file/module changed <90d by internal
- Obsoleted: feat removed/replaced next release
supersession_rate = (reverted + rewritten + obsoleted) / total_merged_external
Roadmap vs ext-active areas:
- High overlap → high supersession (int builds over ext)
- Low overlap → lower risk (ext fill gaps int won't)
"Contrib traps": look friendly, scheduled for int rewrite
Bench: NemoClaw → 71% ext PRs superseded <6mo. Calib pt.

→ Supersession % + breakdown (reverted/rewritten/obsoleted). Roadmap overlap.

If err: shallow/squash-merged (attrib lost) → est by ext PR paths vs files changed next releases. Lower confidence.
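
Once each sampled PR has been labelled by hand, the supersession rate and its breakdown are simple arithmetic. The sketch below assumes exactly one label per merged external PR, so nothing is double counted; the "intact" label for PRs that survived is a hypothetical name introduced here, and the sample counts are invented for illustration.

```python
from collections import Counter

SUPERSEDED = ("reverted", "rewritten", "obsoleted")


def supersession_report(outcomes):
    """Compute supersession_rate plus breakdown.

    outcomes: one label per merged external PR, either one of SUPERSEDED
    or 'intact' (hypothetical label for PRs that were not superseded).
    """
    total = len(outcomes)
    if total == 0:
        return None
    counts = Counter(outcomes)
    breakdown = {k: counts.get(k, 0) for k in SUPERSEDED}
    rate = sum(breakdown.values()) / total
    return {"rate": rate, "breakdown": breakdown, "total": total}


# Invented sample: 50 merged external PRs.
sample = (
    ["reverted"] * 3 + ["rewritten"] * 9 + ["obsoleted"] * 4 + ["intact"] * 34
)
report = supersession_report(sample)
print(f"{report['rate']:.0%}", report["breakdown"])
# 32% {'reverted': 3, 'rewritten': 9, 'obsoleted': 4}
```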

Step 4: Architecture Alignment

Arch supports use case w/o lock-in?

Extension pts:
- Plugin API → documented?
- Config surface → customize no-fork?
- Hook/callback → intercept behavior?
Lock-in:
- Rewrite cost: migrate-away est (d/wk/mo)
- Data portability: export std fmt?
- Std compliance: agentskills.io, MCP, A2A vs proprietary?
API stability:
- Breaking changes/major (CHANGELOG, migration guides)
- Deprecation policy (advance warn)
- Semver compliance (breaking → major only)
Use case fit:
- use_case given → arch natural fit?
- Arch mismatches → workarounds req?
Interop:
- agentskills.io compat (skill model)
- MCP (tool integration)
- A2A (agent-to-agent)

→ Arch report: ext pts, lock-in (low/med/high), API stability, use-case fit.

If err: sparse docs → derive from code + public API. Too young for stability hist → note, weight gov more.
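
The semver compliance check (breaking → major only) can be roughed out as below. This sketch assumes you have read the CHANGELOG yourself and flagged which releases contain breaking changes; the (tag, has_breaking) input format is hypothetical. It exempts 0.y.z releases, because the SemVer spec allows anything to change during initial development.

```python
def parse_version(tag):
    """Turn 'v1.2.3', '1.2.3-rc.1' or '1.2.3+build' into (1, 2, 3)."""
    core = tag.lstrip("v").split("-")[0].split("+")[0]
    major, minor, patch = (int(x) for x in core.split("."))
    return major, minor, patch


def semver_violations(releases):
    """Return tags that ship breaking changes without a major bump.

    releases: list of (tag, has_breaking) in chronological order, where
    has_breaking is a manual judgment from the CHANGELOG (assumed input).
    """
    violations = []
    prev = None
    for tag, has_breaking in releases:
        cur = parse_version(tag)
        if prev is not None and has_breaking:
            major_bump = cur[0] > prev[0]
            pre_1_0 = cur[0] == 0  # SemVer: 0.y.z may change at any time
            if not major_bump and not pre_1_0:
                violations.append(tag)
        prev = cur
    return violations


# Hypothetical release history.
history = [
    ("v0.9.0", True),   # pre-1.0, exempt
    ("v1.0.0", False),
    ("v1.1.0", True),   # breaking change in a minor release
    ("v2.0.0", True),   # breaking change with a major bump, fine
    ("v2.0.1", False),
]
print(semver_violations(history))  # ['v1.1.0']
```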

Step 5: Governance + Sustainability

Gov model → long-term viable? Fair to externals?

Gov model:
- BDFL: single decider → fast, bus factor risk
- Committee/Core team: distributed → slower, resilient
- Foundation-backed: Apache, Linux Foundation, CNCF → most sustainable
- Corporate-controlled: one co → rug-pull risk
Funding:
- VC, corp, grants, community, unfunded
- Full-time maintainers >=2 healthy; 0 red flag
- Revenue → how sustain?
Contributor protections:
- License: permissive (MIT, Apache-2.0) vs copyleft (GPL) vs custom
- CLA → rights transfer that disadvantage?
- Recog → credited in releases/changelogs/docs?
Security:
- SECURITY.md or equiv
- Median CVE → patch time
- Dep update (Dependabot, Renovate, manual)
Trajectory:
- Gov evolving (→ foundation)?
- Recent leadership/acq/relicense?
- Public maintainer-contributor conflicts?

→ Gov assess: model, sustainability (sustainable/at-risk/critical), protections, security.

If err: gov undocumented → absence = yellow flag. Check implicit: who merges, who closes, who releases.

Step 6: Classify

Synth → 4-tier + justifications + recs.

Score each (1-5):
- Community health: survival, resp, diversity
- Supersession risk: rate, roadmap, traps (invert: low better)
- Arch alignment: ext pts, lock-in, stability, fit
- Gov sustainability: model, funding, protections, sec
Thresholds:
- INVEST (all >=4): healthy, low supersession (<20%), aligned, sustainable gov → safe adopt + contrib
- EVALUATE-FURTHER (mixed, none <3): mixed signals → specific follow-ups, re-eval date
- CONTRIBUTE-CAUTIOUSLY (any 2, none <2): high supersession (>40%) or gov concerns → limit to requested work, maintainer-approved scope, plugin/ext decoupled from core
- AVOID (any 1): crit red flags — abandoned, hostile (<15% survival), bad license, rug-pull → no eng effort
Write report:
- Tier + 1-sentence rationale up front
- Each dim score + evidence
- contribution_budget given → how alloc hrs per tier
- EVALUATE-FURTHER → specific Qs + timeline
- CONTRIBUTE-CAUTIOUSLY → safe (plugins, docs, tests) vs risky (core)
comparison_frameworks evaluated → cmp matrix, rank all

→ Classification report: tier, scores, evidence, actionable recs.

If err: data gaps block confident call → default EVALUATE-FURTHER, doc missing data + how to get. Never default INVEST when unsure.
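
One way to encode the tier rules is to check them in order of severity, so a single score of 1 always yields AVOID and a single score of 2 always yields CONTRIBUTE-CAUTIOUSLY. That precedence is an assumption of this sketch, as are the dimension keys and the use of None for a dimension that could not be scored. The supersession score is expected already inverted (5 = low risk).

```python
def classify(scores):
    """Map four 1-5 dimension scores to a tier.

    scores: dict of dimension name -> int 1..5, or None when data is missing.
    A missing score falls back to EVALUATE-FURTHER, never INVEST.
    """
    values = list(scores.values())
    if any(v is None for v in values):
        return "EVALUATE-FURTHER"
    if any(not 1 <= v <= 5 for v in values):
        raise ValueError("scores must be between 1 and 5")
    lowest = min(values)
    if lowest == 1:
        return "AVOID"
    if lowest == 2:
        return "CONTRIBUTE-CAUTIOUSLY"
    if lowest >= 4:
        return "INVEST"
    return "EVALUATE-FURTHER"


# Hypothetical scorecards.
print(classify({"community": 5, "supersession": 4, "arch": 4, "gov": 4}))
# INVEST
print(classify({"community": 4, "supersession": 2, "arch": 5, "gov": 4}))
# CONTRIBUTE-CAUTIOUSLY
print(classify({"community": 4, "supersession": None, "arch": 5, "gov": 5}))
# EVALUATE-FURTHER
```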

Chk

Census: stars, forks, deps, cadence, bus factor, landscape
Community: survival, resp times, diversity, gov artifacts
Supersession: rate + breakdown (reverted/rewritten/obsoleted)
Arch: ext pts, lock-in, API stability, fit
Gov: model, funding, protections, security
Tier: INVEST / EVALUATE-FURTHER / CONTRIBUTE-CAUTIOUSLY / AVOID
Each score → specific evidence
Recs actionable + calib to budget (if given)
Data gaps + confidence limits doc'd

Traps

Popularity ≠ health: 50k stars + 1 maintainer < 2k stars + 15 active contribs. SPoF.
Skip supersession: most common ext-contrib failure. Welcoming community worthless if int overwrites ext.
Arch-only, ignore gov: pretty design fails w/ unsustainable or hostile gov.
EVALUATE-FURTHER ≠ AVOID: mixed = investigate, not reject. Set re-eval date + specific Qs.
Snapshot bias: metrics point-in-time. Declining proj w/ great current = riskier than improving proj w/ mediocre. Check 6-12mo trend.
CLA complacency: some CLAs transfer copyright → your work = their asset. Read text, not checkbox.
Single-framework anchor: no cmp → anything looks great/terrible. Bench at least 1 alt, even informal.

See

polish-claw-project — contrib workflow this informs
review-software-architecture — Step 4 arch eval
forage-solutions — alt framework discovery for cmp
search-prior-art — landscape + prior work
security-audit-codebase — Step 5 sec posture
assess-ip-landscape — license + IP risk

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman-ultra/skills/evaluate-agent-framework

Tags: agents, agentskills, ai-assisted-development, claude-code, skills, teams

FAQ

What is the evaluate-agent-framework skill?

evaluate-agent-framework is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform evaluate-agent-framework-related tasks without extra prompting.

How do I install evaluate-agent-framework?

Use the install commands on this page: add evaluate-agent-framework to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does evaluate-agent-framework belong to?

evaluate-agent-framework is in the Design category, tagged ai and design.

Is evaluate-agent-framework free to use?

Yes. evaluate-agent-framework is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
