
test-team-coordination

Name: test-team-coordination
Author: pjt222


Updated 1 month ago

9 views

Tests · testing · design

About

This skill runs test scenarios against a team to validate and observe its coordination patterns, evaluating them against acceptance criteria. It generates a structured `RESULT.md` report so that performance can be compared across equivalent workloads and a baseline established. Use it to verify that a team's collaboration produces the expected behaviors on realistic tasks.

Quick install

Claude Code (recommended)

Primary

npx skills add pjt222/agent-almanac -a claude-code

Plugin command (alternative)

/plugin add https://github.com/pjt222/agent-almanac

Git clone (alternative)

git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/test-team-coordination


Documentation

Test Team Coordination

Exec test scenario from tests/scenarios/teams/ vs target team. Observe coordination pattern behaviors, eval acceptance criteria, score rubric, produce RESULT.md in tests/results/.

Use When

Validate team's coordination produces expected behaviors
Run structured test after modifying team def | agent
Compare patterns by running same scenario w/ diff teams
Establish baseline perf metrics for team composition
Regression tests after adding agents | changing membership

In

Required: Path to test scenario file (e.g. tests/scenarios/teams/test-opaque-team-cartographers-audit.md)
Optional: Run ID override (default: YYYY-MM-DD-<target>-NNN auto)
Optional: Team size override (default: from scenario frontmatter)
Optional: Skip scope change (default: false — inject if defined)
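The default run ID can be sketched as below. Assumptions (not stated in the skill): NNN is a zero-padded three-digit counter starting at 001, scoped per date + target; `next_run_id` and its `existing` argument (names already present under tests/results/) are hypothetical.

```python
import re
from datetime import date
from typing import Iterable


def next_run_id(target: str, existing: Iterable[str], today: date) -> str:
    """Build YYYY-MM-DD-<target>-NNN using the next free sequence number.

    Assumption: NNN counts runs for the same date and target, starting at 001.
    """
    prefix = f"{today.isoformat()}-{target}-"
    pattern = re.compile(re.escape(prefix) + r"(\d{3})")
    # Collect sequence numbers already used for this date + target.
    used = [int(m.group(1)) for name in existing if (m := pattern.fullmatch(name))]
    return f"{prefix}{max(used, default=0) + 1:03d}"
```

Usage: pass the directory names found under tests/results/ as `existing`; a run ID override from the inputs bypasses this function entirely.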

Do

Step 1: Load + Validate Scenario

1.1. Read scenario file specified in input.

1.2. Parse YAML frontmatter + extract:

target — team to test
coordination-pattern — expected pattern
team-size — # members to spawn
Acceptance criteria table
Scoring rubric (if present)
Ground truth data (if present)

1.3. Verify file has all req sections:

Objective
Pre-conditions
Task (w/ Primary Task subsection)
Expected Behaviors
Acceptance Criteria
Observation Protocol

Got: Scenario loads, parses, has all req sections.

If err: Missing | unparseable → abort w/ err msg ID'ing missing/malformed. Optional sections (Rubric, Ground Truth, Variants) absent → note + continue.
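Step 1 can be sketched as below. Assumptions: sections are Markdown `#` headings named exactly as listed above, the rubric heading reads "Scoring Rubric", and frontmatter is flat `key: value` pairs (nested YAML would need a real YAML parser). `split_frontmatter` and `validate_scenario` are hypothetical names.

```python
import re
from typing import Dict, Tuple

REQUIRED_SECTIONS = [
    "Objective",
    "Pre-conditions",
    "Task",
    "Expected Behaviors",
    "Acceptance Criteria",
    "Observation Protocol",
]
# Assumption: exact heading names of the optional sections.
OPTIONAL_SECTIONS = ["Scoring Rubric", "Ground Truth", "Variants"]
REQUIRED_KEYS = ["target", "coordination-pattern", "team-size"]


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split '---' delimited frontmatter from the body (flat key: value only)."""
    match = re.match(r"\A---[ \t]*\n(.*?)\n---[ \t]*\n(.*)\Z", text, re.DOTALL)
    if match is None:
        raise ValueError("frontmatter block not found or not terminated")
    meta: Dict[str, str] = {}
    for line in match.group(1).splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[key.strip()] = value.strip()
    return meta, match.group(2)


def validate_scenario(text: str) -> dict:
    """Abort (raise) on missing required parts; note missing optional sections."""
    meta, body = split_frontmatter(text)
    headings = set(re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", body, re.MULTILINE))
    missing = [k for k in REQUIRED_KEYS if k not in meta]
    missing += [s for s in REQUIRED_SECTIONS if s not in headings]
    if "Primary Task" not in headings:
        missing.append("Task > Primary Task")
    if missing:
        raise ValueError("scenario missing: " + ", ".join(missing))
    return {
        "meta": meta,
        "missing_optional": [s for s in OPTIONAL_SECTIONS if s not in headings],
    }
```

The error message names every missing key or section, matching the abort rule; absent optional sections are returned as notes, not errors.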

Step 2: Verify Pre-conditions

2.1. Walk through each pre-condition checkbox.

2.2. File-existence → use Glob.

2.3. Registry count → parse _registry.yml + cmp total_* vs actual file counts.

2.4. Branch/git → git status --porcelain + git branch --show-current.

Got: All pre-conditions satisfied.

If err: Pre-condition fails → record BLOCKED. Decide: proceed (soft) | abort (hard like missing target team file). Doc decision.
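A sketch of the registry-count check + proceed/abort decision. Assumptions: each `total_<category>` registry key maps to a file count keyed by `<category>`; whether a pre-condition is hard is set by the observer. `PreconditionResult`, `check_registry_counts`, `decide` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PreconditionResult:
    name: str
    satisfied: bool
    hard: bool = False  # hard failure aborts; soft failure is recorded as BLOCKED
    note: str = ""


def check_registry_counts(
    declared: Dict[str, object], actual: Dict[str, int]
) -> List[PreconditionResult]:
    """Compare every total_* value in the parsed registry with counted files."""
    results = []
    for key, expected in declared.items():
        if not key.startswith("total_"):
            continue  # ignore non-count registry fields
        category = key[len("total_"):]
        found = actual.get(category, 0)
        results.append(
            PreconditionResult(
                name=f"registry {key} matches files",
                satisfied=(found == expected),
                note=f"declared {expected}, found {found}",
            )
        )
    return results


def decide(results: List[PreconditionResult]) -> Tuple[str, List[str]]:
    """Return ('abort' | 'proceed', names of failed pre-conditions to mark BLOCKED)."""
    blocked = [r.name for r in results if not r.satisfied]
    if any(r.hard and not r.satisfied for r in results):
        return "abort", blocked
    return "proceed", blocked
```

The blocked list feeds the decision record; a missing target team file would be appended as a result with `hard=True`.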

Step 3: Load Coordination Pattern Criteria

3.1. Read tests/_registry.yml + locate coordination_patterns matching scenario's coordination-pattern.

3.2. Extract key_behaviors list.

3.3. Behaviors = observation checklist — each watched during exec + recorded observed/not.

Got: Pattern key behaviors loaded for observation.

If err: Pattern not in registry → use scenario's Expected Behaviors as sole source. Log warning.
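A sketch of the lookup + fallback. Assumption (registry layout not shown in the skill): `coordination_patterns` is a mapping from pattern name to an entry holding `key_behaviors`. `observation_checklist` is a hypothetical name; `registry` is the already-parsed tests/_registry.yml.

```python
import warnings
from typing import Dict, List


def observation_checklist(
    registry: Dict[str, dict], pattern: str, expected_behaviors: List[str]
) -> List[str]:
    """Return the pattern's key behaviors, or the scenario's Expected Behaviors.

    Assumption: registry["coordination_patterns"][pattern]["key_behaviors"]
    is where the behaviors live.
    """
    entry = registry.get("coordination_patterns", {}).get(pattern)
    if entry is not None and entry.get("key_behaviors"):
        return list(entry["key_behaviors"])
    # Pattern not in registry: log a warning, use the scenario as sole source.
    warnings.warn(f"pattern {pattern!r} not in registry; using Expected Behaviors")
    return list(expected_behaviors)
```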

Step 4: Execute Task

4.1. Create result dir: tests/results/YYYY-MM-DD-<target>-NNN/.

4.2. Record T0 (task start).

4.3. Read target team def from teams/<target>.md, extract CONFIG block, activate: call TeamCreate w/ team name, spawn teammates per subagent_type, create tasks from CONFIG tasks list. Use team-size from scenario. Pass Primary Task verbatim from scenario's Task section.

4.4. Observe team's exec phases. Record:

T1: Form assessment / decomposition complete
T2: Role assignments visible

4.5. If scenario defines Scope Change Trigger + skip-scope-change is false:

Wait until Phase 2 (role assignment) visible
Record T3 (scope change injection)
Send scope change prompt via SendMessage
Record T4 (scope change absorbed: role adjustment visible)

4.6. Continue observing until output:

T5 (integration begins)
T6 (final report delivered)

4.7. Capture team's complete output.

Got: Team executes through coordination phases. Timestamps for all transitions. Scope change (if applicable) injected + absorbed.

If err: Team fails to produce output → record fail point + err msgs. Stalls → note last phase + timeout. Proceed to eval w/ partial.
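The T0-T6 timestamps can be kept in a small phase log, sketched below. `PhaseLog` is a hypothetical helper; it stamps events when they are marked, not retroactively, and leaves missing marks (e.g. a stalled run) as gaps.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

PHASES = {
    "T0": "task start",
    "T1": "form assessment / decomposition complete",
    "T2": "role assignments visible",
    "T3": "scope change injected",
    "T4": "scope change absorbed",
    "T5": "integration begins",
    "T6": "final report delivered",
}


class PhaseLog:
    """Records one timestamp per phase transition."""

    def __init__(self) -> None:
        self.marks: Dict[str, datetime] = {}

    def mark(self, label: str, when: Optional[datetime] = None) -> None:
        if label not in PHASES:
            raise KeyError(f"unknown phase label: {label}")
        if label in self.marks:
            raise ValueError(f"{label} already recorded")
        # Default to the current UTC time so marks are set as events happen.
        self.marks[label] = when or datetime.now(timezone.utc)

    def elapsed(self, start: str, end: str) -> Optional[timedelta]:
        """Duration between two marks, or None if either is missing."""
        if start in self.marks and end in self.marks:
            return self.marks[end] - self.marks[start]
        return None
```

`elapsed("T3", "T4")` gives adaptation speed after a scope change; `elapsed("T0", "T6")` gives total duration, or None when the team never delivered.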

Step 5: Evaluate Pattern Behaviors

5.1. Per key behavior from Step 3, determine observed during exec:

Observed: Clear evidence in output | coordination
Partial: Some evidence but incomplete | ambiguous
Not observed: No evidence

5.2. Per task-specific behavior from scenario's Expected Behaviors, same eval.

5.3. Record findings in observation log.

Got: All/most pattern + task behaviors observed.

If err: Unobserved = findings, not test fails. Record accurately: pattern didn't fully manifest.

Step 6: Evaluate Acceptance Criteria

6.1. Walk each acceptance criterion.

6.2. Per criterion, determination:

PASS: Clearly met w/ observable evidence
PARTIAL: Partially met (counts toward threshold at 0.5 weight)
FAIL: Not met despite opportunity
BLOCKED: Couldn't eval (pre-condition fail, timeout)

6.3. Scenario has Ground Truth → verify findings vs:

Calc accuracy % per category
Flag false positives / false negatives
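A sketch of the ground-truth comparison. Assumption: "accuracy" here means the share of ground-truth items the team reported, per category; the skill does not define the formula. `compare_to_ground_truth` is a hypothetical name.

```python
from typing import Dict, Set


def compare_to_ground_truth(
    truth: Dict[str, Set[str]], reported: Dict[str, Set[str]]
) -> Dict[str, dict]:
    """Per category: accuracy %, false positives, false negatives."""
    report = {}
    for category, expected in truth.items():
        found = reported.get(category, set())
        hits = expected & found
        report[category] = {
            # Assumption: accuracy = ground-truth items found / ground-truth items.
            "accuracy_pct": round(100 * len(hits) / len(expected), 1) if expected else None,
            "false_positives": sorted(found - expected),  # reported, not in truth
            "false_negatives": sorted(expected - found),  # in truth, not reported
        }
    return report
```

Since ground truth counts are approximate, the percentages are evidence for the determination, not a pass/fail gate.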

6.4. Scenario has Scoring Rubric → score each dim 1-5 w/ brief justification.

6.5. Calc summary metrics:

Acceptance: X/N criteria passed (PARTIAL = 0.5)
Threshold: PASS if ≥ scenario threshold
Rubric total: X/Y points (if applicable)

Got: All criteria have determination. Summary metrics calc'd.

If err: < half criteria evaluable (too many BLOCKED) → inconclusive. Doc why + recommend re-run after fixing pre-conditions.
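The summary metrics can be sketched as below. Assumptions: the scenario threshold is a criteria count (e.g. 5 of 7), and BLOCKED scores 0 while staying in the denominator. `summarize` is a hypothetical name.

```python
from typing import Dict, List

# PARTIAL counts at 0.5 weight, as stated above.
WEIGHTS: Dict[str, float] = {"PASS": 1.0, "PARTIAL": 0.5, "FAIL": 0.0, "BLOCKED": 0.0}


def summarize(determinations: List[str], threshold: float) -> dict:
    """Weighted acceptance score and overall verdict for one run."""
    if not determinations:
        raise ValueError("no acceptance criteria to summarize")
    unknown = set(determinations) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown determinations: {sorted(unknown)}")
    total = len(determinations)
    evaluable = total - determinations.count("BLOCKED")
    score = sum(WEIGHTS[d] for d in determinations)
    if evaluable * 2 < total:
        verdict = "INCONCLUSIVE"  # fewer than half the criteria could be evaluated
    elif score >= threshold:
        verdict = "PASS"
    else:
        verdict = "FAIL"
    return {"score": score, "total": total, "verdict": verdict}
```

Example: `summarize(["PASS", "PASS", "PARTIAL", "FAIL"], threshold=2.5)` scores 2.5/4 and meets the threshold.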

Step 7: Generate RESULT.md

7.1. Create tests/results/YYYY-MM-DD-<target>-NNN/RESULT.md using Recording Template from scenario's Observation Protocol.

7.2. Populate all sections:

Run metadata (observer, timestamps, duration)
Phase log w/ all timestamps
Role emergence log (adaptive/team tests)
Acceptance criteria results table
Rubric scores table (if applicable)
Ground truth verification table (if applicable)
Key observations (narrative)
Lessons learned

7.3. Include team's raw output as appendix | separate file (team-output.md) in same dir.

7.4. Add summary verdict at top:

**Verdict**: PASS | FAIL | INCONCLUSIVE
**Score**: X/N criteria (Y/Z rubric points)
**Duration**: Xm

Got: Complete RESULT.md w/ all sections + clear verdict.

If err: Result file can't be written → output to stdout fallback. Eval data never lost.
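The verdict header from 7.4 can be rendered as below. `render_verdict` is a hypothetical helper; the rubric part is omitted when the scenario has no Scoring Rubric.

```python
from typing import Optional, Tuple


def render_verdict(
    verdict: str,
    score: float,
    total: int,
    duration_min: int,
    rubric: Optional[Tuple[int, int]] = None,
) -> str:
    """Render the three summary lines placed at the top of RESULT.md."""
    if verdict not in {"PASS", "FAIL", "INCONCLUSIVE"}:
        raise ValueError(f"unknown verdict: {verdict}")
    # ':g' prints 5.5 as '5.5' and 5.0 as '5'.
    score_text = f"{score:g}/{total} criteria"
    if rubric is not None:
        score_text += f" ({rubric[0]}/{rubric[1]} rubric points)"
    return "\n".join(
        [
            f"**Verdict**: {verdict}",
            f"**Score**: {score_text}",
            f"**Duration**: {duration_min}m",
        ]
    )
```

If writing RESULT.md fails, printing this string plus the remaining sections to stdout keeps the eval data.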

Check

Traps

Eval output quality vs coordination: Tests how team coordinates, not whether output perfect. Team coordinating well but finding 7/9 broken refs still demonstrates pattern.
Inject scope change too early: Wait until role assignment clearly visible. Too early → team hasn't differentiated, nothing to adapt.
Conflate member output w/ team output: Opaque team should present unified output. Individual member reports = finding about opacity, not test infra problem.
Exact ground truth matching: Ground truth counts approximate. Eval right ballpark, not exact match.
Forget timestamps: Essential for phase durations + adaptation speed. Set as events happen, not retroactively.

Related

review-codebase — deep codebase review complementing team-level testing
review-skill-format — validates individual skill format (this validates team coordination)
create-team — creates defs this tests
evolve-team — evolves defs based on test findings
test-a2a-interop — similar testing pattern for A2A protocol conformance
assess-form — morphic assessment opaque team lead uses internally

GitHub repository

pjt222/agent-almanac

Path: i18n/caveman-ultra/skills/test-team-coordination

agents · agentskills · ai-assisted-development · claude-code · skills · teams

FAQ


What is the test-team-coordination skill?

test-team-coordination is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform test-team-coordination-related tasks without extra prompting.

How do I install test-team-coordination?

Use the install commands on this page: add test-team-coordination to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does test-team-coordination belong to?

test-team-coordination is in the Testing category, tagged testing and design.

Is test-team-coordination free to use?

Yes. test-team-coordination is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
