test-a2a-interop
Über
Diese Fähigkeit testet die A2A (Agent-zu-Agent)-Interoperabilität durch die Validierung der Protokollkonformität, einschließlich Agent Card-Erkennung, Aufgabenlebenszyklus-Status und Streaming. Nutzen Sie sie, um neue A2A-Server-Implementierungen zu überprüfen, Konformitätstests in CI/CD auszuführen und Multi-Agent-Workflows vor der Bereitstellung zu debuggen. Sie wurde für Entwickler entwickelt, um zu zertifizieren, dass ein Agent der A2A-Protokollspezifikation entspricht.
Schnellinstallation
Claude Code
Empfohlennpx skills add pjt222/agent-almanac -a claude-code/plugin add https://github.com/pjt222/agent-almanacgit clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/test-a2a-interopKopieren Sie diesen Befehl und fügen Sie ihn in Claude Code ein, um diese Fähigkeit zu installieren
Dokumentation
Test A2A Interop
Validate A2A agent conforms to spec: Agent Card discovery, task lifecycle, SSE streaming, err handling, multi-agent comm patterns.
Use When
- Verify new A2A server impl before deploy
- Validate interop between ≥2 A2A agents
- Conformance tests in CI/CD for A2A services
- Debug fails in multi-agent A2A workflows
- Certify agent meets A2A protocol for registry
In
- Required: Base URL of agent under test
- Required: Auth creds (if needed)
- Optional: 2nd agent URL for bidirectional interop
- Optional: Specific skills to test (default: all in Card)
- Optional: Test timeout per task (default 60s)
- Optional: Output format (
json,markdown,junit)
Do
Step 1: Fetch + Validate Agent Cards
1.1. Retrieve Card from well-known endpoint:
curl -s https://agent.example.com/.well-known/agent.json -o agent-card.json
1.2. Validate top-level required:
const requiredFields = ["name", "description", "url", "skills"];
for (const field of requiredFields) {
assert(agentCard[field] !== undefined, `Missing required field: ${field}`);
}
1.3. Validate each skill entry:
for (const skill of agentCard.skills) {
assert(skill.id, "Skill missing id");
assert(skill.name, "Skill missing name");
assert(skill.description, "Skill missing description");
assert(
Array.isArray(skill.inputModes) && skill.inputModes.length > 0,
`Skill ${skill.id} missing inputModes`
);
assert(
Array.isArray(skill.outputModes) && skill.outputModes.length > 0,
`Skill ${skill.id} missing outputModes`
);
}
1.4. Validate auth config:
authentication.schemesincludesoauth2→ verifycredentials.oauth2hastokenUrl- includes
apiKey→ verifycredentials.apiKeyhasheaderName
1.5. Validate capability flags = boolean.
1.6. Record validation results in conformance report:
interface ConformanceResult {
test: string;
category: "agent-card" | "lifecycle" | "streaming" | "error-handling" | "interop";
status: "pass" | "fail" | "skip";
message?: string;
duration_ms?: number;
}
Got: Card passes all structural validation.
If err: Record each fail w/ specific field + reason. Don't abort; continue. Invalid Card = test result.
Step 2: Test Tasks Covering Lifecycle States
2.1. Submission (submitted → working → completed)
Send task agent should handle per declared skills:
const submitResult = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 1,
method: "tasks/send",
params: {
id: `test-${uuid()}`,
sessionId: `session-${uuid()}`,
message: {
role: "user",
parts: [{ type: "text", text: skillExamples[0] }],
},
},
});
assert(submitResult.result, "tasks/send should return a result");
assert(submitResult.result.id, "Result should include task ID");
assert(
["submitted", "working", "completed"].includes(submitResult.result.status.state),
`Unexpected initial state: ${submitResult.result.status.state}`
);
2.2. Polling (tasks/get)
Poll until terminal:
let task = submitResult.result;
const startTime = Date.now();
while (!["completed", "failed", "canceled"].includes(task.status.state)) {
if (Date.now() - startTime > TEST_TIMEOUT_MS) {
fail(`Task ${task.id} did not complete within ${TEST_TIMEOUT_MS}ms`);
break;
}
await sleep(1000);
const getResult = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 2,
method: "tasks/get",
params: { id: task.id },
});
task = getResult.result;
}
assert(task.status.state === "completed", `Task should complete, got: ${task.status.state}`);
2.3. Cancellation
Submit + immediately cancel:
const cancelTask = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 3,
method: "tasks/send",
params: { id: `test-cancel-${uuid()}`, sessionId: `session-${uuid()}`, message: { ... } },
});
const cancelResult = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 4,
method: "tasks/cancel",
params: { id: cancelTask.result.id },
});
assert(
cancelResult.result.status.state === "canceled",
"Canceled task should be in canceled state"
);
2.4. Input-required (multi-turn)
Skill supports multi-turn → ambiguous req → triggers input-required, then follow-up:
// Send ambiguous request
const multiTurnTask = await sendJsonRpc(agentUrl, { ... });
// Poll until input-required or completed
// If input-required, send follow-up
if (task.status.state === "input-required") {
const followUp = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 6,
method: "tasks/send",
params: {
id: task.id,
sessionId: task.sessionId,
message: { role: "user", parts: [{ type: "text", text: "Column A and Column B" }] },
},
});
assert(
["working", "completed"].includes(followUp.result.status.state),
"Follow-up should resume task"
);
}
2.5. State transition history
Card declares stateTransitionHistory: true:
const getWithHistory = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 7,
method: "tasks/get",
params: { id: completedTaskId, historyLength: 100 },
});
assert(
Array.isArray(getWithHistory.result.history),
"Task should include history array"
);
assert(
getWithHistory.result.history.length >= 2,
"History should have at least 2 entries (submitted and completed)"
);
Got: All lifecycle transitions work. Tasks complete, cancel cleanly, multi-turn fns when supported.
If err: Record specific transition fail, expected vs actual. Include full JSON-RPC res in report.
Step 3: Validate SSE Streaming
3.1. Skip if streaming: false.
3.2. Send tasks/sendSubscribe + validate SSE stream:
const response = await fetch(`${agentUrl}/subscribe`, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
jsonrpc: "2.0",
id: 10,
method: "tasks/sendSubscribe",
params: {
id: `test-stream-${uuid()}`,
sessionId: `session-${uuid()}`,
message: { role: "user", parts: [{ type: "text", text: "Stream test task" }] },
},
}),
});
assert(
response.headers.get("content-type")?.includes("text/event-stream"),
"Response must be text/event-stream"
);
3.3. Parse SSE events + validate structure:
const events: SSEEvent[] = [];
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += decoder.decode(value, { stream: true });
// Parse SSE events from buffer
const lines = buffer.split("\n");
for (const line of lines) {
if (line.startsWith("event: ")) {
currentEvent.type = line.slice(7);
} else if (line.startsWith("data: ")) {
currentEvent.data = JSON.parse(line.slice(6));
events.push(currentEvent);
}
}
}
3.4. Validate event sequence:
- 1st =
statusevent w/submitted|working - Intermediate may include
statusupdates +artifactdeliveries - Final has
final: true+ terminal state - No events after final
3.5. Validate cleanup:
- Close connection mid-stream
- Verify task retrievable via
tasks/get - Verify no server errs from premature disconnect
Got: SSE delivers correctly formatted events in right sequence, ending w/ final terminal event.
If err: SSE advertised but endpoint returns non-SSE → conformance fail. Events out of order → record sequence. Stream never terminates → record timeout.
Step 4: Test Err Handling + Edge Cases
4.1. Unknown method
const unknownMethod = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 20,
method: "tasks/nonexistent",
params: {},
});
assert(unknownMethod.error?.code === -32601, "Should return method not found");
4.2. Malformed JSON-RPC
const malformed = await fetch(agentUrl, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: '{"not": "valid jsonrpc"}',
});
const response = await malformed.json();
assert(response.error?.code === -32600, "Should return invalid request");
4.3. Get nonexistent task
const notFound = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 22,
method: "tasks/get",
params: { id: "nonexistent-task-id" },
});
assert(notFound.error, "Should return error for nonexistent task");
4.4. Cancel completed task
const cancelCompleted = await sendJsonRpc(agentUrl, {
jsonrpc: "2.0",
id: 23,
method: "tasks/cancel",
params: { id: completedTaskId },
});
assert(cancelCompleted.error, "Should error when canceling completed task");
4.5. Auth enforcement
Auth configured → req w/o creds:
const unauthResponse = await fetch(agentUrl, {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ jsonrpc: "2.0", id: 24, method: "tasks/get", params: { id: "x" } }),
});
assert(unauthResponse.status === 401, "Should reject unauthenticated requests");
4.6. Card publicly accessible w/o auth
const publicCard = await fetch(`${agentUrl}/.well-known/agent.json`);
assert(publicCard.status === 200, "Agent Card should be publicly accessible");
Got: All err conditions return appropriate JSON-RPC err codes w/o crashing.
If err: Record each err handling test fail. Server crashes during err testing = critical, must fix before deploy.
Step 5: Generate Conformance Report
5.1. Aggregate test results → structured report:
interface ConformanceReport {
agentUrl: string;
agentName: string;
agentVersion: string;
testDate: string;
summary: {
total: number;
passed: number;
failed: number;
skipped: number;
};
categories: {
agentCard: ConformanceResult[];
lifecycle: ConformanceResult[];
streaming: ConformanceResult[];
errorHandling: ConformanceResult[];
interop: ConformanceResult[];
};
conformanceLevel: "full" | "partial" | "minimal" | "non-conformant";
}
5.2. Calculate conformance level:
- full: All pass, including streaming + push notifications
- partial: Core lifecycle pass, some optional fail
- minimal: Card valid + basic send/get works
- non-conformant: Card invalid | basic lifecycle broken
5.3. Generate report in requested format:
- json: Machine-readable for CI/CD
- markdown: Human-readable w/ pass/fail tables
- junit: XML for test framework integration
5.4. Include recommendations:
## Failed Tests
| Test | Category | Message | Recommendation |
|------|----------|---------|----------------|
| cancel-completed-task | error-handling | Server returned 500 | Add guard for terminal state transitions |
| sse-final-event | streaming | No final event received | Ensure SSE sends event with final:true |
5.5. Bidirectional testing requested (2 agents):
- A can discover B's Card
- A can send task to B
- B can send task to A
- Both handle concurrent tasks w/o interference
Got: Complete conformance report w/ pass/fail, level, actionable recommendations.
If err: Report gen fails → output raw test results to stdout fallback. Test data never lost due to reporting err.
Check
- Card fetched + structurally validated
- ≥1 task completes full lifecycle (submitted → working → completed)
- Cancellation works
- Err responses use correct JSON-RPC codes
- SSE tested if advertised
- Auth enforced on task endpoints, NOT on Card
- Conformance report generated in requested format
- Fails include actionable remediation
- Suite runnable in CI/CD w/o manual
Traps
- Cold server: Some agents take init time. Add health check | warmup before tests.
- Hardcoded test data: Use dynamic task + session IDs (UUIDs). Never assume specific task ID avail.
- Ignore timing: Transitions async. Always poll w/ backoff vs immediate state assertion.
- SSE parsing complexity: Events may span multiple chunks. Buffer incoming + parse complete events.
- Only happy path: Err handling tests as important as success. Malformed reqs, invalid transitions, auth fails all covered.
- Network dependency: Runnable vs localhost (dev) + remote (prod). Parameterize URL.
- Assume skill behavior: Suite validates protocol conformance, not skill correctness. Use example phrases from Card to trigger, don't assert specific output.
→
design-a2a-agent-card— design Card being testedimplement-a2a-server— implement server being testedbuild-ci-cd-pipeline— integrate into CI/CDtroubleshoot-mcp-connection— debugging patterns for A2A connectivityreview-software-architecture— arch review for multi-agent systems
GitHub Repository
Verwandte Skills
evaluating-llms-harness
TestenDiese Claude Skill führt den lm-evaluation-harness aus, um LLMs über 60+ standardisierte akademische Aufgaben wie MMLU und GSM8K zu benchmarken. Sie wurde für Entwickler entwickelt, um Modellqualität zu vergleichen, Trainingsfortschritt zu verfolgen oder akademische Ergebnisse zu berichten. Das Tool unterstützt verschiedene Backends, einschließlich HuggingFace- und vLLM-Modelle.
cloudflare-cron-triggers
TestenDiese Fähigkeit bietet umfassendes Wissen zur Implementierung von Cloudflare Cron Triggers, um Workers mithilfe von Cron-Ausdrücken zu planen. Sie behandelt das Einrichten periodischer Aufgaben, Wartungsjobs und automatisierter Workflows, während häufige Probleme wie ungültige Cron-Ausdrücke und Zeitzonenprobleme behandelt werden. Entwickler können sie zum Konfigurieren geplanter Handler, zum Testen von Cron-Triggers und zur Integration mit Workflows und Green Compute verwenden.
webapp-testing
TestenDiese Claude Skill bietet ein Playwright-basiertes Toolkit zum Testen lokaler Webanwendungen durch Python-Skripte. Es ermöglicht Frontend-Verifizierung, UI-Debugging, Screenshot-Aufnahme und Log-Einblick bei gleichzeitiger Verwaltung von Server-Lebenszyklen. Nutzen Sie es für Browser-Automatisierungsaufgaben, führen Sie Skripte jedoch direkt aus, anstatt deren Quellcode zu lesen, um Kontextverschmutzung zu vermeiden.
finishing-a-development-branch
TestenDiese Fähigkeit unterstützt Entwickler dabei, abgeschlossene Arbeiten zu finalisieren, indem sie testet, ob Tests bestehen, und dann strukturierte Integrationsoptionen präsentiert. Sie leitet den Workflow für das Zusammenführen von Code, das Erstellen von PRs oder das Bereinigen von Branches nach Abschluss der Implementierung. Nutzen Sie sie, wenn Ihr Code bereit und getestet ist, um den Entwicklungsprozess systematisch abzuschließen.
