test-a2a-interop
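About
This skill tests A2A agent interoperability by validating protocol conformance, exercising the task lifecycle, and checking streaming and error handling. Use it to validate a new A2A server implementation, debug multi-agent workflows, or run conformance tests in a CI/CD pipeline.
Quick Install
Claude Code
Recommended: npx skills add pjt222/agent-almanac -a claude-code
/plugin add https://github.com/pjt222/agent-almanac
git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/test-a2a-interop
Copy and paste one of these commands to install the skill in Claude Code.
Documentation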
Test A2A Interoperability
Validate that an A2A agent implementation conforms to the protocol specification. Test Agent Card discovery, task lifecycle management, SSE streaming, error handling, and multi-agent communication patterns.
When to Use
- Verify new A2A server implementation before deployment
- Validate interoperability between two or more A2A agents
- Run conformance tests as part of CI/CD for A2A services
- Debug failures in multi-agent A2A workflows
- Certify that an agent meets the A2A protocol requirements for a registry
Inputs
- Required: Base URL of A2A agent under test
- Required: Authentication credentials (if agent requires them)
- Optional: Second agent URL for bidirectional interop testing
- Optional: Specific skills to test (default: all skills in Agent Card)
- Optional: Test timeout per task (default: 60 seconds)
- Optional: Output format for conformance report (json, markdown, junit)
Steps
Step 1: Fetch and Validate Agent Cards
1.1. Retrieve the Agent Card from the well-known endpoint:
curl -s https://agent.example.com/.well-known/agent.json -o agent-card.json
1.2. Validate required top-level fields:
const requiredFields = ["name", "description", "url", "skills"];
for (const field of requiredFields) {
  assert(agentCard[field] !== undefined, `Missing required field: ${field}`);
}
1.3. Validate each skill entry:
for (const skill of agentCard.skills) {
  assert(skill.id, "Skill missing id");
  assert(skill.name, "Skill missing name");
  assert(skill.description, "Skill missing description");
  assert(
    Array.isArray(skill.inputModes) && skill.inputModes.length > 0,
    `Skill ${skill.id} missing inputModes`
  );
  assert(
    Array.isArray(skill.outputModes) && skill.outputModes.length > 0,
    `Skill ${skill.id} missing outputModes`
  );
}
1.4. Validate authentication configuration:
- If authentication.schemes includes oauth2, verify credentials.oauth2 has tokenUrl
- If authentication.schemes includes apiKey, verify credentials.apiKey has headerName
1.5. Validate capability flags are boolean values.
1.6. Record validation results in the conformance report:
interface ConformanceResult {
  test: string;
  category: "agent-card" | "lifecycle" | "streaming" | "error-handling" | "interop";
  status: "pass" | "fail" | "skip";
  message?: string;
  duration_ms?: number;
}
Expected: Agent Card passes all structural validation checks.
On failure: Record each validation failure with the specific field and reason. Do not abort; continue testing other aspects. An invalid Agent Card is itself a test result.
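The Step 1 checks can be gathered into a single validator that records every failure instead of stopping at the first one. The following is a minimal sketch, not the skill's own implementation. The function name `validateAgentCard` and the test names are hypothetical, and it assumes that `credentials` is nested under `authentication` and that the capability flags live under a `capabilities` object.

```typescript
// Result shape from step 1.6, reduced to the fields used here.
interface ConformanceResult {
  test: string;
  category: "agent-card";
  status: "pass" | "fail";
  message?: string;
}

type Json = Record<string, any>;

// Runs every Step 1 check and returns one result per check instead of throwing.
function validateAgentCard(card: Json): ConformanceResult[] {
  const results: ConformanceResult[] = [];
  const check = (test: string, ok: boolean, message: string): void => {
    results.push(
      ok
        ? { test, category: "agent-card", status: "pass" }
        : { test, category: "agent-card", status: "fail", message },
    );
  };

  // 1.2 Required top-level fields
  for (const field of ["name", "description", "url", "skills"]) {
    check(`field-${field}`, card[field] !== undefined, `Missing required field: ${field}`);
  }

  // 1.3 Skill entries
  const skills: Json[] = Array.isArray(card.skills) ? card.skills : [];
  skills.forEach((skill, i) => {
    const label = skill.id ?? `#${i}`;
    for (const key of ["id", "name", "description"]) {
      check(`skill-${label}-${key}`, Boolean(skill[key]), `Skill ${label} missing ${key}`);
    }
    for (const key of ["inputModes", "outputModes"]) {
      const ok = Array.isArray(skill[key]) && skill[key].length > 0;
      check(`skill-${label}-${key}`, ok, `Skill ${label} missing ${key}`);
    }
  });

  // 1.4 Authentication (ASSUMPTION: credentials is nested under authentication)
  const auth: Json = card.authentication ?? {};
  const schemes: string[] = Array.isArray(auth.schemes) ? auth.schemes : [];
  if (schemes.includes("oauth2")) {
    check("auth-oauth2-tokenUrl", Boolean(auth.credentials?.oauth2?.tokenUrl),
      "oauth2 scheme declared without credentials.oauth2.tokenUrl");
  }
  if (schemes.includes("apiKey")) {
    check("auth-apiKey-headerName", Boolean(auth.credentials?.apiKey?.headerName),
      "apiKey scheme declared without credentials.apiKey.headerName");
  }

  // 1.5 Capability flags must be booleans (ASSUMPTION: flags live under capabilities)
  const capabilities: Json = card.capabilities ?? {};
  for (const flag of Object.keys(capabilities)) {
    check(`capability-${flag}`, typeof capabilities[flag] === "boolean",
      `Capability ${flag} is not a boolean`);
  }
  return results;
}
```

Returning a list rather than asserting keeps the run going after a bad card, so later lifecycle and streaming tests still produce results.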
Step 2: Send Test Tasks Covering All Lifecycle States
2.1. Test: Task submission (submitted -> working -> completed)
Send a task that the agent should be able to handle based on its declared skills:
const submitResult = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 1,
  method: "tasks/send",
  params: {
    id: `test-${uuid()}`,
    sessionId: `session-${uuid()}`,
    message: {
      role: "user",
      parts: [{ type: "text", text: skillExamples[0] }],
    },
  },
});
assert(submitResult.result, "tasks/send should return a result");
assert(submitResult.result.id, "Result should include task ID");
assert(
  ["submitted", "working", "completed"].includes(submitResult.result.status.state),
  `Unexpected initial state: ${submitResult.result.status.state}`
);
2.2. Test: Task polling (tasks/get)
Poll until the task reaches a terminal state:
let task = submitResult.result;
const startTime = Date.now();
while (!["completed", "failed", "canceled"].includes(task.status.state)) {
  if (Date.now() - startTime > TEST_TIMEOUT_MS) {
    fail(`Task ${task.id} did not complete within ${TEST_TIMEOUT_MS}ms`);
    break;
  }
  await sleep(1000);
  const getResult = await sendJsonRpc(agentUrl, {
    jsonrpc: "2.0",
    id: 2,
    method: "tasks/get",
    params: { id: task.id },
  });
  task = getResult.result;
}
assert(task.status.state === "completed", `Task should complete, got: ${task.status.state}`);
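The Pitfalls section recommends polling with backoff, while the loop above sleeps for a fixed second. One way to do that is sketched below, assuming the caller supplies a `getTask` function (for example a thin wrapper around `tasks/get`). The names `pollUntilTerminal` and `getTask`, the option names, and the default delays are all illustrative and do not come from the A2A specification.

```typescript
interface Task {
  id: string;
  status: { state: string };
}

const TERMINAL_STATES = ["completed", "failed", "canceled"];

interface PollOptions {
  timeoutMs?: number;       // overall budget; 60 s matches the default test timeout
  initialDelayMs?: number;  // first wait between polls (assumed default)
  maxDelayMs?: number;      // cap for the doubled delay (assumed default)
  sleep?: (ms: number) => Promise<void>; // injectable so tests need not wait
}

// Polls getTask with exponential backoff until a terminal state or the timeout.
async function pollUntilTerminal(
  taskId: string,
  getTask: (id: string) => Promise<Task>,
  opts: PollOptions = {},
): Promise<Task> {
  const timeoutMs = opts.timeoutMs ?? 60_000;
  const maxDelayMs = opts.maxDelayMs ?? 5_000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  let delay = opts.initialDelayMs ?? 250;
  const start = Date.now();

  while (true) {
    const task = await getTask(taskId);
    if (TERMINAL_STATES.includes(task.status.state)) return task;
    // Give up if the next wait would exceed the overall budget.
    if (Date.now() - start + delay > timeoutMs) {
      throw new Error(`Task ${taskId} did not reach a terminal state within ${timeoutMs}ms`);
    }
    await sleep(delay);
    delay = Math.min(delay * 2, maxDelayMs);
  }
}
```

Injecting `sleep` lets the suite run this helper in unit tests without real delays.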
2.3. Test: Task cancellation
Submit a task and immediately cancel it:
const cancelTask = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 3,
  method: "tasks/send",
  params: { id: `test-cancel-${uuid()}`, sessionId: `session-${uuid()}`, message: { ... } },
});
const cancelResult = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 4,
  method: "tasks/cancel",
  params: { id: cancelTask.result.id },
});
assert(
  cancelResult.result.status.state === "canceled",
  "Canceled task should be in canceled state"
);
2.4. Test: Input-required state (multi-turn)
If any skill supports multi-turn interaction, send an ambiguous request that should trigger input-required, then provide the follow-up:
// Send ambiguous request
const multiTurnTask = await sendJsonRpc(agentUrl, { ... });
// Poll until input-required or completed
// If input-required, send follow-up
if (task.status.state === "input-required") {
  const followUp = await sendJsonRpc(agentUrl, {
    jsonrpc: "2.0",
    id: 6,
    method: "tasks/send",
    params: {
      id: task.id,
      sessionId: task.sessionId,
      message: { role: "user", parts: [{ type: "text", text: "Column A and Column B" }] },
    },
  });
  assert(
    ["working", "completed"].includes(followUp.result.status.state),
    "Follow-up should resume task"
  );
}
2.5. Test: State transition history
If the Agent Card declares stateTransitionHistory: true:
const getWithHistory = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 7,
  method: "tasks/get",
  params: { id: completedTaskId, historyLength: 100 },
});
assert(
  Array.isArray(getWithHistory.result.history),
  "Task should include history array"
);
assert(
  getWithHistory.result.history.length >= 2,
  "History should have at least 2 entries (submitted and completed)"
);
Expected: All lifecycle state transitions work correctly. Tasks complete successfully, cancel cleanly, and multi-turn interaction functions when supported.
On failure: Record the specific state transition that failed, the expected state, and the actual state. Include the full JSON-RPC response in the report for debugging.
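To keep one failed assertion from ending the run, each test body can be wrapped so that a thrown error becomes a `fail` entry with its duration. The sketch below is one possible shape. `runTest` and `transitionFailure` are hypothetical helper names, and the message layout is only a suggestion for carrying the expected state, the actual state, and the raw response.

```typescript
interface ConformanceResult {
  test: string;
  category: "agent-card" | "lifecycle" | "streaming" | "error-handling" | "interop";
  status: "pass" | "fail" | "skip";
  message?: string;
  duration_ms?: number;
}

// Runs one test body; a thrown error becomes a "fail" result so the suite continues.
async function runTest(
  test: string,
  category: ConformanceResult["category"],
  body: () => Promise<void> | void,
): Promise<ConformanceResult> {
  const start = Date.now();
  try {
    await body();
    return { test, category, status: "pass", duration_ms: Date.now() - start };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { test, category, status: "fail", message, duration_ms: Date.now() - start };
  }
}

// Builds an error carrying the expected states, the actual state, and the raw response.
function transitionFailure(expected: string[], actual: string, response: unknown): Error {
  return new Error(
    `Expected state ${expected.join(" | ")}, got ${actual}; response: ${JSON.stringify(response)}`,
  );
}
```

A lifecycle test would then look like `await runTest("task-cancel", "lifecycle", async () => { ... })`, throwing `transitionFailure(...)` when the observed state is wrong.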
Step 3: Validate SSE Streaming Responses
3.1. Skip this step if the Agent Card declares streaming: false.
3.2. Send a tasks/sendSubscribe request and validate the SSE stream:
const response = await fetch(`${agentUrl}/subscribe`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 10,
    method: "tasks/sendSubscribe",
    params: {
      id: `test-stream-${uuid()}`,
      sessionId: `session-${uuid()}`,
      message: { role: "user", parts: [{ type: "text", text: "Stream test task" }] },
    },
  }),
});
assert(
  response.headers.get("content-type")?.includes("text/event-stream"),
  "Response must be text/event-stream"
);
3.3. Parse SSE events and validate structure:
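interface SSEEvent { type?: string; data?: unknown }
const events: SSEEvent[] = [];
const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = "";
let currentEvent: SSEEvent = {};
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  // Parse only complete lines; keep the trailing partial line for the next chunk
  const lines = buffer.split("\n");
  buffer = lines.pop() ?? "";
  for (const rawLine of lines) {
    const line = rawLine.replace(/\r$/, "");
    if (line.startsWith("event: ")) {
      currentEvent.type = line.slice(7);
    } else if (line.startsWith("data: ")) {
      currentEvent.data = JSON.parse(line.slice(6));
    } else if (line === "" && currentEvent.data !== undefined) {
      // A blank line terminates the event
      events.push(currentEvent);
      currentEvent = {};
    }
  }
}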
3.4. Validate the event sequence:
- First event should be a status event with state submitted or working
- Intermediate events may include status updates and artifact deliveries
- Final event should have final: true with a terminal state
- No events should arrive after the final event
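The sequence rules in 3.4 can be checked over the collected events in one pass. The sketch below assumes each parsed event has a `type` and a `data` object with an optional `status.state` and an optional `final` flag. That shape, and the name `validateEventSequence`, are assumptions made for illustration and should be adjusted to what the agent actually sends.

```typescript
// ASSUMED event shape after SSE parsing.
interface StreamEvent {
  type: string; // e.g. "status" or "artifact"
  data: { status?: { state: string }; final?: boolean };
}

const TERMINAL = ["completed", "failed", "canceled"];

// Returns human-readable violations; an empty list means the sequence is valid.
function validateEventSequence(events: StreamEvent[]): string[] {
  const problems: string[] = [];
  if (events.length === 0) return ["Stream delivered no events"];

  // Rule 1: first event is a status event in submitted or working
  const first = events[0];
  const firstState = first.data.status?.state ?? "";
  if (first.type !== "status" || !["submitted", "working"].includes(firstState)) {
    problems.push(`First event should be status submitted/working, got ${first.type}:${firstState}`);
  }

  // Rules 3 and 4: a final event exists, is terminal, and nothing follows it
  const finalIndex = events.findIndex((e) => e.data.final === true);
  if (finalIndex === -1) {
    problems.push("No event with final: true was received");
  } else {
    const finalState = events[finalIndex].data.status?.state ?? "";
    if (!TERMINAL.includes(finalState)) {
      problems.push(`Final event has non-terminal state: ${finalState}`);
    }
    if (finalIndex !== events.length - 1) {
      problems.push(`${events.length - 1 - finalIndex} event(s) arrived after the final event`);
    }
  }
  return problems;
}
```

Returning all violations at once makes it easier to record the observed sequence in the report when several rules are broken.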
3.5. Validate that SSE connection cleanup works:
- Close the connection mid-stream
- Verify the task can still be retrieved via tasks/get
- Verify no server errors from the premature disconnect
Expected: SSE stream delivers correctly formatted events in the right sequence, ending with a final terminal event.
On failure: If SSE is advertised but the endpoint returns a non-SSE response, record a conformance failure. If events arrive out of order, record the observed sequence. If the stream never terminates, record a timeout.
Step 4: Test Error Handling and Edge Cases
4.1. Test: Unknown method
const unknownMethod = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 20,
  method: "tasks/nonexistent",
  params: {},
});
assert(unknownMethod.error?.code === -32601, "Should return method not found");
4.2. Test: Malformed JSON-RPC request
const malformed = await fetch(agentUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: '{"not": "valid jsonrpc"}',
});
const response = await malformed.json();
assert(response.error?.code === -32600, "Should return invalid request");
4.3. Test: Get nonexistent task
const notFound = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 22,
  method: "tasks/get",
  params: { id: "nonexistent-task-id" },
});
assert(notFound.error, "Should return error for nonexistent task");
4.4. Test: Cancel already completed task
const cancelCompleted = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 23,
  method: "tasks/cancel",
  params: { id: completedTaskId },
});
assert(cancelCompleted.error, "Should error when canceling completed task");
4.5. Test: Authentication enforcement
If authentication is configured, send a request without credentials:
const unauthResponse = await fetch(agentUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ jsonrpc: "2.0", id: 24, method: "tasks/get", params: { id: "x" } }),
});
assert(unauthResponse.status === 401, "Should reject unauthenticated requests");
4.6. Test: Agent Card is publicly accessible without auth
const publicCard = await fetch(`${agentUrl}/.well-known/agent.json`);
assert(publicCard.status === 200, "Agent Card should be publicly accessible");
Expected: All error conditions return appropriate JSON-RPC error codes without crashing the server.
On failure: Record each error handling test that fails. Server crashes during error testing are critical failures that must be fixed before deployment.
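The error tests in this step share one pattern: the response must carry an `error` object, have no `result`, and use the expected code where one is specified. The sketch below collects the standard JSON-RPC 2.0 codes with a small checker. The helper name `checkJsonRpcError` is hypothetical, and any A2A-specific error codes are deliberately left out because the text above does not list them.

```typescript
// Standard error codes defined by JSON-RPC 2.0.
const JSON_RPC_ERRORS = {
  parseError: -32700,
  invalidRequest: -32600,
  methodNotFound: -32601,
  invalidParams: -32602,
  internalError: -32603,
} as const;

interface JsonRpcResponse {
  jsonrpc?: string;
  id?: number | string | null;
  result?: unknown;
  error?: { code: number; message?: string; data?: unknown };
}

// Returns null when the response is a well-formed error with the expected code,
// otherwise a message describing the mismatch. Omit expectedCode to accept any code.
function checkJsonRpcError(response: JsonRpcResponse, expectedCode?: number): string | null {
  if (response.result !== undefined) return "Expected an error but got a result";
  if (!response.error) return "Response has neither result nor error";
  if (typeof response.error.code !== "number") return "error.code is not a number";
  if (expectedCode !== undefined && response.error.code !== expectedCode) {
    return `Expected error code ${expectedCode}, got ${response.error.code}`;
  }
  return null;
}
```

For example, test 4.1 could call `checkJsonRpcError(unknownMethod, JSON_RPC_ERRORS.methodNotFound)` and record the returned message when it is not null.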
Step 5: Generate Interoperability Conformance Report
5.1. Aggregate all test results into a structured report:
interface ConformanceReport {
  agentUrl: string;
  agentName: string;
  agentVersion: string;
  testDate: string;
  summary: {
    total: number;
    passed: number;
    failed: number;
    skipped: number;
  };
  categories: {
    agentCard: ConformanceResult[];
    lifecycle: ConformanceResult[];
    streaming: ConformanceResult[];
    errorHandling: ConformanceResult[];
    interop: ConformanceResult[];
  };
  conformanceLevel: "full" | "partial" | "minimal" | "non-conformant";
}
5.2. Calculate the conformance level:
- full: All tests pass, including streaming and push notifications
- partial: Core lifecycle tests pass, some optional features fail
- minimal: Agent Card valid and basic task send/get works
- non-conformant: Agent Card invalid or basic lifecycle broken
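The four levels can be derived mechanically from the recorded results. The sketch below is one reading of the list above, not a definitive rule set. The test names in `BASIC_TESTS` are hypothetical stand-ins for "basic task send/get", and it assumes that a skipped test prevents the `full` level, since `full` is described as all tests passing.

```typescript
interface ConformanceResult {
  test: string;
  category: "agent-card" | "lifecycle" | "streaming" | "error-handling" | "interop";
  status: "pass" | "fail" | "skip";
}

type Level = "full" | "partial" | "minimal" | "non-conformant";

// HYPOTHETICAL names of the tests that count as "basic task send/get".
const BASIC_TESTS = ["task-send", "task-get"];

function conformanceLevel(
  results: ConformanceResult[],
  basicTests: string[] = BASIC_TESTS,
): Level {
  const failedIn = (category: ConformanceResult["category"]): boolean =>
    results.some((r) => r.category === category && r.status === "fail");
  const passed = (name: string): boolean =>
    results.some((r) => r.test === name && r.status === "pass");

  // non-conformant: Agent Card invalid or basic lifecycle broken
  if (failedIn("agent-card") || !basicTests.every(passed)) return "non-conformant";
  // full: every recorded test passed (ASSUMPTION: a skip prevents "full")
  if (results.every((r) => r.status === "pass")) return "full";
  // partial: all lifecycle tests pass; only other features fail or were skipped
  if (!failedIn("lifecycle")) return "partial";
  // minimal: card valid and basic send/get work, but other lifecycle tests fail
  return "minimal";
}
```

Whether a skipped optional feature should still allow `full` is a policy decision; the comparison on `status === "pass"` is the single line to change.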
5.3. Generate the report in the requested format:
- json: Machine-readable for CI/CD integration
- markdown: Human-readable with pass/fail tables
- junit: XML format for test framework integration
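For the junit format, the results map naturally onto the common `testsuite` / `testcase` layout that most CI systems accept. The sketch below assumes that mapping: category becomes `classname`, a failure becomes a `<failure>` child, and a skip becomes `<skipped/>`. The function names are illustrative, and individual CI tools may expect additional attributes.

```typescript
interface ConformanceResult {
  test: string;
  category: string;
  status: "pass" | "fail" | "skip";
  message?: string;
  duration_ms?: number;
}

// Escapes the characters that are unsafe inside XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Renders results as a single JUnit-style <testsuite> document.
function toJUnitXml(suiteName: string, results: ConformanceResult[]): string {
  const failures = results.filter((r) => r.status === "fail").length;
  const skipped = results.filter((r) => r.status === "skip").length;
  const cases = results.map((r) => {
    const seconds = ((r.duration_ms ?? 0) / 1000).toFixed(3); // JUnit times are in seconds
    const open =
      `  <testcase classname="${escapeXml(r.category)}" name="${escapeXml(r.test)}" time="${seconds}"`;
    if (r.status === "fail") {
      return `${open}>\n    <failure message="${escapeXml(r.message ?? "")}"/>\n  </testcase>`;
    }
    if (r.status === "skip") return `${open}>\n    <skipped/>\n  </testcase>`;
    return `${open}/>`;
  });
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suiteName)}" tests="${results.length}" failures="${failures}" skipped="${skipped}">`,
    ...cases,
    `</testsuite>`,
  ].join("\n");
}
```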
5.4. Include recommendations for fixing failures:
## Failed Tests
| Test | Category | Message | Recommendation |
|------|----------|---------|----------------|
| cancel-completed-task | error-handling | Server returned 500 | Add guard for terminal state transitions |
| sse-final-event | streaming | No final event received | Ensure SSE sends event with final:true |
5.5. If bidirectional testing was requested (two agents), validate:
- Agent A can discover Agent B's Agent Card
- Agent A can send a task to Agent B
- Agent B can send a task to Agent A
- Both agents handle concurrent tasks without interference
Expected: Complete conformance report with pass/fail results, conformance level, and actionable recommendations.
On failure: If report generation itself fails, output the raw test results to stdout as a fallback. Test data should never be lost because of a reporting error.
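The stdout fallback can be as small as a try/catch around the renderer. In the sketch below, `emitReport`, `render`, and `write` are hypothetical names; `write` defaults to `console.log` so the raw results land on stdout when rendering throws.

```typescript
// Renders the report; if rendering throws, writes the raw results so nothing is lost.
function emitReport<T>(
  results: T[],
  render: (results: T[]) => string,
  write: (text: string) => void = (text) => console.log(text),
): "rendered" | "fallback" {
  try {
    write(render(results));
    return "rendered";
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    // Fallback: dump the untouched results together with the reason for the failure.
    write(JSON.stringify({ reportError: reason, rawResults: results }, null, 2));
    return "fallback";
  }
}
```

The return value lets a CI wrapper decide whether to mark the run as degraded even though the data was preserved.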
Checks
- Agent Card fetched and structurally validated
- At least one task completes full lifecycle (submitted -> working -> completed)
- Task cancellation works correctly
- Error responses use correct JSON-RPC error codes
- SSE streaming tested if advertised in capabilities
- Authentication enforced on task endpoints but not on Agent Card
- Conformance report generated in requested format
- Failed tests include actionable remediation guidance
- Test suite can run in CI/CD without manual intervention
Pitfalls
- Testing against a cold server: Some agents take time to initialize. Add a health check or warmup request before running tests.
- Hardcoded test data: Use dynamic task and session IDs (UUIDs) to avoid collisions when running tests repeatedly. Never assume a specific task ID is available.
- Ignoring timing: Task transitions are asynchronous. Always poll with backoff rather than asserting immediate state changes.
- SSE parsing complexity: SSE events may span multiple chunks. Buffer incoming data and parse complete events, not raw chunks.
- Testing only the happy path: Error handling tests are as important as success tests. Malformed requests, invalid transitions, and auth failures must all be covered.
- Network dependency: Tests should be runnable against localhost for development and remote URLs for production. Parameterize the agent URL.
- Assuming skill behavior: The test suite validates protocol conformance, not skill correctness. Use example phrases from the Agent Card to trigger skills, and never assert specific output content.
See Also
- design-a2a-agent-card - design the Agent Card being tested
- implement-a2a-server - implement the server being tested
- build-ci-cd-pipeline - integrate conformance tests into CI/CD
- troubleshoot-mcp-connection - debugging patterns applicable to A2A connectivity
- review-software-architecture - architecture review for multi-agent systems
