test-a2a-interop

Name: test-a2a-interop
Author: pjt222

Updated 1 month ago

Category: Testing. Tags: ai, testing, automation, design

About

This skill tests A2A agent interoperability by validating protocol conformance, exercising task lifecycles, and verifying streaming and error handling. Use it for validating new A2A server implementations, debugging multi-agent workflows, or running conformance tests in CI/CD pipelines.

Quick Install

Claude Code

Primary (recommended)

npx skills add pjt222/agent-almanac -a claude-code

Plugin Command (alternative)

/plugin add https://github.com/pjt222/agent-almanac

Git Clone (alternative)

git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/test-a2a-interop

Copy and paste this command in Claude Code to install this skill

Documentation

Test A2A Interoperability

Validate that an A2A agent implementation conforms to the protocol specification. Test Agent Card discovery, task lifecycle management, SSE streaming, error handling, and multi-agent communication patterns.

When to Use

Verify a new A2A server implementation before deployment
Validate interoperability between two or more A2A agents
Run conformance tests as part of CI/CD for A2A services
Debug failures in multi-agent A2A workflows
Certify that an agent meets A2A protocol requirements for a registry

Inputs

Required: Base URL of A2A agent under test
Required: Authentication credentials (if agent requires them)
Optional: Second agent URL for bidirectional interop testing
Optional: Specific skills to test (default: all skills in Agent Card)
Optional: Test timeout per task (default: 60 seconds)
Optional: Output format for conformance report (json, markdown, junit)

Steps

Step 1: Fetch and Validate Agent Cards

1.1. Retrieve the Agent Card from the well-known endpoint:

curl -s https://agent.example.com/.well-known/agent.json -o agent-card.json

1.2. Validate required top-level fields:

const requiredFields = ["name", "description", "url", "skills"];
for (const field of requiredFields) {
  assert(agentCard[field] !== undefined, `Missing required field: ${field}`);
}

1.3. Validate each skill entry:

for (const skill of agentCard.skills) {
  assert(skill.id, "Skill missing id");
  assert(skill.name, "Skill missing name");
  assert(skill.description, "Skill missing description");
  assert(
    Array.isArray(skill.inputModes) && skill.inputModes.length > 0,
    `Skill ${skill.id} missing inputModes`
  );
  assert(
    Array.isArray(skill.outputModes) && skill.outputModes.length > 0,
    `Skill ${skill.id} missing outputModes`
  );
}

1.4. Validate authentication configuration:

If authentication.schemes includes oauth2, verify credentials.oauth2 has tokenUrl
If authentication.schemes includes apiKey, verify credentials.apiKey has headerName

1.5. Validate capability flags are boolean values.
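Steps 1.4 and 1.5 can be sketched as one function that returns failures instead of throwing. This is a minimal sketch, not the skill's own code: the `AgentCardLike` shape, the placement of `credentials` under `authentication`, and the `capabilities` object are assumptions about how the card under test is laid out.

```typescript
// Hypothetical card shape: only the fields this check reads.
interface AgentCardLike {
  authentication?: {
    schemes?: string[];
    // Assumed location of per-scheme settings; adjust to the card under test.
    credentials?: Record<string, Record<string, unknown> | undefined>;
  };
  capabilities?: Record<string, unknown>;
}

// Scheme name -> field that must be present for that scheme (from step 1.4).
const REQUIRED_SCHEME_FIELD: Record<string, string> = {
  oauth2: "tokenUrl",
  apiKey: "headerName",
};

function validateAuthAndCapabilities(card: AgentCardLike): string[] {
  const failures: string[] = [];
  const auth = card.authentication;
  const schemes = auth && auth.schemes ? auth.schemes : [];
  for (const scheme of schemes) {
    const field = REQUIRED_SCHEME_FIELD[scheme];
    if (field === undefined) continue; // no rule defined for this scheme
    const config = auth && auth.credentials ? auth.credentials[scheme] : undefined;
    const value = config ? config[field] : undefined;
    if (typeof value !== "string" || value === "") {
      failures.push(`authentication.credentials.${scheme}.${field} is missing`);
    }
  }
  // Step 1.5: every capability flag must be a real boolean.
  const capabilities = card.capabilities ? card.capabilities : {};
  for (const flag of Object.keys(capabilities)) {
    const value = capabilities[flag];
    if (typeof value !== "boolean") {
      failures.push(`capabilities.${flag} must be boolean, got ${typeof value}`);
    }
  }
  return failures;
}
```

Each returned string can be turned into one failed entry in the conformance report, which keeps the run going after a bad card.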

1.6. Record validation results in the conformance report:

interface ConformanceResult {
  test: string;
  category: "agent-card" | "lifecycle" | "streaming" | "error-handling" | "interop";
  status: "pass" | "fail" | "skip";
  message?: string;
  duration_ms?: number;
}

Got: Agent Card passes all structural validation checks.

If fail: Record each validation failure with the specific field and reason. Never abort; continue testing other aspects. An invalid Agent Card is itself a test result.
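One way to honor the "never abort" rule is to wrap every check so that a thrown assertion becomes a recorded failure. The helper below is a sketch; `recordCheck` is a hypothetical name, and the `ConformanceResult` interface is repeated only so the block stands alone.

```typescript
type Category = "agent-card" | "lifecycle" | "streaming" | "error-handling" | "interop";

interface ConformanceResult {
  test: string;
  category: Category;
  status: "pass" | "fail" | "skip";
  message?: string;
  duration_ms?: number;
}

// Runs one check and always appends a result, so a failing check never aborts the run.
function recordCheck(
  results: ConformanceResult[],
  test: string,
  category: Category,
  check: () => void
): ConformanceResult {
  const started = Date.now();
  let result: ConformanceResult;
  try {
    check();
    result = { test, category, status: "pass" };
  } catch (err) {
    // Keep the assertion message as the failure reason.
    const message = err instanceof Error ? err.message : String(err);
    result = { test, category, status: "fail", message };
  }
  result.duration_ms = Date.now() - started;
  results.push(result);
  return result;
}
```

Checks that call the network would need an async variant that awaits `check()` inside the same `try` block.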

Step 2: Send Test Tasks Covering All Lifecycle States

2.1. Test: Task submission (submitted -> working -> completed)

Send a task that the agent should be able to handle based on its declared skills:

const submitResult = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 1,
  method: "tasks/send",
  params: {
    id: `test-${uuid()}`,
    sessionId: `session-${uuid()}`,
    message: {
      role: "user",
      parts: [{ type: "text", text: skillExamples[0] }],
    },
  },
});

assert(submitResult.result, "tasks/send should return a result");
assert(submitResult.result.id, "Result should include task ID");
assert(
  ["submitted", "working", "completed"].includes(submitResult.result.status.state),
  `Unexpected initial state: ${submitResult.result.status.state}`
);
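The assertions above look only at `result`. A conformance run may also want to check the JSON-RPC 2.0 envelope itself: the version string, the echoed `id`, and the rule that a response carries either `result` or `error` but not both. The sketch below assumes `sendJsonRpc` returns the parsed response body; `checkJsonRpcEnvelope` is a hypothetical helper name.

```typescript
// Returns a list of envelope problems; an empty list means the envelope is well formed.
function checkJsonRpcEnvelope(
  response: unknown,
  requestId: number | string | null
): string[] {
  if (typeof response !== "object" || response === null || Array.isArray(response)) {
    return ["response is not a JSON object"];
  }
  const problems: string[] = [];
  const body = response as Record<string, unknown>;
  if (body.jsonrpc !== "2.0") {
    problems.push(`jsonrpc must be "2.0", got ${JSON.stringify(body.jsonrpc)}`);
  }
  if (body.id !== requestId) {
    problems.push(`id must echo the request id ${JSON.stringify(requestId)}`);
  }
  // JSON-RPC 2.0: a response has a result or an error, never both and never neither.
  const hasResult = "result" in body;
  const hasError = "error" in body;
  if (hasResult === hasError) {
    problems.push("exactly one of result or error must be present");
  }
  return problems;
}
```

Calling `checkJsonRpcEnvelope(submitResult, 1)` after the request in 2.1 would flag a server that drops or rewrites the request id.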

2.2. Test: Task polling (tasks/get)

Poll until the task reaches a terminal state:

let task = submitResult.result;
const startTime = Date.now();
while (!["completed", "failed", "canceled"].includes(task.status.state)) {
  if (Date.now() - startTime > TEST_TIMEOUT_MS) {
    fail(`Task ${task.id} did not complete within ${TEST_TIMEOUT_MS}ms`);
    break;
  }
  await sleep(1000);
  const getResult = await sendJsonRpc(agentUrl, {
    jsonrpc: "2.0",
    id: 2,
    method: "tasks/get",
    params: { id: task.id },
  });
  task = getResult.result;
}

assert(task.status.state === "completed", `Task should complete, got: ${task.status.state}`);
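The loop above sleeps a fixed 1000 ms between polls. A backoff variant can precompute its sleep durations so the total wait never exceeds the test timeout. This is a sketch under assumed defaults (500 ms initial delay, doubling, 4000 ms cap); `backoffSchedule` is a hypothetical helper, not part of the skill.

```typescript
// Builds the sleep durations for a polling loop. Each delay grows by `factor`,
// is capped at `maxMs`, and the sum of all delays equals `timeoutMs`.
function backoffSchedule(
  timeoutMs: number,
  initialMs: number = 500,
  factor: number = 2,
  maxMs: number = 4000
): number[] {
  if (initialMs <= 0 || factor < 1 || maxMs <= 0) {
    throw new RangeError("initialMs and maxMs must be positive and factor must be >= 1");
  }
  const delays: number[] = [];
  let remaining = timeoutMs;
  let delay = initialMs;
  while (remaining > 0) {
    // Never sleep past the cap or past the overall deadline.
    const next = Math.min(delay, maxMs, remaining);
    delays.push(next);
    remaining -= next;
    delay *= factor;
  }
  return delays;
}
```

The polling loop can then iterate over `backoffSchedule(TEST_TIMEOUT_MS)`, sleeping for each delay before the next `tasks/get`, and report a timeout if the schedule runs out before a terminal state is seen.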

2.3. Test: Task cancellation

Submit a task and immediately cancel it:

const cancelTask = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 3,
  method: "tasks/send",
  params: { id: `test-cancel-${uuid()}`, sessionId: `session-${uuid()}`, message: { ... } },
});

const cancelResult = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 4,
  method: "tasks/cancel",
  params: { id: cancelTask.result.id },
});

assert(
  cancelResult.result.status.state === "canceled",
  "Canceled task should be in canceled state"
);

2.4. Test: Input-required state (multi-turn)

If any skill supports multi-turn interaction, send an ambiguous request that should trigger input-required, then provide the follow-up:

// Send ambiguous request
const multiTurnTask = await sendJsonRpc(agentUrl, { ... });

// Poll until input-required or completed
// If input-required, send follow-up
if (task.status.state === "input-required") {
  const followUp = await sendJsonRpc(agentUrl, {
    jsonrpc: "2.0",
    id: 6,
    method: "tasks/send",
    params: {
      id: task.id,
      sessionId: task.sessionId,
      message: { role: "user", parts: [{ type: "text", text: "Column A and Column B" }] },
    },
  });
  assert(
    ["working", "completed"].includes(followUp.result.status.state),
    "Follow-up should resume task"
  );
}

2.5. Test: State transition history

If the Agent Card declares stateTransitionHistory: true:

const getWithHistory = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 7,
  method: "tasks/get",
  params: { id: completedTaskId, historyLength: 100 },
});

assert(
  Array.isArray(getWithHistory.result.history),
  "Task should include history array"
);
assert(
  getWithHistory.result.history.length >= 2,
  "History should have at least 2 entries (submitted and completed)"
);

Got: All lifecycle state transitions work correctly. Tasks complete successfully, cancel cleanly, and multi-turn interaction functions when supported.

If fail: Record the specific state transition that failed, the expected state, and the actual state. Include the full JSON-RPC response in the report for debugging.
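To report the failed transition with its expected and actual states, the states observed while polling can be checked against a transition table. The table below is an assumption inferred from the states exercised in this step, not a copy of the protocol specification; `findInvalidTransitions` is a hypothetical helper.

```typescript
type TaskState =
  | "submitted" | "working" | "input-required"
  | "completed" | "failed" | "canceled";

// Assumed transition table: terminal states allow no further transitions.
const ALLOWED_TRANSITIONS: Record<TaskState, TaskState[]> = {
  submitted: ["working", "input-required", "completed", "failed", "canceled"],
  working: ["input-required", "completed", "failed", "canceled"],
  "input-required": ["working", "completed", "failed", "canceled"],
  completed: [],
  failed: [],
  canceled: [],
};

interface TransitionFailure {
  index: number;          // position of the offending state in the observed list
  from: TaskState;
  actual: TaskState;
  expected: TaskState[];  // states that would have been accepted
}

function findInvalidTransitions(states: TaskState[]): TransitionFailure[] {
  const failures: TransitionFailure[] = [];
  for (let i = 1; i < states.length; i++) {
    const from = states[i - 1];
    const actual = states[i];
    if (from === actual) continue; // same state seen on consecutive polls
    const expected = ALLOWED_TRANSITIONS[from];
    if (expected.indexOf(actual) === -1) {
      failures.push({ index: i, from, actual, expected });
    }
  }
  return failures;
}
```

Feeding it the `state` values collected during polling gives one report entry per invalid transition.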

Step 3: Validate SSE Streaming Responses

3.1. Skip this step if the Agent Card declares streaming: false.

3.2. Send a tasks/sendSubscribe request and validate the SSE stream:

const response = await fetch(`${agentUrl}/subscribe`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 10,
    method: "tasks/sendSubscribe",
    params: {
      id: `test-stream-${uuid()}`,
      sessionId: `session-${uuid()}`,
      message: { role: "user", parts: [{ type: "text", text: "Stream test task" }] },
    },
  }),
});

assert(
  response.headers.get("content-type")?.includes("text/event-stream"),
  "Response must be text/event-stream"
);

3.3. Parse SSE events and validate structure:

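interface SSEEvent {
  type?: string;
  data?: unknown;
}

const events: SSEEvent[] = [];
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";
let currentEvent: SSEEvent = {};

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Split off complete lines; keep the trailing partial line in the buffer
  const lines = buffer.split(/\r?\n/);
  buffer = lines.pop() ?? "";
  for (const line of lines) {
    if (line.startsWith("event: ")) {
      currentEvent.type = line.slice(7);
    } else if (line.startsWith("data: ")) {
      currentEvent.data = JSON.parse(line.slice(6));
    } else if (line === "" && currentEvent.data !== undefined) {
      // A blank line terminates the event
      events.push(currentEvent);
      currentEvent = {};
    }
  }
}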

3.4. Validate the event sequence:

First event should be a status event with state submitted or working
Intermediate events may include status updates and artifact deliveries
Final event should have final: true with a terminal state
No events should arrive after the final event
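The four rules above can be sketched as a single pass over the parsed events. The `StreamEvent` shape is a hypothetical reduction of each event to its status state and `final` flag; map the real payload fields onto it before calling the function.

```typescript
// Hypothetical minimal event shape: the status state (absent for artifact
// events) and the `final` flag from the stream.
interface StreamEvent {
  state?: string;
  final?: boolean;
}

const TERMINAL_STATES = ["completed", "failed", "canceled"];

function validateEventSequence(events: StreamEvent[]): string[] {
  if (events.length === 0) return ["stream delivered no events"];
  const failures: string[] = [];

  // Rule 1: the first event is a status event in submitted or working.
  const first = events[0];
  if (first.state !== "submitted" && first.state !== "working") {
    failures.push(`first event should be submitted or working, got ${String(first.state)}`);
  }

  // Rule 3: some event carries final: true with a terminal state.
  let finalIndex = -1;
  for (let i = 0; i < events.length; i++) {
    if (events[i].final === true) { finalIndex = i; break; }
  }
  if (finalIndex === -1) {
    failures.push("no event with final: true");
    return failures;
  }
  const finalState = events[finalIndex].state;
  if (finalState === undefined || TERMINAL_STATES.indexOf(finalState) === -1) {
    failures.push(`final event should carry a terminal state, got ${String(finalState)}`);
  }

  // Rule 4: nothing arrives after the final event.
  const trailing = events.length - 1 - finalIndex;
  if (trailing > 0) {
    failures.push(`${trailing} event(s) arrived after the final event`);
  }
  return failures;
}
```

Rule 2 needs no check of its own, since intermediate events are allowed to be either status updates or artifacts.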

3.5. Validate that SSE connection cleanup works:

Close the connection mid-stream
Verify the task can still be retrieved via tasks/get
Verify no server errors from the premature disconnect

Got: SSE stream delivers correctly formatted events in the right sequence, ending with a final terminal event.

If fail: If SSE is advertised but the endpoint returns a non-SSE response, record a conformance failure. If events arrive out of order, record the observed sequence. If the stream never terminates, record a timeout.

Step 4: Test Error Handling and Edge Cases

4.1. Test: Unknown method

const unknownMethod = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 20,
  method: "tasks/nonexistent",
  params: {},
});
assert(unknownMethod.error?.code === -32601, "Should return method not found");

4.2. Test: Malformed JSON-RPC request

const malformed = await fetch(agentUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: '{"not": "valid jsonrpc"}',
});
// Named malformedBody so it does not clash with the streaming `response` above
const malformedBody = await malformed.json();
assert(malformedBody.error?.code === -32600, "Should return invalid request");

4.3. Test: Get nonexistent task

const notFound = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 22,
  method: "tasks/get",
  params: { id: "nonexistent-task-id" },
});
assert(notFound.error, "Should return error for nonexistent task");

4.4. Test: Cancel already completed task

const cancelCompleted = await sendJsonRpc(agentUrl, {
  jsonrpc: "2.0",
  id: 23,
  method: "tasks/cancel",
  params: { id: completedTaskId },
});
assert(cancelCompleted.error, "Should error when canceling completed task");

4.5. Test: Authentication enforcement

If authentication is configured, send a request without credentials:

const unauthResponse = await fetch(agentUrl, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ jsonrpc: "2.0", id: 24, method: "tasks/get", params: { id: "x" } }),
});
assert(unauthResponse.status === 401, "Should reject unauthenticated requests");

4.6. Test: Agent Card is publicly accessible without auth

const publicCard = await fetch(`${agentUrl}/.well-known/agent.json`);
assert(publicCard.status === 200, "Agent Card should be publicly accessible");

Got: All error conditions return appropriate JSON-RPC error codes without crashing server.

If fail: Record each error handling test that fails. Server crashes during error testing are critical failures and must be fixed before deployment.
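The error tests above compare against raw numbers. A small table of the standard JSON-RPC 2.0 error codes, plus a helper that checks the shape of the `error` object, keeps those assertions readable. The codes are the ones defined by JSON-RPC 2.0; `checkErrorResponse` is a hypothetical helper name.

```typescript
// Standard error codes defined by JSON-RPC 2.0.
const JSON_RPC_ERRORS = {
  PARSE_ERROR: -32700,
  INVALID_REQUEST: -32600,
  METHOD_NOT_FOUND: -32601,
  INVALID_PARAMS: -32602,
  INTERNAL_ERROR: -32603,
};

// Returns null when the response is a well-formed error (with the expected
// code, if one is given); otherwise returns a description of the problem.
function checkErrorResponse(response: unknown, expectedCode?: number): string | null {
  if (typeof response !== "object" || response === null) {
    return "response is not a JSON object";
  }
  const error = (response as Record<string, unknown>).error;
  if (typeof error !== "object" || error === null) {
    return "response has no error object";
  }
  const { code, message } = error as Record<string, unknown>;
  if (typeof code !== "number" || !isFinite(code) || Math.floor(code) !== code) {
    return "error.code must be an integer";
  }
  if (typeof message !== "string") {
    return "error.message must be a string";
  }
  if (expectedCode !== undefined && code !== expectedCode) {
    return `expected error code ${expectedCode}, got ${code}`;
  }
  return null;
}
```

For example, test 4.1 could record `checkErrorResponse(unknownMethod, JSON_RPC_ERRORS.METHOD_NOT_FOUND)` as the failure message whenever it is not `null`.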

Step 5: Generate Interoperability Conformance Report

5.1. Aggregate all test results into a structured report:

interface ConformanceReport {
  agentUrl: string;
  agentName: string;
  agentVersion: string;
  testDate: string;
  summary: {
    total: number;
    passed: number;
    failed: number;
    skipped: number;
  };
  categories: {
    agentCard: ConformanceResult[];
    lifecycle: ConformanceResult[];
    streaming: ConformanceResult[];
    errorHandling: ConformanceResult[];
    interop: ConformanceResult[];
  };
  conformanceLevel: "full" | "partial" | "minimal" | "non-conformant";
}

5.2. Calculate the conformance level:

full: All tests pass, including streaming and push notifications
partial: Core lifecycle tests pass, some optional features fail
minimal: Agent Card valid and basic task send/get works
non-conformant: Agent Card invalid or basic lifecycle broken
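The four levels can be computed from the flat list of results. The sketch below makes two assumptions that the skill does not state: the basic send/get tests are named `task-send` and `task-get`, and a skipped test prevents `full` but does not count as a failure.

```typescript
type ConformanceLevel = "full" | "partial" | "minimal" | "non-conformant";

interface ResultLike {
  test: string;
  category: string;
  status: "pass" | "fail" | "skip";
}

function computeConformanceLevel(
  results: ResultLike[],
  basicTests: string[] = ["task-send", "task-get"] // hypothetical test names
): ConformanceLevel {
  const passed = (name: string): boolean =>
    results.some((r) => r.test === name && r.status === "pass");
  const failedIn = (category: string): boolean =>
    results.some((r) => r.category === category && r.status === "fail");

  // non-conformant: Agent Card invalid or basic lifecycle broken.
  if (failedIn("agent-card") || !basicTests.every(passed)) return "non-conformant";
  // minimal: card and basic send/get are fine, but another lifecycle test failed.
  if (failedIn("lifecycle")) return "minimal";
  // full: every recorded test passed, optional features included.
  if (results.every((r) => r.status === "pass")) return "full";
  // partial: core lifecycle passes, some optional feature failed or was skipped.
  return "partial";
}
```

Passing a different `basicTests` list adapts the rule to whatever test names the suite actually uses.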

5.3. Generate the report in the requested format:

json: Machine-readable for CI/CD integration
markdown: Human-readable with pass/fail tables
junit: XML format for test framework integration
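For the junit format, a results list can be rendered into the commonly used `testsuite`/`testcase` layout. JUnit XML has no single formal schema, so treat the attribute set below as an assumption and adjust it to what the consuming CI system expects; `toJUnitXml` is a hypothetical helper.

```typescript
interface JUnitInput {
  test: string;
  category: string;
  status: "pass" | "fail" | "skip";
  message?: string;
  duration_ms?: number;
}

// Escapes the characters that are unsafe inside XML attribute values.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toJUnitXml(suiteName: string, results: JUnitInput[]): string {
  const failures = results.filter((r) => r.status === "fail").length;
  const skipped = results.filter((r) => r.status === "skip").length;
  const lines: string[] = [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<testsuite name="${escapeXml(suiteName)}" tests="${results.length}" failures="${failures}" skipped="${skipped}">`,
  ];
  for (const r of results) {
    const seconds = ((r.duration_ms ?? 0) / 1000).toFixed(3); // JUnit times are in seconds
    const open = `  <testcase classname="${escapeXml(r.category)}" name="${escapeXml(r.test)}" time="${seconds}"`;
    if (r.status === "pass") {
      lines.push(`${open}/>`);
    } else if (r.status === "fail") {
      lines.push(`${open}>`, `    <failure message="${escapeXml(r.message ?? "")}"/>`, `  </testcase>`);
    } else {
      lines.push(`${open}>`, `    <skipped/>`, `  </testcase>`);
    }
  }
  lines.push(`</testsuite>`);
  return lines.join("\n");
}
```

Using the category as `classname` groups results by conformance area in most CI test viewers.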

5.4. Include recommendations for fixing failures:

## Failed Tests

| Test | Category | Message | Recommendation |
|------|----------|---------|----------------|
| cancel-completed-task | error-handling | Server returned 500 | Add guard for terminal state transitions |
| sse-final-event | streaming | No final event received | Ensure SSE sends event with final:true |

5.5. If bidirectional testing was requested (two agents), validate:

Agent A can discover Agent B's Agent Card
Agent A can send a task to Agent B
Agent B can send a task to Agent A
Both agents handle concurrent tasks without interference

Got: Complete conformance report with pass/fail results, conformance level, and actionable recommendations.

If fail: If report generation itself fails, output the raw test results to stdout as a fallback. Test data should never be lost due to a reporting error.

Checks

Agent Card fetched and structurally validated
At least one task completes the full lifecycle (submitted -> working -> completed)
Task cancellation works correctly
Error responses use correct JSON-RPC error codes
SSE streaming tested if advertised in capabilities
Authentication enforced on task endpoints but not on Agent Card
Conformance report generated in requested format
Failed tests include actionable remediation guidance
Test suite can run in CI/CD without manual intervention

Pitfalls

Testing against a cold server: Some agents take time to initialize. Add a health check or warmup request before running tests.
Hardcoded test data: Use dynamic task and session IDs (UUIDs) to avoid collisions when tests run repeatedly. Never assume a specific task ID is available.
Ignoring timing: Task transitions are asynchronous. Always poll with backoff rather than asserting immediate state changes.
SSE parsing complexity: SSE events may span multiple chunks. Buffer incoming data and parse complete events, not raw chunks.
Testing only the happy path: Error handling tests are as important as success tests. Malformed requests, invalid transitions, and auth failures must all be covered.
Network dependency: Tests should be runnable against localhost for development and remote URLs for production. Parameterize the agent URL.
Assuming skill behavior: The test suite validates protocol conformance, not skill correctness. Use example phrases from the Agent Card to trigger skills, and never assert specific output content.

GitHub Repository

pjt222/agent-almanac

Path: i18n/caveman/skills/test-a2a-interop

Topics: agents, agentskills, ai-assisted-development, claude-code, skills, teams

FAQ

Frequently asked questions

What is the test-a2a-interop skill?

test-a2a-interop is a Claude Skill by pjt222. Skills package instructions and resources that Claude loads on demand, so Claude can perform test-a2a-interop-related tasks without extra prompting.

How do I install test-a2a-interop?

Use the install commands on this page: add test-a2a-interop to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does test-a2a-interop belong to?

test-a2a-interop is in the Testing category, tagged ai, testing, automation and design.

Is test-a2a-interop free to use?

Yes. test-a2a-interop is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
