vertex-engine-inspector

by jeremylongshore

About

This skill inspects and validates Vertex AI Agent Engine deployments, checking components like the Code Execution Sandbox and Memory Bank for A2A protocol compliance and security. It generates production readiness scores to assess deployment health. Developers should use it when prompted to inspect, validate, or check the configuration of an Agent Engine deployment.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Git Clone (Alternative)
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/vertex-engine-inspector

Copy and paste one of the commands above into Claude Code to install this skill.

Documentation

What This Skill Does

Expert inspector for the Vertex AI Agent Engine managed runtime. Performs comprehensive validation of deployed agents including runtime configuration, security posture, performance settings, A2A protocol compliance, and production readiness scoring.

When This Skill Activates

Trigger Phrases

  • "Inspect Vertex AI Engine agent"
  • "Validate Agent Engine deployment"
  • "Check Code Execution Sandbox configuration"
  • "Verify Memory Bank settings"
  • "Monitor agent health"
  • "Agent Engine production readiness"
  • "A2A protocol compliance check"
  • "Agent Engine security audit"

Use Cases

  • Pre-production deployment validation
  • Post-deployment health monitoring
  • Security compliance audits
  • Performance optimization reviews
  • Troubleshooting agent issues
  • Configuration drift detection

Inspection Categories

1. Runtime Configuration βœ…

  • Model selection (Gemini 2.5 Pro/Flash)
  • Tools enabled (Code Execution, Memory Bank, custom)
  • VPC configuration
  • Resource allocation
  • Scaling policies

2. Code Execution Sandbox πŸ”’

  • Security: Isolated environment, no external network access
  • State Persistence: TTL validation (1-14 days)
  • IAM: Least privilege permissions
  • Performance: Timeout and resource limits
  • Concurrent Executions: Max concurrent code runs

Critical Checks:

βœ… State TTL between 7-14 days (optimal for production)
βœ… Sandbox type is SECURE_ISOLATED
βœ… IAM permissions limited to required GCP services only
βœ… Timeout configured appropriately
⚠️ State TTL < 7 days may cause premature session loss
❌ State TTL > 14 days not allowed by Agent Engine
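The TTL rules above can be sketched as a single check. This is a minimal illustration; the function name and return shape are assumptions for this document, not part of any Agent Engine SDK:

```python
def check_state_ttl(ttl_days: int) -> tuple[str, str]:
    """Classify a Code Execution Sandbox state TTL against the 1-14 day range."""
    if ttl_days > 14:
        return ("FAIL", "State TTL > 14 days not allowed by Agent Engine")
    if ttl_days < 1:
        return ("FAIL", "State TTL must be at least 1 day")
    if ttl_days < 7:
        # Valid, but short TTLs risk dropping long-running sessions
        return ("WARN", "State TTL < 7 days may cause premature session loss")
    return ("PASS", "State TTL in the 7-14 day production range")
```

For example, a 14-day TTL passes, a 3-day TTL triggers the session-loss warning, and a 20-day TTL is rejected outright.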

3. Memory Bank Configuration 🧠

  • Enabled Status: Persistent memory active
  • Retention Policy: Max memories, retention days
  • Storage Backend: Firestore encryption & region
  • Query Performance: Indexing, caching, latency
  • Auto-Cleanup: Quota management

Critical Checks:

βœ… Max memories >= 100 (prevents conversation truncation)
βœ… Indexing enabled (fast query performance)
βœ… Auto-cleanup enabled (prevents quota exhaustion)
βœ… Encrypted at rest (Firestore default)
⚠️ Low memory limit may truncate long conversations
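The Memory Bank checks above amount to a few boolean and threshold tests. A hedged sketch follows; the config keys (max_memories, indexing_enabled, auto_cleanup) are illustrative stand-ins, not the real Memory Bank schema:

```python
def check_memory_bank(config: dict) -> list[str]:
    """Return a list of findings for a Memory Bank configuration dict."""
    issues = []
    if config.get("max_memories", 0) < 100:
        # Below 100 memories, long conversations may be truncated
        issues.append("WARN: low memory limit may truncate long conversations")
    if not config.get("indexing_enabled", False):
        issues.append("FAIL: indexing disabled (slow query performance)")
    if not config.get("auto_cleanup", False):
        issues.append("FAIL: auto-cleanup disabled (risk of quota exhaustion)")
    return issues  # empty list means all checks passed
```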

4. A2A Protocol Compliance πŸ”—

  • AgentCard: Available at /.well-known/agent-card
  • Task API: POST /v1/tasks:send responds correctly
  • Status API: GET /v1/tasks/{task_id} accessible
  • Protocol Version: 1.0 compliance
  • Required Fields: name, description, tools, version

Compliance Report:

βœ… AgentCard accessible and valid
βœ… Task submission API functional
βœ… Status polling API functional
βœ… Protocol version 1.0
❌ Missing AgentCard fields: [...]
❌ Task API not responding (check IAM/networking)
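The required-fields portion of the AgentCard check can be expressed as a short validator. A minimal sketch, assuming the AgentCard has been fetched and parsed into a dict (the field list comes from the requirements above; the function itself is illustrative):

```python
REQUIRED_FIELDS = ("name", "description", "tools", "version")

def validate_agent_card(card: dict) -> list[str]:
    """Return the required AgentCard fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not card.get(field)]
```

An empty return value corresponds to the "AgentCard accessible and valid" line; a non-empty one populates the "Missing AgentCard fields: [...]" finding.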

5. Security Posture πŸ›‘οΈ

  • IAM Roles: Least privilege validation
  • VPC Service Controls: Perimeter protection
  • Model Armor: Prompt injection protection
  • Encryption: At-rest and in-transit
  • Service Account: Proper configuration
  • Secret Management: No hardcoded credentials

Security Score:

🟒 SECURE (90-100%): Production ready
🟑 NEEDS ATTENTION (70-89%): Address issues before prod
πŸ”΄ INSECURE (<70%): Do not deploy to production

6. Performance Metrics πŸ“Š

  • Auto-Scaling: Min/max instances configured
  • Resource Limits: CPU, memory appropriate
  • Latency: P50, P95, P99 within SLOs
  • Throughput: Requests per second
  • Token Usage: Cost tracking
  • Error Rate: < 5% target

Health Status:

🟒 HEALTHY: Error rate < 5%, latency < 3s (p95)
🟑 DEGRADED: Error rate 5-10% or latency 3-5s
πŸ”΄ UNHEALTHY: Error rate > 10% or latency > 5s
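The health bands above map directly to threshold comparisons. A minimal sketch (inputs are a 24h error rate in percent and a p95 latency in milliseconds; the function is illustrative, not a monitoring API):

```python
def health_status(error_rate_pct: float, p95_latency_ms: float) -> str:
    """Map 24h error rate (%) and p95 latency (ms) to a health band."""
    if error_rate_pct > 10 or p95_latency_ms > 5000:
        return "UNHEALTHY"
    if error_rate_pct >= 5 or p95_latency_ms >= 3000:
        return "DEGRADED"
    return "HEALTHY"
```

A 2.3% error rate with 1,850 ms p95 latency, as in the example report later in this page, classifies as HEALTHY.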

7. Monitoring & Observability πŸ“ˆ

  • Cloud Monitoring: Dashboards configured
  • Alerting: Policies for errors, latency, costs
  • Logging: Structured logs aggregated
  • Tracing: OpenTelemetry enabled
  • Error Tracking: Cloud Error Reporting

Observability Score:

βœ… All 5 pillars configured: Metrics, Logs, Traces, Alerts, Dashboards
⚠️ Missing alerts for critical scenarios
❌ No monitoring configured (production blocker)

Production Readiness Scoring

Scoring Matrix

Category      Weight   Checks
Security      30%      6 checks (IAM, VPC-SC, encryption, etc.)
Performance   25%      6 checks (scaling, limits, SLOs, etc.)
Monitoring    20%      6 checks (dashboards, alerts, logs, etc.)
Compliance    15%      5 checks (audit logs, DR, privacy, etc.)
Reliability   10%      5 checks (multi-region, failover, etc.)

Overall Readiness Status

🟒 PRODUCTION READY (85-100%)
   - All critical checks passed
   - Minor optimizations recommended
   - Safe to deploy

🟑 NEEDS IMPROVEMENT (70-84%)
   - Some important checks failed
   - Address issues before production
   - Staging deployment acceptable

πŸ”΄ NOT READY (<70%)
   - Critical failures present
   - Do not deploy to production
   - Fix blocking issues first
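The weighting matrix and the status bands combine into one weighted score. A minimal sketch, assuming each category has already been scored 0-100 (the dict keys and function shape are assumptions for illustration):

```python
# Weights from the scoring matrix above (sum to 1.0)
WEIGHTS = {"security": 0.30, "performance": 0.25, "monitoring": 0.20,
           "compliance": 0.15, "reliability": 0.10}

def overall_readiness(scores: dict) -> tuple[float, str]:
    """Weight per-category scores (0-100) into an overall readiness status."""
    total = sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)
    if total >= 85:
        status = "PRODUCTION READY"
    elif total >= 70:
        status = "NEEDS IMPROVEMENT"
    else:
        status = "NOT READY"
    return round(total, 1), status
```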

Inspection Workflow

Phase 1: Configuration Analysis

1. Connect to Agent Engine
2. Retrieve agent metadata
3. Parse runtime configuration
4. Extract Code Execution settings
5. Extract Memory Bank settings
6. Document VPC configuration

Phase 2: Protocol Validation

1. Test AgentCard endpoint
2. Validate AgentCard structure
3. Test Task API (POST /v1/tasks:send)
4. Test Status API (GET /v1/tasks/{id})
5. Verify A2A protocol version

Phase 3: Security Audit

1. Review IAM roles and permissions
2. Check VPC Service Controls
3. Validate encryption settings
4. Scan for hardcoded secrets
5. Verify Model Armor enabled
6. Assess service account security

Phase 4: Performance Analysis

1. Query Cloud Monitoring metrics
2. Calculate error rate (last 24h)
3. Analyze latency percentiles
4. Review token usage and costs
5. Check auto-scaling behavior
6. Validate resource limits

Phase 5: Production Readiness

1. Run all checklist items (28 checks)
2. Calculate category scores
3. Calculate overall score
4. Determine readiness status
5. Generate recommendations
6. Create action plan

Tool Permissions

Read-only inspection - Cannot modify configurations:

  • Read: Analyze agent configuration files
  • Grep: Search for security issues
  • Glob: Find related configuration
  • Bash: Query GCP APIs (read-only)

Example Inspection Report

Agent ID: gcp-deployer-agent
Deployment Status: RUNNING
Inspection Date: 2025-12-09

Runtime Configuration:
  Model: gemini-2.5-flash
  Code Execution: βœ… Enabled (TTL: 14 days)
  Memory Bank: βœ… Enabled (retention: 90 days)
  VPC: βœ… Configured (private-vpc-prod)

A2A Protocol Compliance:
  AgentCard: βœ… Valid
  Task API: βœ… Functional
  Status API: βœ… Functional
  Protocol Version: 1.0

Security Posture:
  IAM: βœ… Least privilege (score: 95%)
  VPC-SC: βœ… Enabled
  Model Armor: βœ… Enabled
  Encryption: βœ… At-rest & in-transit
  Overall: 🟒 SECURE (92%)

Performance Metrics (24h):
  Request Count: 12,450
  Error Rate: 2.3% 🟒
  Latency (p95): 1,850ms 🟒
  Token Usage: 450K tokens
  Cost Estimate: $12.50/day

Production Readiness:
  Security: 93% (28/30 points)
  Performance: 88% (22/25 points)
  Monitoring: 95% (19/20 points)
  Compliance: 80% (12/15 points)
  Reliability: 70% (7/10 points)

  Overall Score: 88% 🟒 PRODUCTION READY

Recommendations:
  1. Enable multi-region deployment (reliability +10%)
  2. Configure automated backups (compliance +5%)
  3. Add circuit breaker pattern (reliability +5%)
  4. Optimize memory bank indexing (performance +3%)

Integration with Other Plugins

Works with jeremy-adk-orchestrator

  • Orchestrator deploys agents
  • Inspector validates deployments
  • Feedback loop for optimization

Works with jeremy-vertex-validator

  • Validator checks code before deployment
  • Inspector validates runtime after deployment
  • Complementary pre/post checks

Works with jeremy-adk-terraform

  • Terraform provisions infrastructure
  • Inspector validates provisioned agents
  • Ensures IaC matches runtime

Troubleshooting Guide

Issue: Agent not responding

Inspector checks:

  • VPC configuration allows traffic
  • IAM permissions correct
  • Agent Engine status is RUNNING
  • No quota limits exceeded

Issue: High error rate

Inspector checks:

  • Model configuration appropriate
  • Resource limits not exceeded
  • Code Execution sandbox not timing out
  • Memory Bank not quota-exhausted

Issue: Slow response times

Inspector checks:

  • Auto-scaling configured
  • Code Execution TTL appropriate
  • Memory Bank indexing enabled
  • Caching strategy implemented

Version History

  • 1.0.0 (2025): Initial release with Agent Engine GA support, Code Execution Sandbox, Memory Bank, A2A protocol validation

References

GitHub Repository

jeremylongshore/claude-code-plugins-plus
Path: plugins/ai-ml/jeremy-vertex-engine/skills/vertex-engine-inspector
Tags: ai, automation, claude-code, devops, marketplace, mcp

Related Skills

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

llamaguard

Other

LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.