vertex-engine-inspector

by jeremylongshore

About

This skill inspects and validates Vertex AI Agent Engine deployments, checking components like the Code Execution Sandbox and Memory Bank for A2A protocol compliance and security. It generates production readiness scores to assess deployment health. Developers should use it when prompted to inspect, validate, or check the configuration of an Agent Engine deployment.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus

Git Clone (Alternative)
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/vertex-engine-inspector

Copy and paste one of the commands above into Claude Code to install this skill.

Documentation

What This Skill Does

Expert inspector for the Vertex AI Agent Engine managed runtime. Performs comprehensive validation of deployed agents including runtime configuration, security posture, performance settings, A2A protocol compliance, and production readiness scoring.

When This Skill Activates

Trigger Phrases

  • "Inspect Vertex AI Engine agent"
  • "Validate Agent Engine deployment"
  • "Check Code Execution Sandbox configuration"
  • "Verify Memory Bank settings"
  • "Monitor agent health"
  • "Agent Engine production readiness"
  • "A2A protocol compliance check"
  • "Agent Engine security audit"

Use Cases

  • Pre-production deployment validation
  • Post-deployment health monitoring
  • Security compliance audits
  • Performance optimization reviews
  • Troubleshooting agent issues
  • Configuration drift detection

Inspection Categories

1. Runtime Configuration βœ…

  • Model selection (Gemini 2.5 Pro/Flash)
  • Tools enabled (Code Execution, Memory Bank, custom)
  • VPC configuration
  • Resource allocation
  • Scaling policies

2. Code Execution Sandbox πŸ”’

  • Security: Isolated environment, no external network access
  • State Persistence: TTL validation (1-14 days)
  • IAM: Least privilege permissions
  • Performance: Timeout and resource limits
  • Concurrent Executions: Max concurrent code runs

Critical Checks:

βœ… State TTL between 7-14 days (optimal for production)
βœ… Sandbox type is SECURE_ISOLATED
βœ… IAM permissions limited to required GCP services only
βœ… Timeout configured appropriately
⚠️ State TTL < 7 days may cause premature session loss
❌ State TTL > 14 days not allowed by Agent Engine
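The TTL rules above can be sketched as a single check. This is a minimal illustration; the function name and return shape are assumptions for this document, not part of any Agent Engine SDK:

```python
def check_state_ttl(ttl_days: int) -> tuple[str, str]:
    """Classify a Code Execution Sandbox state TTL against the 1-14 day range."""
    if ttl_days > 14:
        return ("FAIL", "State TTL > 14 days not allowed by Agent Engine")
    if ttl_days < 1:
        return ("FAIL", "State TTL must be at least 1 day")
    if ttl_days < 7:
        # Valid, but short TTLs risk dropping long-running sessions
        return ("WARN", "State TTL < 7 days may cause premature session loss")
    return ("PASS", "State TTL in the 7-14 day production range")
```

For example, a 14-day TTL passes, a 3-day TTL triggers the session-loss warning, and a 20-day TTL is rejected outright.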

3. Memory Bank Configuration 🧠

  • Enabled Status: Persistent memory active
  • Retention Policy: Max memories, retention days
  • Storage Backend: Firestore encryption & region
  • Query Performance: Indexing, caching, latency
  • Auto-Cleanup: Quota management

Critical Checks:

βœ… Max memories >= 100 (prevents conversation truncation)
βœ… Indexing enabled (fast query performance)
βœ… Auto-cleanup enabled (prevents quota exhaustion)
βœ… Encrypted at rest (Firestore default)
⚠️ Low memory limit may truncate long conversations
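The Memory Bank checks above amount to a few boolean and threshold tests. A hedged sketch follows; the config keys (max_memories, indexing_enabled, auto_cleanup) are illustrative stand-ins, not the real Memory Bank schema:

```python
def check_memory_bank(config: dict) -> list[str]:
    """Return a list of findings for a Memory Bank configuration dict."""
    issues = []
    if config.get("max_memories", 0) < 100:
        # Below 100 memories, long conversations may be truncated
        issues.append("WARN: low memory limit may truncate long conversations")
    if not config.get("indexing_enabled", False):
        issues.append("FAIL: indexing disabled (slow query performance)")
    if not config.get("auto_cleanup", False):
        issues.append("FAIL: auto-cleanup disabled (risk of quota exhaustion)")
    return issues  # empty list means all checks passed
```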

4. A2A Protocol Compliance πŸ”—

  • AgentCard: Available at /.well-known/agent-card
  • Task API: POST /v1/tasks:send responds correctly
  • Status API: GET /v1/tasks/{task_id} accessible
  • Protocol Version: 1.0 compliance
  • Required Fields: name, description, tools, version

Compliance Report:

βœ… AgentCard accessible and valid
βœ… Task submission API functional
βœ… Status polling API functional
βœ… Protocol version 1.0
❌ Missing AgentCard fields: [...]
❌ Task API not responding (check IAM/networking)
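The required-fields portion of the AgentCard check can be expressed as a short validator. A minimal sketch, assuming the AgentCard has been fetched and parsed into a dict (the field list comes from the requirements above; the function itself is illustrative):

```python
REQUIRED_FIELDS = ("name", "description", "tools", "version")

def validate_agent_card(card: dict) -> list[str]:
    """Return the required AgentCard fields that are missing or empty."""
    return [field for field in REQUIRED_FIELDS if not card.get(field)]
```

An empty return value corresponds to the "AgentCard accessible and valid" line; a non-empty one populates the "Missing AgentCard fields: [...]" finding.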

5. Security Posture πŸ›‘οΈ

  • IAM Roles: Least privilege validation
  • VPC Service Controls: Perimeter protection
  • Model Armor: Prompt injection protection
  • Encryption: At-rest and in-transit
  • Service Account: Proper configuration
  • Secret Management: No hardcoded credentials

Security Score:

🟒 SECURE (90-100%): Production ready
🟑 NEEDS ATTENTION (70-89%): Address issues before prod
πŸ”΄ INSECURE (<70%): Do not deploy to production

6. Performance Metrics πŸ“Š

  • Auto-Scaling: Min/max instances configured
  • Resource Limits: CPU, memory appropriate
  • Latency: P50, P95, P99 within SLOs
  • Throughput: Requests per second
  • Token Usage: Cost tracking
  • Error Rate: < 5% target

Health Status:

🟒 HEALTHY: Error rate < 5%, latency < 3s (p95)
🟑 DEGRADED: Error rate 5-10% or latency 3-5s
πŸ”΄ UNHEALTHY: Error rate > 10% or latency > 5s
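The health bands above map directly to threshold comparisons. A minimal sketch (inputs are a 24h error rate in percent and a p95 latency in milliseconds; the function is illustrative, not a monitoring API):

```python
def health_status(error_rate_pct: float, p95_latency_ms: float) -> str:
    """Map 24h error rate (%) and p95 latency (ms) to a health band."""
    if error_rate_pct > 10 or p95_latency_ms > 5000:
        return "UNHEALTHY"
    if error_rate_pct >= 5 or p95_latency_ms >= 3000:
        return "DEGRADED"
    return "HEALTHY"
```

A 2.3% error rate with 1,850 ms p95 latency, as in the example report later in this page, classifies as HEALTHY.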

7. Monitoring & Observability πŸ“ˆ

  • Cloud Monitoring: Dashboards configured
  • Alerting: Policies for errors, latency, costs
  • Logging: Structured logs aggregated
  • Tracing: OpenTelemetry enabled
  • Error Tracking: Cloud Error Reporting

Observability Score:

βœ… All 5 pillars configured: Metrics, Logs, Traces, Alerts, Dashboards
⚠️ Missing alerts for critical scenarios
❌ No monitoring configured (production blocker)

Production Readiness Scoring

Scoring Matrix

Category      Weight   Checks
Security      30%      6 checks (IAM, VPC-SC, encryption, etc.)
Performance   25%      6 checks (scaling, limits, SLOs, etc.)
Monitoring    20%      6 checks (dashboards, alerts, logs, etc.)
Compliance    15%      5 checks (audit logs, DR, privacy, etc.)
Reliability   10%      5 checks (multi-region, failover, etc.)

Overall Readiness Status

🟒 PRODUCTION READY (85-100%)
   - All critical checks passed
   - Minor optimizations recommended
   - Safe to deploy

🟑 NEEDS IMPROVEMENT (70-84%)
   - Some important checks failed
   - Address issues before production
   - Staging deployment acceptable

πŸ”΄ NOT READY (<70%)
   - Critical failures present
   - Do not deploy to production
   - Fix blocking issues first
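The weighting matrix and the status bands combine into one weighted score. A minimal sketch, assuming each category has already been scored 0-100 (the dict keys and function shape are assumptions for illustration):

```python
# Weights from the scoring matrix above (sum to 1.0)
WEIGHTS = {"security": 0.30, "performance": 0.25, "monitoring": 0.20,
           "compliance": 0.15, "reliability": 0.10}

def overall_readiness(scores: dict) -> tuple[float, str]:
    """Weight per-category scores (0-100) into an overall readiness status."""
    total = sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS)
    if total >= 85:
        status = "PRODUCTION READY"
    elif total >= 70:
        status = "NEEDS IMPROVEMENT"
    else:
        status = "NOT READY"
    return round(total, 1), status
```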

Inspection Workflow

Phase 1: Configuration Analysis

1. Connect to Agent Engine
2. Retrieve agent metadata
3. Parse runtime configuration
4. Extract Code Execution settings
5. Extract Memory Bank settings
6. Document VPC configuration

Phase 2: Protocol Validation

1. Test AgentCard endpoint
2. Validate AgentCard structure
3. Test Task API (POST /v1/tasks:send)
4. Test Status API (GET /v1/tasks/{id})
5. Verify A2A protocol version

Phase 3: Security Audit

1. Review IAM roles and permissions
2. Check VPC Service Controls
3. Validate encryption settings
4. Scan for hardcoded secrets
5. Verify Model Armor enabled
6. Assess service account security

Phase 4: Performance Analysis

1. Query Cloud Monitoring metrics
2. Calculate error rate (last 24h)
3. Analyze latency percentiles
4. Review token usage and costs
5. Check auto-scaling behavior
6. Validate resource limits

Phase 5: Production Readiness

1. Run all checklist items (28 checks)
2. Calculate category scores
3. Calculate overall score
4. Determine readiness status
5. Generate recommendations
6. Create action plan

Tool Permissions

Read-only inspection - Cannot modify configurations:

  • Read: Analyze agent configuration files
  • Grep: Search for security issues
  • Glob: Find related configuration
  • Bash: Query GCP APIs (read-only)

Example Inspection Report

Agent ID: gcp-deployer-agent
Deployment Status: RUNNING
Inspection Date: 2025-12-09

Runtime Configuration:
  Model: gemini-2.5-flash
  Code Execution: βœ… Enabled (TTL: 14 days)
  Memory Bank: βœ… Enabled (retention: 90 days)
  VPC: βœ… Configured (private-vpc-prod)

A2A Protocol Compliance:
  AgentCard: βœ… Valid
  Task API: βœ… Functional
  Status API: βœ… Functional
  Protocol Version: 1.0

Security Posture:
  IAM: βœ… Least privilege (score: 95%)
  VPC-SC: βœ… Enabled
  Model Armor: βœ… Enabled
  Encryption: βœ… At-rest & in-transit
  Overall: 🟒 SECURE (92%)

Performance Metrics (24h):
  Request Count: 12,450
  Error Rate: 2.3% 🟒
  Latency (p95): 1,850ms 🟒
  Token Usage: 450K tokens
  Cost Estimate: $12.50/day

Production Readiness:
  Security: 93% (28/30 points)
  Performance: 88% (22/25 points)
  Monitoring: 95% (19/20 points)
  Compliance: 80% (12/15 points)
  Reliability: 70% (7/10 points)

  Overall Score: 88% 🟒 PRODUCTION READY

Recommendations:
  1. Enable multi-region deployment (reliability +10%)
  2. Configure automated backups (compliance +5%)
  3. Add circuit breaker pattern (reliability +5%)
  4. Optimize memory bank indexing (performance +3%)

Integration with Other Plugins

Works with jeremy-adk-orchestrator

  • Orchestrator deploys agents
  • Inspector validates deployments
  • Feedback loop for optimization

Works with jeremy-vertex-validator

  • Validator checks code before deployment
  • Inspector validates runtime after deployment
  • Complementary pre/post checks

Works with jeremy-adk-terraform

  • Terraform provisions infrastructure
  • Inspector validates provisioned agents
  • Ensures IaC matches runtime

Troubleshooting Guide

Issue: Agent not responding

Inspector checks:

  • VPC configuration allows traffic
  • IAM permissions correct
  • Agent Engine status is RUNNING
  • No quota limits exceeded

Issue: High error rate

Inspector checks:

  • Model configuration appropriate
  • Resource limits not exceeded
  • Code Execution sandbox not timing out
  • Memory Bank not quota-exhausted

Issue: Slow response times

Inspector checks:

  • Auto-scaling configured
  • Code Execution TTL appropriate
  • Memory Bank indexing enabled
  • Caching strategy implemented

Version History

  • 1.0.0 (2025): Initial release with Agent Engine GA support, Code Execution Sandbox, Memory Bank, A2A protocol validation

References

GitHub Repository

jeremylongshore/claude-code-plugins-plus
Path: plugins/ai-ml/jeremy-vertex-engine/skills/vertex-engine-inspector
Tags: ai, automation, claude-code, devops, marketplace, mcp

Related Skills

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

llamaguard

Other

LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.