vertex-agent-builder
About
This skill provides production-ready scaffolding for building and deploying generative AI agents on Google Cloud's Vertex AI platform. It enables developers to quickly implement agents with RAG, function calling, and multi-modal capabilities using Gemini models. Use it when you need to deploy scalable, enterprise-ready AI agents with Google Cloud infrastructure.
Quick Install
Claude Code
Recommended: copy and paste this command into Claude Code to install the skill:
/plugin add https://github.com/jeremylongshore/claude-code-plugins-plus
Alternatively, clone the repository directly:
git clone https://github.com/jeremylongshore/claude-code-plugins-plus.git ~/.claude/skills/vertex-agent-builder
Skill Documentation
Vertex AI Agent Builder Skill
Overview
This skill provides production-ready scaffolding and deployment for generative AI agents using Google Cloud's Vertex AI platform. Built on sample code from the GoogleCloudPlatform/generative-ai and agent-starter-pack repositories, it offers:
- Gemini Model Integration (1.5 Pro, 1.5 Flash, experimental models)
- Multi-modal Capabilities (text, image, video, audio processing)
- RAG Implementation (Retrieval Augmented Generation with Vector Search)
- Function Calling (Tool use with Gemini models)
- Production Deployment (Cloud Run, Vertex AI Endpoints)
- Evaluation Framework (AutoSxS, ROUGE, custom metrics)
- Observability (Cloud Logging, Monitoring, Trace)
Installation
# Install Google Cloud SDK
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init
# Install Python SDK
# Quote the specifiers so the shell does not treat ">=" as a redirect
pip install "google-cloud-aiplatform>=1.38.0"
pip install "vertexai>=1.46.0"
pip install "google-generativeai>=0.3.2"
# Clone source repositories
git clone https://github.com/GoogleCloudPlatform/generative-ai.git
git clone https://github.com/GoogleCloudPlatform/agent-starter-pack.git
git clone https://github.com/GoogleCloudPlatform/vertex-ai-samples.git
# Install this plugin
/plugin install jeremy-vertex-ai@jeremylongshore
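For local development, authenticate with Application Default Credentials first (gcloud auth application-default login) and make sure the Vertex AI API is enabled in your project. A minimal smoke test, using placeholder project and region values:
import vertexai
from vertexai.generative_models import GenerativeModel

# Placeholder project and region -- replace with your own values.
vertexai.init(project="your-project-id", location="us-central1")

# One short request confirms auth, API enablement, and model access.
model = GenerativeModel("gemini-1.5-flash-002")
print(model.generate_content("Reply with OK if you can read this.").text)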
Quick Start (5 Minutes)
Create Your First Vertex AI Agent
import vertexai
from vertexai.generative_models import GenerativeModel, Tool
from vertexai.preview.generative_models import grounding
# Initialize Vertex AI
vertexai.init(project="your-project-id", location="us-central1")
# Create Gemini model
model = GenerativeModel(
"gemini-1.5-pro-002",
system_instruction="""You are a helpful AI assistant that can:
- Search the web for information
- Analyze documents and images
- Execute Python code
- Call external APIs
"""
)
# Start chat session
chat = model.start_chat()
# Send message with grounding
response = chat.send_message(
"What are the latest developments in quantum computing?",
generation_config={
"temperature": 0.3,
"max_output_tokens": 2048,
"top_p": 0.95,
},
safety_settings={
"HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE",
"HARM_CATEGORY_DANGEROUS_CONTENT": "BLOCK_MEDIUM_AND_ABOVE",
},
    tools=[Tool.from_google_search_retrieval(grounding.GoogleSearchRetrieval())]
)
print(response.text)
Agent Patterns
1. Production Agent with Agent Builder
Based on GoogleCloudPlatform/agent-starter-pack:
from vertexai.preview import agents
from vertexai.preview.generative_models import Tool, FunctionDeclaration
import functions_framework
class ProductionAgent:
"""
Production-ready agent using Vertex AI Agent Builder
"""
def __init__(self, project_id: str, location: str = "us-central1"):
self.project_id = project_id
self.location = location
# Initialize agent
self.agent = agents.Agent.create(
project=project_id,
location=location,
display_name="production-agent",
description="Production-ready agent with tools and RAG",
model="gemini-1.5-pro-002",
tools=self._create_tools(),
system_instruction=self._get_system_instruction()
)
def _create_tools(self):
"""Create function calling tools"""
return [
Tool(function_declarations=[
FunctionDeclaration(
name="search_knowledge_base",
description="Search internal knowledge base",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"},
"top_k": {"type": "integer", "default": 5}
}
}
),
FunctionDeclaration(
name="execute_sql",
description="Execute SQL query on BigQuery",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"},
"dataset": {"type": "string"}
}
}
),
FunctionDeclaration(
name="send_email",
description="Send email notification",
parameters={
"type": "object",
"properties": {
"to": {"type": "string"},
"subject": {"type": "string"},
"body": {"type": "string"}
}
}
)
])
]
def _get_system_instruction(self):
"""Get system instruction for agent"""
return """You are a production AI agent that helps users with:
1. Information retrieval from knowledge bases
2. Data analysis using BigQuery
3. Automated communications
Always verify user intent before executing critical operations.
Provide clear explanations of what you're doing and why.
"""
async def process_request(self, user_input: str):
"""Process user request"""
session = self.agent.start_session()
response = await session.send_message(user_input)
# Handle function calls
if response.function_calls:
for func_call in response.function_calls:
result = await self._execute_function(
func_call.name,
func_call.args
)
# Send function result back to model
response = await session.send_message(
content=None,
function_responses=[{
"name": func_call.name,
"response": result
}]
)
return response.text
    async def _execute_function(self, name: str, args: dict):
        """Execute function call (the helpers below are implementation stubs)"""
        if name == "search_knowledge_base":
            return await self._search_kb(args["query"], args.get("top_k", 5))
        elif name == "execute_sql":
            return await self._execute_bigquery(args["query"], args["dataset"])
        elif name == "send_email":
            return await self._send_email(args["to"], args["subject"], args["body"])
        return {"error": f"Unknown function: {name}"}
# Deploy as Cloud Function
import asyncio

@functions_framework.http
def agent_endpoint(request):
    """HTTP Cloud Function for agent"""
    agent = ProductionAgent(
        project_id="your-project",
        location="us-central1"
    )
    request_json = request.get_json()
    # process_request is a coroutine, so run it to completion here
    response = asyncio.run(agent.process_request(request_json["message"]))
    return {"response": response}
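Once deployed, the function can be exercised over HTTP. A minimal client sketch; the URL below is a placeholder for your function's actual endpoint:
import requests

# Placeholder URL -- substitute your deployed Cloud Function endpoint.
resp = requests.post(
    "https://us-central1-your-project.cloudfunctions.net/agent_endpoint",
    json={"message": "Summarize the latest sales figures."},
    timeout=60,
)
print(resp.json()["response"])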
2. RAG-Enhanced Agent
Based on GoogleCloudPlatform/generative-ai/gemini/use-cases/retrieval-augmented-generation:
from vertexai.generative_models import GenerativeModel
from vertexai.language_models import TextEmbeddingModel
from vertexai.preview import rag
from google.cloud import aiplatform
class RAGAgent:
"""
Agent with Retrieval Augmented Generation using Vertex AI Search
"""
def __init__(self, project_id: str, corpus_name: str):
self.project_id = project_id
# Create RAG corpus
self.corpus = rag.create_corpus(
display_name=corpus_name,
description="Knowledge base for RAG"
)
# Initialize embedding model
self.embedding_model = TextEmbeddingModel.from_pretrained(
"text-embedding-004"
)
# Initialize Gemini model
self.model = GenerativeModel("gemini-1.5-pro-002")
    async def import_documents(self, gcs_uris: list):
        """Import documents into RAG corpus"""
        # rag.import_files takes the URIs as `paths` and blocks until done
        import_response = rag.import_files(
            corpus_name=self.corpus.name,
            paths=gcs_uris,
            chunk_size=512,
            chunk_overlap=100
        )
        return import_response
    async def query_with_rag(self, query: str, similarity_top_k: int = 5):
        """Query with RAG retrieval"""
        # Retrieve relevant chunks; similarity_top_k is an argument of
        # retrieval_query, not of the RagResource
        retrieval_response = rag.retrieval_query(
            rag_resources=[
                rag.RagResource(rag_corpus=self.corpus.name)
            ],
            text=query,
            similarity_top_k=similarity_top_k
        )
        # The matched chunks are nested under response.contexts.contexts
        chunks = retrieval_response.contexts.contexts
        # Build context from retrieved chunks
        context = "\n\n".join([
            f"Source: {chunk.source_uri}\nContent: {chunk.text}"
            for chunk in chunks
        ])
        # Generate response with context
        prompt = f"""Based on the following context, answer the question.

Context:
{context}

Question: {query}

Answer:"""
        response = self.model.generate_content(
            prompt,
            generation_config={
                "temperature": 0.2,
                "max_output_tokens": 1024,
            }
        )
        return {
            "answer": response.text,
            "sources": [chunk.source_uri for chunk in chunks],
            # Lower vector distance means a closer match
            "distances": [chunk.distance for chunk in chunks]
        }
3. Multi-Modal Agent
Based on GoogleCloudPlatform/generative-ai/gemini/multimodality:
from vertexai.generative_models import GenerativeModel, Part
class MultiModalAgent:
"""
Agent that processes text, images, video, and audio
"""
def __init__(self):
self.model = GenerativeModel("gemini-1.5-flash-002")
async def analyze_image(self, image_path: str, prompt: str):
"""Analyze image with text prompt"""
image = Part.from_uri(
uri=f"gs://your-bucket/{image_path}",
mime_type="image/jpeg"
)
response = self.model.generate_content([prompt, image])
return response.text
async def analyze_video(self, video_path: str, prompt: str):
"""Analyze video content"""
video = Part.from_uri(
uri=f"gs://your-bucket/{video_path}",
mime_type="video/mp4"
)
response = self.model.generate_content(
[prompt, video],
generation_config={
"temperature": 0.4,
"max_output_tokens": 2048,
}
)
return response.text
async def analyze_document(self, pdf_path: str):
"""Extract and analyze PDF document"""
document = Part.from_uri(
uri=f"gs://your-bucket/{pdf_path}",
mime_type="application/pdf"
)
prompt = """Extract all key information from this document:
1. Main topics covered
2. Key findings or conclusions
3. Important data or statistics
4. Actionable recommendations
"""
response = self.model.generate_content([prompt, document])
return response.text
async def generate_code(self, description: str, language: str = "python"):
"""Generate code from natural language"""
prompt = f"""Generate {language} code for the following requirement:
{description}
Requirements:
- Include proper error handling
- Add comprehensive comments
- Follow best practices for {language}
- Make it production-ready
"""
response = self.model.generate_content(
prompt,
generation_config={
"temperature": 0.2,
"max_output_tokens": 4096,
}
)
return response.text
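Driving the agent from a synchronous script; the object path below is a hypothetical file in the bucket assumed by analyze_image:
import asyncio

agent = MultiModalAgent()
# "products/photo.jpg" is a hypothetical object in your bucket.
caption = asyncio.run(
    agent.analyze_image("products/photo.jpg", "Describe this product photo.")
)
print(caption)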
Deployment Patterns
1. Cloud Run Deployment
# cloudbuild.yaml
steps:
  # Build container
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA', '.']
  # Push to Container Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA']
  # Deploy to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'vertex-agent'
      - '--image'
      - 'gcr.io/$PROJECT_ID/vertex-agent:$COMMIT_SHA'
      - '--region'
      - 'us-central1'
      - '--platform'
      - 'managed'
      - '--allow-unauthenticated'
      - '--set-env-vars'
      - 'GCP_PROJECT=$PROJECT_ID'
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application
COPY . .
# Run as non-root user
RUN useradd -m -u 1000 agent && chown -R agent:agent /app
USER agent
EXPOSE 8080
CMD ["gunicorn", "--bind", ":8080", "--workers", "1", "--threads", "8", "--timeout", "0", "main:app"]
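The Dockerfile's CMD expects a WSGI object named app in main.py, which is not shown in the source repositories referenced here. A minimal sketch, assuming Flask (Flask and gunicorn would both need to appear in requirements.txt):
# main.py -- minimal Flask wrapper around the agent (illustrative sketch)
import asyncio

from flask import Flask, request

# Assumes the ProductionAgent class defined earlier is importable here.
app = Flask(__name__)
agent = ProductionAgent(project_id="your-project", location="us-central1")

@app.route("/", methods=["POST"])
def handle():
    message = request.get_json()["message"]
    # process_request is a coroutine, so drive it to completion per request
    return {"response": asyncio.run(agent.process_request(message))}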
2. Vertex AI Endpoint Deployment
from google.cloud import aiplatform
def deploy_to_vertex_endpoint():
"""Deploy model to Vertex AI Endpoint"""
aiplatform.init(project="your-project", location="us-central1")
# Upload model
model = aiplatform.Model.upload(
display_name="vertex-agent-model",
artifact_uri="gs://your-bucket/model",
serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest"
)
# Create endpoint
endpoint = aiplatform.Endpoint.create(
display_name="vertex-agent-endpoint",
)
# Deploy model to endpoint
endpoint.deploy(
model=model,
deployed_model_display_name="vertex-agent-v1",
machine_type="n1-standard-4",
min_replica_count=1,
max_replica_count=5,
accelerator_type="NVIDIA_TESLA_T4",
accelerator_count=1,
)
return endpoint
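Once deployed, the endpoint serves online predictions. The instance payload below is a placeholder and must match whatever model artifact you actually uploaded:
endpoint = deploy_to_vertex_endpoint()

# Placeholder feature vector -- the payload shape depends on the model.
prediction = endpoint.predict(instances=[[1.0, 2.0, 3.0, 4.0]])
print(prediction.predictions)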
Evaluation Framework
Based on GoogleCloudPlatform/generative-ai/gemini/evaluation:
from vertexai.preview.evaluation import EvalTask, MetricPromptTemplate
import pandas as pd
class AgentEvaluator:
"""
Evaluate agent performance using Vertex AI evaluation framework
"""
def __init__(self):
self.eval_dataset = self._load_eval_dataset()
def _load_eval_dataset(self):
"""Load evaluation dataset"""
return pd.DataFrame([
{
"prompt": "What is the capital of France?",
"reference": "The capital of France is Paris.",
"context": "France is a country in Western Europe."
},
# Add more test cases
])
    async def evaluate_accuracy(self, agent):
        """Evaluate agent accuracy"""
        # Collect the agent's responses so the dataset has a "response" column
        responses = []
        for _, row in self.eval_dataset.iterrows():
            responses.append(await agent.process_request(row["prompt"]))
        eval_dataset = self.eval_dataset.assign(response=responses)
        eval_task = EvalTask(
            dataset=eval_dataset,
            metrics=[
                "rouge_l_sum",
                "bleu",
                "exact_match",
                MetricPromptTemplate(
                    metric="custom_factuality",
                    metric_prompt_template="""Rate the factual accuracy of the response.
Reference: {reference}
Response: {response}
Score (0-1):"""
                )
            ]
        )
        # Score the pre-computed responses rather than re-querying the model
        eval_result = eval_task.evaluate()
        return eval_result.summary_metrics
Monitoring & Observability
from google.cloud import monitoring_v3
from google.cloud import logging
import time
class AgentMonitor:
"""
Monitor agent performance and usage
"""
def __init__(self, project_id: str):
self.project_id = project_id
self.monitoring_client = monitoring_v3.MetricServiceClient()
self.logging_client = logging.Client()
self.logger = self.logging_client.logger("vertex-agent")
def log_request(self, request_id: str, user_input: str, response: str, latency: float):
"""Log agent request"""
self.logger.log_struct({
"request_id": request_id,
"timestamp": time.time(),
"user_input": user_input,
"response": response[:500], # Truncate long responses
"latency_ms": latency * 1000,
"model": "gemini-1.5-pro-002",
"success": True
})
def create_custom_metric(self, metric_name: str, value: float):
"""Create custom metric in Cloud Monitoring"""
project_name = f"projects/{self.project_id}"
series = monitoring_v3.TimeSeries()
series.metric.type = f"custom.googleapis.com/agent/{metric_name}"
now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10 ** 9)
interval = monitoring_v3.TimeInterval(
{"end_time": {"seconds": seconds, "nanos": nanos}}
)
point = monitoring_v3.Point({
"interval": interval,
"value": {"double_value": value}
})
series.points = [point]
self.monitoring_client.create_time_series(
name=project_name,
time_series=[series]
)
def create_dashboard(self):
"""Create monitoring dashboard"""
# Dashboard configuration
dashboard_config = {
"displayName": "Vertex Agent Dashboard",
"widgets": [
{
"title": "Request Latency",
"xyChart": {
"dataSets": [{
"timeSeriesQuery": {
"timeSeriesFilter": {
"filter": 'metric.type="custom.googleapis.com/agent/latency"'
}
}
}]
}
},
{
"title": "Error Rate",
"xyChart": {
"dataSets": [{
"timeSeriesQuery": {
"timeSeriesFilter": {
"filter": 'metric.type="custom.googleapis.com/agent/errors"'
}
}
}]
}
}
]
}
return dashboard_config
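Wiring the monitor into a request path; the request ID, input, and timing below are placeholders, and the metric name matches the custom.googleapis.com/agent/latency filter used in the dashboard config above:
import time
import uuid

monitor = AgentMonitor(project_id="your-project-id")

start = time.time()
answer = "..."  # placeholder for whatever the agent returned
monitor.log_request(
    request_id=str(uuid.uuid4()),
    user_input="What changed in Q3?",
    response=answer,
    latency=time.time() - start,
)
monitor.create_custom_metric("latency", (time.time() - start) * 1000)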
Cost Optimization
import hashlib

from vertexai.generative_models import GenerativeModel

class CostOptimizedAgent:
"""
Cost-optimized agent configuration
"""
def __init__(self):
# Model selection based on task complexity
self.models = {
"simple": "gemini-1.5-flash-002", # Fastest and cheapest
"standard": "gemini-1.5-pro-002", # Balanced
"complex": "gemini-1.5-pro-002", # Most capable
}
# Caching for repeated queries
self.response_cache = {}
    def select_model(self, query: str) -> str:
        """Select appropriate model based on query complexity"""
        # Simple case-insensitive heuristic on query length and keywords
        query_lower = query.lower()
        complexity_score = len(query) / 100
        if "analyze" in query_lower or "explain" in query_lower:
            complexity_score += 0.3
        if "code" in query_lower or "implement" in query_lower:
            complexity_score += 0.4
if complexity_score < 0.3:
return self.models["simple"]
elif complexity_score < 0.7:
return self.models["standard"]
else:
return self.models["complex"]
async def process_with_caching(self, query: str):
"""Process query with response caching"""
# Check cache first
cache_key = hashlib.md5(query.encode()).hexdigest()
if cache_key in self.response_cache:
return self.response_cache[cache_key]
# Select optimal model
model_name = self.select_model(query)
model = GenerativeModel(model_name)
# Process query
response = model.generate_content(
query,
generation_config={
"temperature": 0.1, # Lower temperature for consistency
"max_output_tokens": 1024, # Limit output size
}
)
# Cache response
self.response_cache[cache_key] = response.text
return response.text
Integration with Google Cloud Services
BigQuery Integration
from google.cloud import bigquery
class BigQueryAgent:
"""Agent with BigQuery data access"""
def __init__(self, project_id: str):
self.bq_client = bigquery.Client(project=project_id)
self.model = GenerativeModel("gemini-1.5-pro-002")
async def nl2sql(self, natural_language_query: str, dataset: str):
"""Convert natural language to SQL"""
# Get table schemas
tables = self.bq_client.list_tables(dataset)
schema_info = []
for table in tables:
table_ref = self.bq_client.get_table(table.reference)
schema_info.append(f"Table: {table.table_id}\nSchema: {table_ref.schema}")
prompt = f"""Convert this natural language query to SQL:
Query: {natural_language_query}
Available tables and schemas:
{chr(10).join(schema_info)}
Return only the SQL query, no explanation.
"""
        response = self.model.generate_content(prompt)
        # Strip whitespace and any markdown code fences the model may add
        sql_query = response.text.strip().removeprefix("```sql").removesuffix("```").strip()
        # Execute query (consider a dry run to validate model-generated SQL)
        query_job = self.bq_client.query(sql_query)
        results = query_job.result()
return {
"sql": sql_query,
"results": [dict(row) for row in results],
"total_rows": results.total_rows
}
Cloud Storage Integration
from google.cloud import storage
from vertexai.generative_models import GenerativeModel, Part

class StorageAgent:
    """Agent with Cloud Storage access"""
    def __init__(self, bucket_name: str):
        self.storage_client = storage.Client()
        self.bucket = self.storage_client.bucket(bucket_name)
        self.model = GenerativeModel("gemini-1.5-flash-002")
    async def process_documents(self, prefix: str):
        """Process all documents in a GCS prefix"""
        blobs = self.bucket.list_blobs(prefix=prefix)
        results = []
        for blob in blobs:
            # PDFs are binary, so hand them to Gemini by URI; plain-text
            # files can be downloaded directly (.docx is skipped because
            # Gemini does not ingest it natively)
            if blob.name.endswith(".pdf"):
                content = Part.from_uri(
                    uri=f"gs://{self.bucket.name}/{blob.name}",
                    mime_type="application/pdf"
                )
            elif blob.name.endswith(".txt"):
                content = blob.download_as_text()
            else:
                continue
            response = self.model.generate_content(
                ["Summarize the key points of this document.", content]
            )
            results.append({
                "file": blob.name,
                "analysis": response.text
            })
        return results
Best Practices
1. Security
from google.cloud import secretmanager
class SecureAgent:
"""Agent with security best practices"""
def __init__(self, project_id: str):
self.secret_client = secretmanager.SecretManagerServiceClient()
self.project_id = project_id
def get_secret(self, secret_id: str) -> str:
"""Get secret from Secret Manager"""
name = f"projects/{self.project_id}/secrets/{secret_id}/versions/latest"
response = self.secret_client.access_secret_version(request={"name": name})
return response.payload.data.decode("UTF-8")
def sanitize_input(self, user_input: str) -> str:
"""Sanitize user input"""
# Remove potential injection attempts
sanitized = user_input.replace("```", "")
sanitized = sanitized.replace("<script>", "")
# Add more sanitization as needed
return sanitized
2. Error Handling
from tenacity import retry, stop_after_attempt, wait_exponential
import logging
class ResilientAgent:
"""Agent with robust error handling"""
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=4, max=60)
)
async def process_with_retry(self, request):
"""Process with automatic retry"""
try:
response = await self._process(request)
return response
except Exception as e:
logging.error(f"Processing failed: {e}")
raise
Examples Repository
Complete working examples from Google Cloud repositories:
- Text Generation and Function Calling: /gemini/function-calling/
- RAG Implementation: /gemini/use-cases/retrieval-augmented-generation/
- Multi-modal Processing: /gemini/multimodality/
- Agent Builder: /agent-builder/
- Production Templates: from agent-starter-pack
Resources
- Vertex AI Documentation
- Gemini API Reference
- Agent Builder Guide
- GitHub: GoogleCloudPlatform/generative-ai
- GitHub: GoogleCloudPlatform/agent-starter-pack
Version: 1.0.0
Last Updated: October 2025
Author: Jeremy Longshore
Based on: Official Google Cloud repositories
License: MIT
