gemini-image-gen
About
This Claude Skill provides a guide for implementing Google Gemini API image generation using the gemini-2.5-flash-image model. It enables developers to create high-quality images from text prompts and supports image editing, multi-image composition, and iterative refinement. Use it when building text-to-image features or generating visual content for projects.
Skill Documentation
Gemini Image Generation Skill
Generate high-quality images using Google's Gemini 2.5 Flash Image model with text prompts, image editing, and multi-image composition capabilities.
When to Use This Skill
Use this skill when you need to:
- Generate images from text descriptions
- Edit existing images by adding/removing elements or changing styles
- Combine multiple source images into new compositions
- Iteratively refine images through conversational editing
- Create visual content for documentation, design, or creative projects
Prerequisites
API Key Setup
The skill supports both Google AI Studio and Vertex AI endpoints.
Option 1: Google AI Studio (Default)
The skill automatically detects your GEMINI_API_KEY in this order:
- Process environment: export GEMINI_API_KEY="your-key"
- Project root: .env
- .claude directory: .claude/.env
- .claude/skills directory: .claude/skills/.env
- Skill directory: .claude/skills/gemini-image-gen/.env
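The lookup order above can be sketched as a small function. This is a minimal illustration, not the helper script's actual code: `find_api_key` and `read_env_file` are hypothetical names, and the parser assumes plain `KEY=value` lines only.

```python
import os
from pathlib import Path

# Candidate .env locations, relative to the project root, in the documented order.
ENV_FILE_CANDIDATES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-image-gen/.env",
]


def read_env_file(path, name):
    """Return the value of `name` from a simple KEY=value file, or None."""
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("'\"")
    return None


def find_api_key(project_root=".", environ=None, name="GEMINI_API_KEY"):
    """Check the process environment first, then each .env file in order."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    for relative in ENV_FILE_CANDIDATES:
        candidate = Path(project_root) / relative
        if candidate.is_file():
            value = read_env_file(candidate, name)
            if value:
                return value
    return None
```

Taking `environ` as a parameter keeps the function easy to exercise without touching the real process environment.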
Get your API key: Visit Google AI Studio
Create a .env file with:
GEMINI_API_KEY=your_api_key_here
Option 2: Vertex AI
To use Vertex AI instead:
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1 # Optional, defaults to us-central1
Or in .env file:
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
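One way these variables might map onto the SDK client is sketched below. `client_kwargs` is a hypothetical helper, not part of the skill; the only SDK surface assumed is that `genai.Client` accepts `api_key`, or `vertexai`, `project`, and `location`.

```python
import os


def client_kwargs(environ=None):
    """Build keyword arguments for genai.Client from the documented variables."""
    environ = os.environ if environ is None else environ
    use_vertex = environ.get("GEMINI_USE_VERTEX", "").strip().lower() == "true"
    if use_vertex:
        project = environ.get("VERTEX_PROJECT_ID")
        if not project:
            raise ValueError("VERTEX_PROJECT_ID is required when GEMINI_USE_VERTEX=true")
        return {
            "vertexai": True,
            "project": project,
            # Location is optional and falls back to the documented default.
            "location": environ.get("VERTEX_LOCATION", "us-central1"),
        }
    return {"api_key": environ.get("GEMINI_API_KEY")}


# Usage (requires the google-genai package and valid credentials):
# from google import genai
# client = genai.Client(**client_kwargs())
```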
Python Setup
Install the required package (Pillow is also needed for the image editing examples that use PIL):
pip install google-genai pillow
Quick Start
Basic Text-to-Image Generation
from google import genai
from google.genai import types
import os

# Read the key from the environment (the helper script also searches .env files)
client = genai.Client(api_key=os.getenv('GEMINI_API_KEY'))

response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents='A serene mountain landscape at sunset with snow-capped peaks',
    config=types.GenerateContentConfig(
        response_modalities=['image'],
        # Aspect ratio is set through image_config, not as a top-level field
        image_config=types.ImageConfig(aspect_ratio='16:9')
    )
)

# Save to ./docs/assets/ (create the directory first if it does not exist)
os.makedirs('./docs/assets', exist_ok=True)
for i, part in enumerate(response.candidates[0].content.parts):
    if part.inline_data:
        with open(f'./docs/assets/generated-{i}.png', 'wb') as f:
            f.write(part.inline_data.data)
Using the Helper Script
For convenience, use the provided helper script that handles API key detection and file saving:
# Generate single image
python .claude/skills/gemini-image-gen/scripts/generate.py \
    "A futuristic city with flying cars" \
    --aspect-ratio 16:9 \
    --output ./docs/assets/city.png

# Generate with specific modalities
python .claude/skills/gemini-image-gen/scripts/generate.py \
    "Modern architecture design" \
    --response-modalities image text \
    --aspect-ratio 1:1
Key Features
Aspect Ratios
| Ratio | Resolution | Use Case | Token Cost |
|---|---|---|---|
| 1:1 | 1024×1024 | Social media, avatars | 1290 |
| 16:9 | 1344×768 | Landscapes, banners | 1290 |
| 9:16 | 768×1344 | Mobile, portraits | 1290 |
| 4:3 | 1184×864 | Traditional media | 1290 |
| 3:4 | 864×1184 | Vertical posters | 1290 |
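When the target dimensions come from a layout rather than a fixed ratio, a small helper can pick the nearest ratio from the table. This is a sketch only: `closest_aspect_ratio` is a hypothetical function, and it considers just the five ratios listed above.

```python
import math

# Ratios listed in the table above.
SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4"]


def ratio_value(ratio):
    """Convert a 'W:H' string to a float width/height ratio."""
    width, height = ratio.split(":")
    return int(width) / int(height)


def closest_aspect_ratio(width, height):
    """Return the supported ratio closest to width/height.

    Distances are compared on a log scale so that landscape and portrait
    shapes are treated symmetrically.
    """
    target = math.log(width / height)
    return min(
        SUPPORTED_RATIOS,
        key=lambda ratio: abs(math.log(ratio_value(ratio)) - target),
    )
```

The result can be passed as the aspect ratio in the request configuration or to the helper script's `--aspect-ratio` flag.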
Response Modalities
- ['image']: Generate only images
- ['text']: Generate only text descriptions
- ['image', 'text']: Generate both images and descriptions
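With both modalities requested, a response can mix text parts and image parts. The sketch below separates them. `save_parts` is a hypothetical helper that assumes each part exposes `.text` and `.inline_data.data` as in the Quick Start example, and it assumes the image bytes are PNG.

```python
from pathlib import Path


def save_parts(parts, out_dir, stem="generated"):
    """Write image parts to disk and collect text parts.

    `parts` is expected to look like response.candidates[0].content.parts:
    each item may carry `.text` and/or `.inline_data.data` (raw image bytes).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    texts, image_paths = [], []
    for index, part in enumerate(parts):
        inline = getattr(part, "inline_data", None)
        if inline is not None and getattr(inline, "data", None):
            # The .png extension is an assumption about the returned format.
            path = out_dir / f"{stem}-{index}.png"
            path.write_bytes(inline.data)
            image_paths.append(path)
        elif getattr(part, "text", None):
            texts.append(part.text)
    return texts, image_paths


# Usage with a real response:
# texts, paths = save_parts(response.candidates[0].content.parts, "./docs/assets")
```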
Image Editing
Provide an existing image together with text instructions describing the change:
import PIL.Image

# Reuses the client created in Quick Start
img = PIL.Image.open('original.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Add a red balloon floating in the sky',
        img
    ]
)
Multi-Image Composition
Combine up to 3 source images (recommended):
img1 = PIL.Image.open('background.png')
img2 = PIL.Image.open('foreground.png')
response = client.models.generate_content(
    model='gemini-2.5-flash-image',
    contents=[
        'Combine these images into a cohesive scene',
        img1,
        img2
    ]
)
Prompt Engineering Tips
Structure effective prompts with three elements:
- Subject: What to generate ("a robot")
- Context: Environmental setting ("in a futuristic city")
- Style: Artistic treatment ("cyberpunk style, neon lighting")
Example: "A robot in a futuristic city, cyberpunk style with neon lighting and rain-slicked streets"
Quality modifiers:
- Add terms like "4K", "HDR", "high-quality", "professional photography"
- Specify camera settings: "35mm lens", "shallow depth of field", "golden hour lighting"
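The subject, context, and style structure, plus optional quality modifiers, can be assembled programmatically. The following is a minimal sketch; `build_prompt` is a hypothetical helper, and the comma-joined format is just one reasonable convention.

```python
def build_prompt(subject, context="", style="", modifiers=()):
    """Join subject and context, then append style and quality modifiers."""
    if not subject.strip():
        raise ValueError("subject is required")
    # Subject and context read as one phrase: "A robot in a futuristic city".
    head = " ".join(piece.strip() for piece in (subject, context) if piece.strip())
    # Style and modifiers follow as comma-separated qualifiers.
    tail = [piece.strip() for piece in (style, *modifiers) if piece.strip()]
    return ", ".join([head, *tail])


prompt = build_prompt(
    "A robot",
    context="in a futuristic city",
    style="cyberpunk style",
    modifiers=["neon lighting", "shallow depth of field"],
)
```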
Text in images:
- Limit to 25 characters maximum
- Use up to 3 distinct phrases
- Specify font styles: "bold sans-serif title" or "handwritten script"
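The text guidelines above can be checked before a prompt is sent. This sketch assumes the 25-character limit applies to each phrase individually, which the guideline does not state explicitly; `check_image_text` is a hypothetical helper.

```python
MAX_CHARS_PER_PHRASE = 25  # assumption: the limit is per phrase
MAX_PHRASES = 3


def check_image_text(phrases):
    """Return a list of warnings; an empty list means the guidelines are met."""
    warnings = []
    if len(phrases) > MAX_PHRASES:
        warnings.append(
            f"{len(phrases)} phrases given; guideline is at most {MAX_PHRASES}"
        )
    for phrase in phrases:
        if len(phrase) > MAX_CHARS_PER_PHRASE:
            warnings.append(
                f"{phrase!r} has {len(phrase)} characters; "
                f"guideline is at most {MAX_CHARS_PER_PHRASE}"
            )
    return warnings
```

Returning warnings rather than raising leaves the decision to the caller, since these are guidelines for quality and not hard API limits.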
See references/prompting-guide.md for comprehensive prompt engineering strategies.
Safety Settings
The model includes adjustable safety filters. Configure per-request:
config = types.GenerateContentConfig(
    response_modalities=['image'],
    safety_settings=[
        types.SafetySetting(
            category=types.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=types.HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE
        )
    ]
)
See references/safety-settings.md for detailed configuration options.
Output Management
Save all generated images to the ./docs/assets/ directory:
# Create directory if needed
mkdir -p ./docs/assets
The helper script automatically saves to this location with timestamped filenames.
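The exact filename format used by the helper script is not documented here. As an illustration only, a timestamped path could be built like this; `timestamped_path` and its `YYYYMMDD-HHMMSS` pattern are assumptions.

```python
from datetime import datetime
from pathlib import Path


def timestamped_path(out_dir="./docs/assets", prefix="generated", ext="png", now=None):
    """Return out_dir/prefix-YYYYMMDD-HHMMSS.ext for the given (or current) time."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(out_dir) / f"{prefix}-{stamp}.{ext}"


# Usage:
# path = timestamped_path()
# path.parent.mkdir(parents=True, exist_ok=True)
# path.write_bytes(image_bytes)
```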
Model Specifications
Model: gemini-2.5-flash-image
- Input tokens: Up to 65,536
- Output tokens: Up to 32,768
- Supported inputs: Text and images
- Supported outputs: Text and images
- Knowledge cutoff: June 2025
- Features: Image generation, structured outputs, batch API, caching
Limitations
- Maximum 3 input images recommended for best results
- Text rendering works best when you generate the text first, then request an image that contains it
- Does not support audio/video inputs
- Uploading images of children is restricted in some regions (EEA, Switzerland, UK)
- Optimal language support: English, Spanish (Mexico), Japanese, Mandarin, Hindi
Error Handling
Common issues and solutions:
API key not found:
# Check environment variables
echo $GEMINI_API_KEY
# Verify .env file exists
cat .claude/skills/gemini-image-gen/.env
# or
cat .env
Safety filter blocking:
- Review response.prompt_feedback.block_reason
- Adjust safety settings if appropriate for your use case
- Modify prompt to avoid triggering filters
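A defensive check along these lines can turn a blocked or empty response into a readable message. This is a sketch: `describe_block` is a hypothetical helper, and it assumes the response exposes `prompt_feedback.block_reason`, `candidates`, and per-candidate `finish_reason` and `content.parts`.

```python
def describe_block(response):
    """Return a short explanation if the response carries no image, else None."""
    feedback = getattr(response, "prompt_feedback", None)
    reason = getattr(feedback, "block_reason", None)
    if reason:
        return f"Prompt blocked: {reason}"

    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return "No candidates returned"

    first = candidates[0]
    finish = getattr(first, "finish_reason", None)
    content = getattr(first, "content", None)
    parts = getattr(content, "parts", None) or []
    if not any(getattr(part, "inline_data", None) for part in parts):
        return f"No image in response (finish_reason={finish})"
    return None


# Usage:
# problem = describe_block(response)
# if problem:
#     print(problem)
```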
Token limit exceeded:
- Reduce prompt length
- Use fewer input images
- Simplify image editing instructions
Reference Documentation
For detailed information, see:
- references/api-reference.md - Complete API specifications
- references/prompting-guide.md - Advanced prompt engineering
- references/safety-settings.md - Safety configuration details
- references/code-examples.md - Additional implementation examples
Resources
- Official Documentation
- API Reference
- Get API Key
- Google AI Studio - Interactive testing
Quick Install
/plugin add https://github.com/Elios-FPT/EliosCodePracticeService/tree/main/gemini-image-gen
Copy and paste this command into Claude Code to install the skill.
