Browser Automation
About
This skill enables developers to automate web browser interactions through natural language CLI commands. It allows for website navigation, data extraction, form filling, clicking elements, and taking screenshots using the Stagehand CLI with Chrome. Use it when you need to programmatically control a browser for web scraping, testing, or automation tasks.
Quick Install
Claude Code
Recommended/plugin add https://github.com/greyhaven-ai/claude-code-configgit clone https://github.com/greyhaven-ai/claude-code-config.git ~/.claude/skills/Browser AutomationCopy and paste this command in Claude Code to install this skill
Documentation
Browser Automation
Automate browser interactions using Stagehand CLI with Claude. This skill provides natural language control over a Chrome browser through command-line tools for navigation, interaction, data extraction, and screenshots.
Overview
This skill uses a CLI-based approach where Claude Code calls browser automation commands via bash. The browser stays open between commands for faster sequential operations and preserves browser state (cookies, sessions, etc.).
Setup Verification
IMPORTANT: Before using any browser commands, you MUST check setup.json in this directory.
First-Time Setup Check
- Read
setup.json(located in.claude/skills/browser-automation/setup.json) - Check
setupCompletefield:- If
true: All prerequisites are met, proceed with browser commands - If
false: Setup required - follow the steps below
- If
If Setup is Required (setupComplete: false)
Run these commands in the plugin directory:
# 1. Install dependencies and build (REQUIRED)
# This automatically builds TypeScript
npm install
# or: pnpm install
# or: bun install
# 2. Link the browser command globally (REQUIRED)
npm link
# 3. Configure API key (REQUIRED)
# Option 1 (RECOMMENDED): Export in your terminal
export ANTHROPIC_API_KEY="your-api-key-here"
# Option 2: Or use .env file
cp .env.example .env
# Then edit .env and add: ANTHROPIC_API_KEY="your-api-key-here"
# 4. Verify Chrome is installed
# Chrome should be at standard location for your OS
# 5. Test the installation
browser navigate https://example.com
# 6. If test succeeds, update setup.json
# Set all "installed"/"configured" fields to true
# Set "setupComplete" to true
Prerequisites Summary
- ✅ Google Chrome installed on your system
- ✅ Node.js dependencies installed and TypeScript built (
npm installruns build automatically) - ✅ Browser command globally available (
npm linkcreates the global symlink) - ✅ Anthropic API key configured (exported as
ANTHROPIC_API_KEYenvironment variable or in.envfile)
DO NOT attempt to use browser commands if setupComplete: false in setup.json. Guide the user through setup first.
Available Commands
Navigate to URLs
browser navigate <url>
When to use: Opening any website, loading a specific URL, going to a web page.
Example usage:
browser navigate https://example.combrowser navigate https://news.ycombinator.com
Output: JSON with success status, message, and screenshot path
Interact with Pages
browser act "<action>"
When to use: Clicking buttons, filling forms, scrolling, selecting options, typing text.
Example usage:
browser act "click the Sign In button"browser act "fill in the email field with [email protected]"browser act "scroll down to the footer"browser act "type 'laptop' in the search box and press enter"
Important: Be as specific as possible - details make a world of difference. When filling fields, you don't need to combine 'click and type'; the tool will perform a fill similar to Playwright's fill function.
Output: JSON with success status, message, and screenshot path
Extract Data
browser extract "<instruction>" ['{"field": "type"}']
When to use: Scraping data, getting specific information, collecting structured content.
Schema format (optional): JSON object where keys are field names and values are types:
"string"for text"number"for numeric values"boolean"for true/false values
Note: The schema parameter is optional. If omitted or if schema validation fails, extraction will proceed without type validation.
Example usage:
browser extract "get the product title and price" '{"title": "string", "price": "number"}'browser extract "get all article headlines" '{"headlines": "string"}'browser extract "get the page title"(no schema)
Output: JSON with success status, extracted data, and screenshot path
Discover Elements
browser observe "<query>"
When to use: Understanding page structure, finding what's clickable, discovering form fields.
Example usage:
browser observe "find all clickable buttons"browser observe "find all form fields"browser observe "find all navigation links"
Output: JSON with success status, discovered elements, and screenshot path
Take Screenshots
browser screenshot
When to use: Visual verification, documenting page state, debugging, creating records.
Notes:
- Screenshots are saved to the plugin directory's
agent/browser_screenshots/folder - Images larger than 2000x2000 pixels are automatically resized
- Filename includes timestamp for uniqueness
Output: JSON with success status and screenshot path
Clean Up
browser close
When to use: After completing all browser interactions, to free up resources.
Output: JSON with success status and message
Browser Behavior
Persistent Browser: The browser stays open between commands for faster sequential operations and to preserve browser state (cookies, sessions, etc.).
Reuse Existing: If Chrome is already running on port 9222, it will reuse that instance.
Minimized Launch: Chrome opens off-screen (position -9999,-9999) to avoid disrupting workflow.
Safe Cleanup: The browser only closes when you explicitly call the close command.
Best Practices
- Always navigate first: Before interacting with a page, navigate to the URL
- 📸 Always view screenshots: After each command (navigate, act, extract, observe), use the Read tool to view the screenshot and verify the command worked correctly
- Use natural language: Describe actions as you would instruct a human
- Extract with clear schemas: Define field names and types explicitly in JSON
- Handle errors gracefully: Check the
successfield in JSON output; if an action fails, view the screenshot and try usingobserveto understand the page better - Close when done: Always clean up browser resources after completing tasks
- Be specific: Use precise selectors in natural language ("the blue Submit button" vs "the button")
- Chain commands: Run multiple commands sequentially without reopening the browser
Common Patterns
Simple browsing task
browser navigate https://example.com
browser act "click the login button"
browser screenshot
browser close
Data extraction task
browser navigate https://example.com/products
browser act "wait for page to load"
browser extract "get all products" '{"name": "string", "price": "number"}'
# Or without schema:
# browser extract "get the page content"
browser close
Multi-step interaction
browser navigate https://example.com/login
browser act "fill in email with [email protected]"
browser act "fill in password with mypassword"
browser act "click the submit button"
browser screenshot
browser close
Debugging workflow
browser navigate https://example.com
browser screenshot
browser observe "find all buttons"
browser act "click the specific button"
browser screenshot
browser close
Troubleshooting
Page not loading: Wait a few seconds after navigation before acting. You can explicitly: browser act "wait for the page to fully load"
Element not found: Use observe to discover what elements are actually available on the page
Action fails: Be more specific in natural language description. Instead of "click the button", try "click the blue Submit button in the form"
Screenshots missing: Check the plugin directory's agent/browser_screenshots/ folder for saved files
Chrome not found: Install Google Chrome or the CLI will show an error with installation instructions
Port 9222 in use: Another Chrome debugging session is running. Close it or wait for timeout
For detailed examples, see EXAMPLES.md. For API reference and technical details, see REFERENCE.md.
Dependencies
To use this skill, install these dependencies only if they aren't already present:
npm install
# or
pnpm install
# or
bun install
GitHub Repository
Related Skills
sglang
MetaSGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
evaluating-llms-harness
TestingThis Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.
content-collections
MetaThis skill provides a production-tested setup for Content Collections, a TypeScript-first tool that transforms Markdown/MDX files into type-safe data collections with Zod validation. Use it when building blogs, documentation sites, or content-heavy Vite + React applications to ensure type safety and automatic content validation. It covers everything from Vite plugin configuration and MDX compilation to deployment optimization and schema validation.
langchain
MetaLangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
