web-scraper
About
This skill extracts structured data from websites using BeautifulSoup and requests, turning webpages into usable data. It's designed for tasks like collecting competitor pricing, scraping product listings, and extracting contact information. Developers can use it for lead generation, content audits, and monitoring website changes.
Quick Install
Claude Code
Recommended: npx skills add guia-matthieu/clawfu-skills -a claude-code
/plugin add https://github.com/guia-matthieu/clawfu-skills
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/web-scraper
Copy and paste one of these commands into Claude Code to install the skill.
Skill Documentation
Web Scraper
Extract structured data from websites using BeautifulSoup and requests - turn static HTML pages into usable data (requests does not execute JavaScript).
When to Use This Skill
- Competitor research - Scrape pricing, features, positioning
- Lead generation - Extract contact info from directories
- Content audit - Pull headings, links, metadata
- Price monitoring - Track competitor pricing changes
- Data collection - Gather research data from multiple sources
What Claude Does vs What You Decide
| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Strategic priorities |
| Synthesizes market data | Competitive positioning |
| Identifies opportunities | Resource allocation |
| Creates strategic options | Final strategy selection |
| Suggests implementation approaches | Execution decisions |
Dependencies
pip install beautifulsoup4 requests pandas click lxml
Commands
Scrape Elements
python scripts/main.py scrape https://example.com --selector "h1,h2,p"
python scripts/main.py scrape https://example.com --selector ".product-price"
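The internals of `scripts/main.py` are not shown here. A minimal sketch of the same idea, assuming a plain `requests` fetch followed by BeautifulSoup's `select()`, could look like this. The helper names `scrape_html` and `scrape` and the User-Agent string are hypothetical, not the script's actual code.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical User-Agent: say who you are and how to reach you.
HEADERS = {"User-Agent": "web-scraper-sketch/0.1 (contact@example.com)"}


def scrape_html(html: str, selector: str) -> list[str]:
    """Return the stripped text of every element matching a CSS selector.

    A group selector such as "h1,h2,p" matches all listed parts, and the
    results come back in document order.
    """
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


def scrape(url: str, selector: str, timeout: float = 10.0) -> list[str]:
    """Fetch a page, then apply scrape_html to its HTML."""
    resp = requests.get(url, headers=HEADERS, timeout=timeout)
    resp.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return scrape_html(resp.text, selector)


if __name__ == "__main__":
    page = "<h1>Plans</h1><p>Pick one.</p><span class='product-price'> $29 </span>"
    for i, text in enumerate(scrape_html(page, "h1,p"), start=1):
        print(f"{i}. {text}")
    print(scrape_html(page, ".product-price"))
```

Keeping parsing (`scrape_html`) separate from fetching (`scrape`) lets the selector logic be tested offline. The sketch uses the built-in `html.parser` backend; since lxml is in the dependency list, `BeautifulSoup(html, "lxml")` is a faster drop-in.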
Extract Links
python scripts/main.py links https://example.com
python scripts/main.py links https://example.com --internal-only
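One plausible way to implement `--internal-only`, sketched under the assumption that "internal" means "same host as the page URL": resolve every `href` with `urljoin`, drop `#fragments`, skip non-HTTP schemes, and compare hosts. `extract_links` is a hypothetical helper.

```python
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str, internal_only: bool = False) -> list[str]:
    """Return absolute, de-duplicated http(s) links found in <a href> tags."""
    soup = BeautifulSoup(html, "html.parser")
    base_host = urlparse(base_url).netloc
    seen: set[str] = set()
    links: list[str] = []
    for a in soup.select("a[href]"):
        # Resolve relative hrefs against the page URL and drop any #fragment.
        absolute, _fragment = urldefrag(urljoin(base_url, a["href"]))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if internal_only and parsed.netloc != base_host:
            continue  # "internal" here means: same host as base_url
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links


if __name__ == "__main__":
    page = (
        '<a href="/about">About</a>'
        '<a href="https://other.com/x">Partner</a>'
        '<a href="/about#team">Team</a>'
        '<a href="mailto:hi@example.com">Mail</a>'
    )
    print(extract_links(page, "https://example.com/"))
    print(extract_links(page, "https://example.com/", internal_only=True))
```

Under this rule `www.example.com` and `example.com` count as different hosts; normalise them first if a site serves both.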
Extract Emails
python scripts/main.py emails https://example.com
python scripts/main.py emails https://example.com --depth 2
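The documentation does not define what `--depth` counts. The sketch below assumes `depth=1` scans only the start page and each extra level follows same-host links one hop further, breadth-first. The regex is a pragmatic approximation rather than a full RFC 5322 parser, and `fetch` is injected (any `url -> html` callable, for example a wrapper around `requests.get`) so the crawl logic can run offline. All names are hypothetical.

```python
import re
from collections import deque
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup

# Pragmatic pattern: catches ordinary addresses, not every RFC 5322 edge case.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def emails_in_html(html: str) -> set[str]:
    """Find addresses in visible text and in mailto: links."""
    soup = BeautifulSoup(html, "html.parser")
    found = set(EMAIL_RE.findall(soup.get_text(" ")))
    for a in soup.select('a[href^="mailto:"]'):
        # mailto:user@host?subject=... -> user@host
        addr = a["href"][len("mailto:"):].split("?")[0]
        if EMAIL_RE.fullmatch(addr):
            found.add(addr)
    return {e.lower() for e in found}  # normalise case for de-duplication


def crawl_emails(start_url: str, fetch: Callable[[str], str], depth: int = 1) -> set[str]:
    """Breadth-first crawl of same-host pages.

    Assumed semantics: depth=1 scans only start_url; each extra level
    follows links one hop further from it.
    """
    host = urlparse(start_url).netloc
    queue = deque([(start_url, 1)])
    visited = {start_url}
    emails: set[str] = set()
    while queue:
        url, level = queue.popleft()
        html = fetch(url)
        emails |= emails_in_html(html)
        if level >= depth:
            continue  # do not expand links past the depth limit
        for a in BeautifulSoup(html, "html.parser").select("a[href]"):
            nxt, _ = urldefrag(urljoin(url, a["href"]))
            if urlparse(nxt).netloc == host and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, level + 1))
    return emails
```

The `visited` set keeps the crawl from looping on pages that link to each other, and the same-host check stops it from wandering onto other sites.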
Extract Structured Data
python scripts/main.py structured https://example.com/article --schema article
python scripts/main.py structured https://example.com/product --schema product
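How `--schema article` builds its output is not documented. As a hedged sketch, an extractor could prefer schema.org JSON-LD (`<script type="application/ld+json">`) when the page embeds it and fall back to Open Graph and `<meta>` tags, then plain HTML, producing the same fields as the Example 2 output. `extract_article` and its fallback order are assumptions; a `product` schema would follow the same pattern with different fields.

```python
import json
from typing import Optional

from bs4 import BeautifulSoup

ARTICLE_TYPES = {"Article", "BlogPosting", "NewsArticle"}  # schema.org types


def _meta(soup: BeautifulSoup, **attrs: str) -> Optional[str]:
    """content="" of the first <meta> whose attributes match, if any."""
    tag = soup.find("meta", attrs=attrs)
    return tag.get("content") if tag else None


def _json_ld_article(soup: BeautifulSoup) -> dict:
    """First Article-like object from JSON-LD <script> blocks, else {}."""
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common in the wild; skip it
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = [types] if isinstance(types, str) else types or []
            if any(isinstance(t, str) and t in ARTICLE_TYPES for t in types):
                return item
    return {}


def extract_article(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    ld = _json_ld_article(soup)
    for tag in soup(["script", "style"]):
        tag.decompose()  # keep code and CSS out of the article text

    author = ld.get("author")
    if isinstance(author, dict):  # schema.org allows a Person object here
        author = author.get("name")

    h1 = soup.find("h1")
    time_tag = soup.find("time", datetime=True)
    body = soup.find("article") or soup.body or soup
    content = body.get_text(" ", strip=True)

    return {
        "title": ld.get("headline")
        or _meta(soup, property="og:title")
        or (h1.get_text(strip=True) if h1 else None),
        "author": author or _meta(soup, name="author"),
        "date": ld.get("datePublished")
        or _meta(soup, property="article:published_time")
        or (time_tag["datetime"] if time_tag else None),
        "content": content,
        "word_count": len(content.split()),
    }
```

Preferring JSON-LD first is a design choice: when a site publishes it, it is usually cleaner than scraping visible markup, but the fallbacks keep the extractor useful on pages without it.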
Examples
Example 1: Scrape Competitor Pricing
python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"
# Output:
# Extracted 6 elements
# 1. Starter - $29/mo
# 2. Pro - $99/mo
# 3. Enterprise - Contact us
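`.price,.plan-name` is a flat group selector: it returns names and prices interleaved in document order, which would explain why three plans yield six elements. To keep each name attached to its price even when a card lacks one, a sketch can scope the lookup to each plan card instead. The `.plan` card selector and both function names are hypothetical; adjust the selector to the target page's markup.

```python
import csv
import io

from bs4 import BeautifulSoup


def pricing_rows(html: str, card: str = ".plan") -> list[dict]:
    """One {plan, price} row per plan card; `card` is a hypothetical selector."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for el in soup.select(card):
        # select_one searches only inside this card, so pairs cannot drift.
        name = el.select_one(".plan-name")
        price = el.select_one(".price")
        rows.append({
            "plan": name.get_text(strip=True) if name else "",
            "price": price.get_text(strip=True) if price else "",  # empty if the card has no price
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["plan", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    page = (
        '<div class="plan"><h3 class="plan-name">Starter</h3><span class="price">$29/mo</span></div>'
        '<div class="plan"><h3 class="plan-name">Enterprise</h3><span class="price">Contact us</span></div>'
    )
    print(rows_to_csv(pricing_rows(page)))
```

pandas, already a listed dependency, offers the same export via `pandas.DataFrame(rows).to_csv(...)`; the sketch sticks to the standard-library `csv` module to stay small.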
Example 2: Extract Article Content
python scripts/main.py structured https://blog.example.com/post --schema article
# Output: article_data.json
# {
# "title": "How to Scale Your Startup",
# "author": "Jane Doe",
# "date": "2024-01-15",
# "content": "...",
# "word_count": 1523
# }
CSS Selector Reference
| Selector | Description | Example |
|---|---|---|
| tag | Element type | h1, p, div |
| .class | Class name | .price, .title |
| #id | Element ID | #main-content |
| tag.class | Tag with class | div.product |
| tag[attr] | Has attribute | a[href] |
| parent > child | Direct child | ul > li |
| tag1, tag2 | Multiple | h1, h2, h3 |
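A small self-contained check of the selectors in the table, run against an inline HTML snippet with BeautifulSoup's `select()` (which delegates CSS matching to the soupsieve package). The snippet and the `texts` helper are invented for illustration.

```python
from bs4 import BeautifulSoup

HTML = """
<div id="main-content">
  <h1>Plans</h1>
  <ul>
    <li class="title">Starter</li>
    <li class="title">Pro</li>
  </ul>
  <div class="product"><span class="price">$29</span></div>
  <p>See <a href="/pricing">pricing</a> or <a>this anchor</a>.</p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")


def texts(selector: str) -> list[str]:
    """Stripped text of every match, in document order."""
    return [el.get_text(strip=True) for el in soup.select(selector)]


# One query per row of the table above.
EXAMPLES = {
    "h1": "tag",
    ".price": "class",
    "#main-content > h1": "id (plus direct child)",
    "div.product": "tag with class",
    "a[href]": "has attribute",
    "ul > li": "direct child",
    "h1, .price": "multiple",
}

if __name__ == "__main__":
    for selector, kind in EXAMPLES.items():
        print(f"{kind:24} {selector!r:22} -> {texts(selector)}")
```

Note that `a[href]` skips the second anchor because it has no `href` attribute, and the group selector `h1, .price` returns matches in document order rather than selector order.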
Ethical Scraping Guidelines
- Check robots.txt - Respect the site's crawler rules
- Rate limit - Don't overload servers; keep to roughly 1-2 requests per second
- Identify yourself - Use a descriptive User-Agent
- Cache requests - Don't re-scrape pages that haven't changed
- Terms of Service - Check whether scraping is allowed
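Most of these guidelines can be made mechanical. This sketch uses the standard-library `urllib.robotparser` for robots.txt, a minimum-interval rate limiter (1 request/second by default, within the range above), a descriptive User-Agent, and an in-memory ETag cache that sends `If-None-Match` and reuses the stored body on `304 Not Modified`. The User-Agent string, `RateLimiter`, and `polite_get` are hypothetical; `session` is assumed to be anything with a `requests.Session`-style `get()`. Terms of Service still need a human to read them.

```python
import time
import urllib.robotparser
from typing import Callable, Optional

# Hypothetical identity: name the bot and give a way to contact you.
USER_AGENT = "acme-research-bot/1.0 (+https://example.com/bot)"


def robots_allows(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Check a URL against robots.txt rules already downloaded as text."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)


class RateLimiter:
    """Enforce a minimum gap of 1/rate_per_sec seconds between requests."""

    def __init__(
        self,
        rate_per_sec: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        if self._last is not None:
            remaining = self.interval - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
        self._last = self.clock()


def polite_get(session, url: str, limiter: RateLimiter, cache: dict) -> str:
    """GET with a User-Agent, rate limiting and an ETag-based cache.

    `session` is any object with a requests.Session-style get();
    `cache` maps url -> {"etag": ..., "body": ...} (in memory here).
    """
    limiter.wait()
    headers = {"User-Agent": USER_AGENT}
    cached = cache.get(url)
    if cached and cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]  # ask the server "changed since?"
    resp = session.get(url, headers=headers, timeout=10)
    if resp.status_code == 304 and cached:
        return cached["body"]  # Not Modified: reuse the stored copy
    resp.raise_for_status()
    cache[url] = {"etag": resp.headers.get("ETag"), "body": resp.text}
    return resp.text
```

In practice robots.txt would be loaded with `RobotFileParser.set_url(...)` and `read()` rather than passed as text, and if the file declares a `Crawl-delay`, `RobotFileParser.crawl_delay()` returns it so the limiter interval can honour it. The clock and sleep functions are injectable only so the limiter can be tested without real waiting.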
Skill Boundaries
What This Skill Does Well
- Structuring strategic analysis
- Identifying market opportunities
- Creating strategic frameworks
- Synthesizing competitive data
What This Skill Cannot Do
- Replace market research
- Guarantee strategic success
- Know proprietary competitor info
- Make executive decisions
Related Skills
- competitor-monitor - Monitor competitor changes
- pdf-extractor - Extract from PDFs
Skill Metadata
- Mode: centaur
- category: automation
- subcategory: data-extraction
- dependencies: [beautifulsoup4, requests, pandas]
- difficulty: intermediate
- time_saved: 5+ hours/week
