
web-scraper

guia-matthieu
Updated 2 days ago
Development, api, data

About

This skill extracts structured data from websites using BeautifulSoup and requests, turning webpages into usable data. It's designed for tasks like collecting competitor pricing, scraping product listings, and extracting contact information. Developers can use it for lead generation, content audits, and monitoring website changes.

Quick Install

Claude Code

Recommended
Primary method
npx skills add guia-matthieu/clawfu-skills -a claude-code
Alternative: plugin command
/plugin add https://github.com/guia-matthieu/clawfu-skills
Alternative: Git clone
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/web-scraper

Copy and paste one of these commands into Claude Code to install the skill.

Skill Documentation

Web Scraper

Extract structured data from websites using BeautifulSoup and requests, turning a page's static HTML into usable data (requests does not execute JavaScript, so client-side rendered content will not appear).

When to Use This Skill

  • Competitor research - Scrape pricing, features, positioning
  • Lead generation - Extract contact info from directories
  • Content audit - Pull headings, links, metadata
  • Price monitoring - Track competitor pricing changes
  • Data collection - Gather research data from multiple sources

What Claude Does vs What You Decide

| Claude Does | You Decide |
|---|---|
| Structures analysis frameworks | Strategic priorities |
| Synthesizes market data | Competitive positioning |
| Identifies opportunities | Resource allocation |
| Creates strategic options | Final strategy selection |
| Suggests implementation approaches | Execution decisions |

Dependencies

pip install beautifulsoup4 requests pandas click lxml

Commands

Scrape Elements

python scripts/main.py scrape https://example.com --selector "h1,h2,p"
python scripts/main.py scrape https://example.com --selector ".product-price"
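A minimal sketch of what the scrape command plausibly does, assuming it wraps BeautifulSoup's select() for CSS selectors. scrape_elements is a hypothetical helper, not the code in scripts/main.py. It parses an inline HTML string so it runs offline; against a live site you would pass requests.get(url, timeout=10).text instead.

```python
from bs4 import BeautifulSoup


def scrape_elements(html: str, selector: str) -> list:
    """Return tag name and stripped text for every element matching a CSS selector.

    The selector may be a comma-separated group such as "h1,h2,p";
    select() returns matches in document order.
    """
    soup = BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    return [
        {"tag": el.name, "text": el.get_text(strip=True)}
        for el in soup.select(selector)
    ]


html = """
<html><body>
  <h1>Pricing</h1>
  <p>Pick a plan.</p>
  <div class="product-price">$29/mo</div>
</body></html>
"""

for i, item in enumerate(scrape_elements(html, "h1,h2,p"), start=1):
    print(f"{i}. <{item['tag']}> {item['text']}")
```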

Extract Links

python scripts/main.py links https://example.com
python scripts/main.py links https://example.com --internal-only
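A sketch of link extraction with an internal-only filter, assuming "internal" means "same host as the page URL". extract_links is a hypothetical helper; the real --internal-only flag may define internal differently.

```python
from urllib.parse import urldefrag, urljoin, urlparse

from bs4 import BeautifulSoup


def extract_links(html: str, base_url: str, internal_only: bool = False) -> list:
    """Return unique absolute http(s) URLs from <a href> tags, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    base_host = urlparse(base_url).netloc
    links = []
    for a in soup.select("a[href]"):
        # Resolve relative hrefs and drop #fragments so /about and /about#team collapse
        url, _fragment = urldefrag(urljoin(base_url, a["href"]))
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript: and similar
        if internal_only and parsed.netloc != base_host:
            continue
        links.append(url)
    return list(dict.fromkeys(links))  # de-duplicate while keeping order


page = """
<a href="/about">About</a>
<a href="/about#team">Team</a>
<a href="https://other.org/page">Partner</a>
<a href="mailto:hi@example.com">Email</a>
"""

print(extract_links(page, "https://example.com/"))
print(extract_links(page, "https://example.com/", internal_only=True))
```

In this sketch www.example.com and example.com count as different hosts. Normalize them first if a site serves both.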

Extract Emails

python scripts/main.py emails https://example.com
python scripts/main.py emails https://example.com --depth 2
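A single-page sketch of email extraction that checks both mailto: links and visible text. It does not model the --depth option. The regex is a pragmatic pattern, not an RFC 5322 validator, and extract_emails is a hypothetical name.

```python
import re

from bs4 import BeautifulSoup

# Pragmatic pattern: local part, "@", and a dotted domain ending in 2+ letters
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list:
    """Return sorted, lower-cased, de-duplicated addresses found on one page."""
    soup = BeautifulSoup(html, "html.parser")
    found = set()
    # mailto: links often carry addresses that never appear as visible text
    for a in soup.select('a[href^="mailto:"]'):
        address = a["href"][len("mailto:"):].split("?")[0]  # drop ?subject=...
        found.update(m.lower() for m in EMAIL_RE.findall(address))
    # Join text nodes with spaces so adjacent elements don't fuse into one token
    found.update(m.lower() for m in EMAIL_RE.findall(soup.get_text(" ")))
    return sorted(found)


page = """
<p>Sales: Sales@Example.com</p>
<a href="mailto:support@example.com?subject=Hi">Contact support</a>
<p>Not an email: user@localhost</p>
"""

print(extract_emails(page))
```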

Extract Structured Data

python scripts/main.py structured https://example.com/article --schema article
python scripts/main.py structured https://example.com/product --schema product
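One common way to build an article or product extractor is to read the schema.org JSON-LD that many sites embed in script tags. This is an assumption about technique: the skill's structured command may use CSS selectors or meta tags instead, and mapping --schema article to the type "Article" is hypothetical.

```python
import json

from bs4 import BeautifulSoup


def extract_json_ld(html: str, schema_type: str) -> list:
    """Return schema.org JSON-LD objects whose @type includes schema_type."""
    soup = BeautifulSoup(html, "html.parser")
    matches = []
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is common; skip it rather than crash
        if isinstance(data, dict):
            items = data.get("@graph", [data])  # some sites wrap objects in @graph
        elif isinstance(data, list):
            items = data
        else:
            continue
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if not isinstance(types, list):
                types = [types]  # @type may be a string or a list of strings
            if schema_type in types:
                matches.append(item)
    return matches


page = """
<script type="application/ld+json">{not valid json</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "How to Scale Your Startup",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "datePublished": "2024-01-15"}
</script>
"""

print(extract_json_ld(page, "Article"))
```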

Examples

Example 1: Scrape Competitor Pricing

python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"

# Output:
# Extracted 6 elements
# 1. Starter - $29/mo
# 2. Pro - $99/mo
# 3. Enterprise - Contact us
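The output above pairs each plan name with its price, so the six matched elements become three rows. A sketch of one robust way to do that pairing: look up name and price inside each plan's container rather than zipping two flat lists, where one missing price would shift every later pair. The .plan container class and pair_plans helper are hypothetical; real pricing pages use their own markup.

```python
from bs4 import BeautifulSoup


def pair_plans(html: str) -> list:
    """Return (plan name, price) tuples, scoped per plan card."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select(".plan"):  # hypothetical container class
        name = card.select_one(".plan-name")
        price = card.select_one(".price")
        if name is not None and price is not None:
            rows.append((name.get_text(strip=True), price.get_text(strip=True)))
    return rows


html = """
<div class="plan"><h3 class="plan-name">Starter</h3><span class="price">$29/mo</span></div>
<div class="plan"><h3 class="plan-name">Pro</h3><span class="price">$99/mo</span></div>
<div class="plan"><h3 class="plan-name">Enterprise</h3><span class="price">Contact us</span></div>
"""

for i, (name, price) in enumerate(pair_plans(html), start=1):
    print(f"{i}. {name} - {price}")
```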

Example 2: Extract Article Content

python scripts/main.py structured https://blog.example.com/post --schema article

# Output: article_data.json
# {
#   "title": "How to Scale Your Startup",
#   "author": "Jane Doe",
#   "date": "2024-01-15",
#   "content": "...",
#   "word_count": 1523
# }
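When a page has no machine-readable metadata, a fallback can read common HTML conventions: og:title and author meta tags, a time element with a datetime attribute, and the text of an article element. This is a sketch under those assumptions, not the skill's actual logic. article_from_html is a hypothetical name, and word_count here simply counts whitespace-separated tokens.

```python
import json

from bs4 import BeautifulSoup


def _meta(soup, attrs: dict):
    """Return the content attribute of the first <meta> matching attrs, or None."""
    tag = soup.find("meta", attrs=attrs)
    return tag.get("content") if tag is not None else None


def article_from_html(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("article")
    if body is None:  # fall back to <body>, then the whole document
        body = soup.body if soup.body is not None else soup
    content = body.get_text(" ", strip=True)
    time_tag = soup.find("time", datetime=True)
    title = _meta(soup, {"property": "og:title"})
    if title is None and soup.title is not None:
        title = soup.title.get_text(strip=True)
    return {
        "title": title,
        "author": _meta(soup, {"name": "author"}),
        "date": time_tag["datetime"] if time_tag is not None else None,
        "content": content,
        "word_count": len(content.split()),
    }


page = """
<html><head><title>Fallback title</title>
<meta property="og:title" content="How to Scale Your Startup">
<meta name="author" content="Jane Doe"></head>
<body><article><time datetime="2024-01-15">Jan 15</time>
<p>Hire slowly and measure everything.</p></article></body></html>
"""

print(json.dumps(article_from_html(page), indent=2))
```

Swapping print for json.dump into an open file would produce an article_data.json in the shape shown above.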

CSS Selector Reference

| Selector | Description | Example |
|---|---|---|
| tag | Element type | h1, p, div |
| .class | Class name | .price, .title |
| #id | Element ID | #main-content |
| tag.class | Tag with class | div.product |
| tag[attr] | Has attribute | a[href] |
| parent > child | Direct child | ul > li |
| tag1, tag2 | Multiple | h1, h2, h3 |
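To try each selector form from the table, you can run them through BeautifulSoup's select() against a small inline document. The HTML below is made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<div id="main-content">
  <h1>Plans</h1>
  <div class="product"><span class="title">Pro</span> <span class="price">$99</span></div>
  <ul><li><a href="/pro">Details</a></li><li>No link</li></ul>
  <a>Anchor without href</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# One example per row of the selector table
examples = {
    "h1": "element type",
    ".price": "class name",
    "#main-content": "element ID",
    "div.product": "tag with class",
    "a[href]": "has attribute",
    "ul > li": "direct child",
    "h1, .title": "multiple",
}
for selector, kind in examples.items():
    texts = [el.get_text(" ", strip=True) for el in soup.select(selector)]
    print(f"{selector!r:<16} {kind:<15} {texts}")
```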

Ethical Scraping Guidelines

  1. Check robots.txt - Respect the site's crawling rules
  2. Rate limit - Don't overload servers (stay at or below 1-2 requests per second)
  3. Identify yourself - Use descriptive User-Agent
  4. Cache requests - Don't re-scrape unchanged pages
  5. Terms of Service - Check if scraping is allowed
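A sketch that applies the first four guidelines in one place: robots.txt checks via the standard library's urllib.robotparser, a minimum delay between requests, a descriptive User-Agent on a requests.Session, and conditional requests with ETag so unchanged pages return 304 instead of being downloaded again. PoliteFetcher and the USER_AGENT value are hypothetical. In practice you would download the site's robots.txt first; RobotFileParser can also fetch it itself via set_url() and read(). Terms of Service still need a human check.

```python
import time
import urllib.robotparser

import requests

# Hypothetical identifier: replace with your own bot name and contact address
USER_AGENT = "acme-research-bot/1.0 (+mailto:data-team@example.com)"


class PoliteFetcher:
    """Fetch pages while honoring robots.txt, a minimum delay, and ETag caching."""

    def __init__(self, robots_txt: str, min_interval: float = 1.0):
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_txt.splitlines())
        self.min_interval = min_interval  # seconds between requests; 1.0 means about 1 req/sec
        self._last_request = 0.0
        self._cache = {}  # url -> (ETag or None, body text)
        self.session = requests.Session()
        self.session.headers["User-Agent"] = USER_AGENT  # identify yourself

    def allowed(self, url: str) -> bool:
        return self.robots.can_fetch(USER_AGENT, url)

    def get(self, url: str) -> str:
        if not self.allowed(url):
            raise PermissionError(f"robots.txt disallows {url}")
        # Sleep just long enough to keep requests at least min_interval apart
        wait = self.min_interval - (time.monotonic() - self._last_request)
        if wait > 0:
            time.sleep(wait)
        headers = {}
        cached = self._cache.get(url)
        if cached is not None and cached[0]:
            headers["If-None-Match"] = cached[0]  # ask the server "changed since?"
        resp = self.session.get(url, headers=headers, timeout=10)
        self._last_request = time.monotonic()
        if resp.status_code == 304 and cached is not None:
            return cached[1]  # unchanged: reuse the stored copy
        resp.raise_for_status()
        self._cache[url] = (resp.headers.get("ETag"), resp.text)
        return resp.text


fetcher = PoliteFetcher("User-agent: *\nDisallow: /private/\n")
print(fetcher.allowed("https://example.com/pricing"))
print(fetcher.allowed("https://example.com/private/data"))
```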

Skill Boundaries

What This Skill Does Well

  • Structuring strategic analysis
  • Identifying market opportunities
  • Creating strategic frameworks
  • Synthesizing competitive data

What This Skill Cannot Do

  • Replace market research
  • Guarantee strategic success
  • Know proprietary competitor info
  • Make executive decisions

Related Skills

Skill Metadata

  • Mode: centaur
  • category: automation
  • subcategory: data-extraction
  • dependencies: [beautifulsoup4, requests, pandas]
  • difficulty: intermediate
  • time_saved: 5+ hours/week

GitHub Repository

guia-matthieu/clawfu-skills
Path: skills/automation/web-scraper
ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server

Recommended Related Skills

qmd

Development

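A CLI tool for local search and indexing that supports BM25, vector search, and reranking. Developers can use it to quickly index local files (such as Markdown documents) and run hybrid searches, which makes it especially well suited to searching codebases or documentation locally. It also offers an MCP mode for easy integration into Claude development environments.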


subagent-driven-development

Development

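This skill executes implementation plans made up of independent tasks within the current session, dispatching a fresh subagent for each task and running a code review between tasks. This "fresh subagent + review between tasks" pattern preserves code quality while still allowing fast iteration. It suits development work where independent tasks run back to back in the current session and each task should pass a quality check before the next one starts.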


mcporter

Development

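The mcporter skill lets developers manage and call MCP servers directly from Claude. It supports listing available servers, calling tools, handling OAuth authentication, and managing server daemons. From a command-line style interface, developers can run `mcporter list` to view servers or `mcporter call` to invoke tools directly, which simplifies MCP workflows.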


adk-deployment-specialist

Development

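A Claude skill for deploying and orchestrating Google Vertex AI ADK agents, designed for building production-grade multi-agent systems. It supports agent-to-agent communication over the A2A protocol, provides a code execution sandbox and a memory bank, and handles agent discovery and task submission. Developers who need to deploy ADK agents or orchestrate multi-agent collaboration can use this skill to streamline deployment to Vertex AI Agent Engine.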
