web-scraper

guia-matthieu
Updated 2 days ago
Development, api, data

About

This skill extracts structured data from websites using BeautifulSoup and requests, turning webpages into usable data. It's designed for tasks like collecting competitor pricing, scraping product listings, and extracting contact information. Developers can use it for lead generation, content audits, and monitoring website changes.

Quick Install

Claude Code

Primary (recommended):
npx skills add guia-matthieu/clawfu-skills -a claude-code

Alternative (plugin command):
/plugin add https://github.com/guia-matthieu/clawfu-skills

Alternative (git clone):
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/web-scraper

Documentation

Web Scraper

Extract structured data from websites using BeautifulSoup and requests - turn any webpage into usable data.

When to Use This Skill

  • Competitor research - Scrape pricing, features, positioning
  • Lead generation - Extract contact info from directories
  • Content audit - Pull headings, links, metadata
  • Price monitoring - Track competitor pricing changes
  • Data collection - Gather research data from multiple sources

What Claude Does vs What You Decide

Claude Does                         | You Decide
Structures analysis frameworks      | Strategic priorities
Synthesizes market data             | Competitive positioning
Identifies opportunities            | Resource allocation
Creates strategic options           | Final strategy selection
Suggests implementation approaches  | Execution decisions

Dependencies

pip install beautifulsoup4 requests pandas click lxml

Commands

Scrape Elements

python scripts/main.py scrape https://example.com --selector "h1,h2,p"
python scripts/main.py scrape https://example.com --selector ".product-price"
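
The source does not show how scripts/main.py applies the selector, so the following is only a minimal sketch of the idea using BeautifulSoup's select(). The function name select_text and the sample HTML are hypothetical, and the built-in "html.parser" backend is used here so the sketch does not require lxml.

```python
from bs4 import BeautifulSoup


def select_text(html: str, selector: str) -> list[str]:
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    # select() accepts comma-separated selector groups such as "h1,p".
    return [el.get_text(strip=True) for el in soup.select(selector)]


# Hypothetical page content; a real run would fetch the HTML first,
# for example with requests.get(url, timeout=10).text.
SAMPLE_HTML = """
<html><body>
  <h1>Pricing</h1>
  <p class="product-price">$29/mo</p>
  <p>Billed monthly.</p>
</body></html>
"""

headings_and_paragraphs = select_text(SAMPLE_HTML, "h1,p")
prices = select_text(SAMPLE_HTML, ".product-price")
```

Keeping the parsing step separate from the HTTP request makes it possible to test selectors against saved HTML before touching a live site.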

Extract Links

python scripts/main.py links https://example.com
python scripts/main.py links https://example.com --internal-only
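
As an illustration of what an internal-only filter could look like, the sketch below resolves each href against the page URL and keeps links on the same host. It uses only the Python standard library; the names LinkCollector and extract_links are hypothetical, and the skill's actual implementation may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every <a href="..."> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, internal_only=False):
    collector = LinkCollector(base_url)
    collector.feed(html)
    links = collector.links
    if internal_only:
        # "Internal" is assumed here to mean "same host as the page".
        host = urlparse(base_url).netloc
        links = [u for u in links if urlparse(u).netloc == host]
    return links


SAMPLE_HTML = (
    '<a href="/about">About</a>'
    '<a href="https://other.example.org/x">Partner</a>'
    '<a href="contact.html">Contact</a>'
)
all_links = extract_links(SAMPLE_HTML, "https://example.com/")
internal_links = extract_links(SAMPLE_HTML, "https://example.com/", internal_only=True)
```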

Extract Emails

python scripts/main.py emails https://example.com
python scripts/main.py emails https://example.com --depth 2
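
Email extraction is commonly done with a regular expression over the page text. The sketch below shows one such approach with a deliberately simple pattern; EMAIL_RE and extract_emails are hypothetical names, and the pattern will miss unusual but valid addresses. How the skill handles --depth is not described in the source, so crawling is left out.

```python
import re

# A pragmatic pattern, not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return unique email addresses in first-seen order, lowercased."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email not in seen:
            seen.add(email)
            result.append(email)
    return result


sample_text = (
    "Contact sales@example.com or Support@Example.com. "
    "Again: sales@example.com"
)
emails = extract_emails(sample_text)
```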

Extract Structured Data

python scripts/main.py structured https://example.com/article --schema article
python scripts/main.py structured https://example.com/product --schema product
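
The source does not say where the --schema data comes from. One plausible route, assumed here purely for illustration, is schema.org JSON-LD embedded in script tags of type application/ld+json. The sketch below uses only the standard library; JsonLdCollector and extract_schema are hypothetical names.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than fail the whole page.


def extract_schema(html, schema_type):
    """Return the first JSON-LD object whose @type matches, else None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for item in collector.items:
        if not isinstance(item, dict):
            continue
        if str(item.get("@type", "")).lower() == schema_type.lower():
            return item
    return None


# Hypothetical page fragment for demonstration.
SAMPLE_HTML = """
<script type="application/ld+json">
{"@type": "Article", "headline": "How to Scale Your Startup",
 "author": {"@type": "Person", "name": "Jane Doe"},
 "datePublished": "2024-01-15"}
</script>
"""
article = extract_schema(SAMPLE_HTML, "article")
product = extract_schema(SAMPLE_HTML, "product")
```

Pages without JSON-LD would need a fallback, such as reading meta tags or headings, which this sketch does not attempt.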

Examples

Example 1: Scrape Competitor Pricing

python scripts/main.py scrape https://competitor.com/pricing --selector ".price,.plan-name"

# Output:
# Extracted 6 elements
# 1. Starter - $29/mo
# 2. Pro - $99/mo
# 3. Enterprise - Contact us
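
For price monitoring, the scraped labels have to be turned into numbers before two runs can be compared. The following is a rough sketch of that step, assuming labels shaped like the sample output above; parse_price and price_changes are hypothetical helpers, and thousands separators or non-dollar currencies are not handled.

```python
import re

# Matches a dollar amount such as "$29" or "$9.99".
PRICE_RE = re.compile(r"\$\s*(\d+(?:\.\d+)?)")


def parse_price(label):
    """Return the dollar amount in a label, or None if there is none."""
    match = PRICE_RE.search(label)
    return float(match.group(1)) if match else None


def price_changes(old, new):
    """Map plan name to (old_price, new_price) for plans whose price changed."""
    changes = {}
    for plan, new_label in new.items():
        before = parse_price(old.get(plan, ""))
        after = parse_price(new_label)
        if before is not None and after is not None and before != after:
            changes[plan] = (before, after)
    return changes


# Hypothetical results from two scraping runs.
last_week = {"Starter": "$29/mo", "Pro": "$99/mo", "Enterprise": "Contact us"}
this_week = {"Starter": "$29/mo", "Pro": "$119/mo", "Enterprise": "Contact us"}
changes = price_changes(last_week, this_week)
```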

Example 2: Extract Article Content

python scripts/main.py structured https://blog.example.com/post --schema article

# Output: article_data.json
# {
#   "title": "How to Scale Your Startup",
#   "author": "Jane Doe",
#   "date": "2024-01-15",
#   "content": "...",
#   "word_count": 1523
# }

CSS Selector Reference

Selector        | Description         | Example
tag             | Element type        | h1, p, div
.class          | Class name          | .price, .title
#id             | Element ID          | #main-content
tag.class       | Tag with class      | div.product
tag[attr]       | Has attribute       | a[href]
parent > child  | Direct child        | ul > li
tag1, tag2      | Multiple selectors  | h1, h2, h3
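
To make the reference concrete, the sketch below runs one selector of each kind against a small, made-up HTML fragment using BeautifulSoup's select() and counts the matches. The fragment and the count helper are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment containing one example of each selector target.
HTML = """
<div id="main-content">
  <h1 class="title">Plans</h1>
  <div class="product"><span class="price">$29</span></div>
  <ul><li>One</li><li>Two</li></ul>
  <a href="/pricing">Pricing</a>
  <a>No link</a>
</div>
"""
soup = BeautifulSoup(HTML, "html.parser")


def count(selector):
    """Number of elements in the fragment that match the selector."""
    return len(soup.select(selector))


selectors = ["h1", ".price", "#main-content", "div.product",
             "a[href]", "ul > li", "h1, a"]
counts = {sel: count(sel) for sel in selectors}
```

Note that a[href] matches only the anchor that has an href attribute, while the grouped selector "h1, a" matches the heading and both anchors.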

Ethical Scraping Guidelines

  1. Check robots.txt - Respect site's scraping policy
  2. Rate limit - Don't overload servers (1-2 req/sec)
  3. Identify yourself - Use descriptive User-Agent
  4. Cache requests - Don't re-scrape unchanged pages
  5. Terms of Service - Check if scraping is allowed
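
The first three guidelines can be sketched with the standard library alone. The example below parses robots.txt rules with urllib.robotparser, checks a URL before fetching, and enforces a minimum delay between requests. The USER_AGENT string, the RateLimiter class, and the sample rules are all hypothetical; a real run would load robots.txt from the target site.

```python
import time
from urllib.robotparser import RobotFileParser

# A descriptive User-Agent with a contact URL (placeholder values).
USER_AGENT = "example-research-bot/0.1 (+https://example.com/bot-info)"


def build_robots(robots_txt):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Block so that consecutive calls to wait() are at least min_interval apart."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
"""
robots = build_robots(SAMPLE_ROBOTS)
can_fetch_pricing = robots.can_fetch(USER_AGENT, "https://example.com/pricing")
can_fetch_private = robots.can_fetch(USER_AGENT, "https://example.com/private/data")

# 1-2 requests per second corresponds to a min_interval of 0.5 to 1.0 seconds.
limiter = RateLimiter(min_interval=0.5)
```

Calling limiter.wait() immediately before each request keeps the crawl within the chosen rate regardless of how long parsing takes in between.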

Skill Boundaries

What This Skill Does Well

  • Structuring strategic analysis
  • Identifying market opportunities
  • Creating strategic frameworks
  • Synthesizing competitive data

What This Skill Cannot Do

  • Replace market research
  • Guarantee strategic success
  • Know proprietary competitor info
  • Make executive decisions

Skill Metadata

  • Mode: centaur
  • category: automation
  • subcategory: data-extraction
  • dependencies: [beautifulsoup4, requests, pandas]
  • difficulty: intermediate
  • time_saved: 5+ hours/week

GitHub Repository

guia-matthieu/clawfu-skills
Path: skills/automation/web-scraper
Topics: ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server