SKILL·768F2E

pdf-extractor

guia-matthieu
Updated 1 month ago
Category: Documentation | Tags: pdf, powerpoint, data

About

pdf-extractor extracts text, tables, and images from PDF files so developers can convert static documents into usable data. It is suited to processing reports, converting tables to CSV, pulling images from presentations, and batch text conversion. The skill uses pdfplumber to turn PDF content into structured formats such as plain text and CSV.

Quick Install

Claude Code

Primary (recommended):
npx skills add guia-matthieu/clawfu-skills -a claude-code

Alternative (plugin command):
/plugin add https://github.com/guia-matthieu/clawfu-skills

Alternative (git clone):
git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/pdf-extractor

Copy and paste this command in Claude Code to install this skill

Documentation

PDF Extractor

Extract text, tables, and images from PDF files using pdfplumber, turning static PDFs into usable data.

When to Use This Skill

  • Report processing - Extract data from PDF reports
  • Table extraction - Convert PDF tables to CSV
  • Image collection - Pull images from presentations
  • Text mining - Bulk convert PDFs to searchable text
  • Research - Process academic papers and whitepapers

What Claude Does vs What You Decide

Claude Does                      | You Decide
Structures analysis frameworks   | Metric definitions
Identifies patterns in data      | Business interpretation
Creates visualization templates  | Dashboard design
Suggests optimization areas      | Action priorities
Calculates statistical measures  | Decision thresholds

Dependencies

pip install pdfplumber pypdf click pandas
# For image extraction:
pip install Pillow

Commands

Extract Text

python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5
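
The `text` command presumably wraps pdfplumber's `page.extract_text()`. The following is a minimal sketch under that assumption; the function names `join_page_texts` and `extract_text` are hypothetical and are not taken from `scripts/main.py`.

```python
from typing import Iterable, Optional


def join_page_texts(texts: Iterable[Optional[str]]) -> str:
    """Join per-page text, skipping pages that yielded no text."""
    # Pages without a text layer (for example scans) can come back
    # empty or as None, so they are filtered out here.
    return "\n\n".join(t.strip() for t in texts if t and t.strip())


def extract_text(pdf_path: str) -> str:
    """Hypothetical helper: return the text of every page in pdf_path."""
    # Imported lazily so join_page_texts can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        return join_page_texts(page.extract_text() for page in pdf.pages)
```

Scanned PDFs have no text layer, so a helper like this would return an empty string for them; OCR is outside what pdfplumber does.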

Extract Tables

python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3
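
As a rough illustration of the table-to-CSV step, the sketch below assumes the command relies on pdfplumber's `page.extract_tables()`, which returns each table as a list of rows whose empty cells are `None`. The helper names `table_to_csv` and `extract_tables` are hypothetical.

```python
import csv
import io
from typing import List, Optional, Sequence

Row = Sequence[Optional[str]]


def table_to_csv(rows: Sequence[Row]) -> str:
    """Serialize one extracted table to CSV text; None cells become ''."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        writer.writerow("" if cell is None else cell for cell in row)
    return buffer.getvalue()


def extract_tables(pdf_path: str, page_number: int) -> List[str]:
    """Hypothetical helper: CSV text for each table on a 1-based page."""
    # Imported lazily so table_to_csv can be used without pdfplumber.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number - 1]  # pdf.pages is 0-indexed
        return [table_to_csv(table) for table in page.extract_tables()]
```

Using the `csv` module rather than joining cells with commas keeps values such as `1,200` intact by quoting them.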

Extract Images

python scripts/main.py images presentation.pdf --output ./images/

Merge PDFs

python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf
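
Merging is likely handled by pypdf, since it is listed in the dependencies. A minimal sketch under that assumption follows; `merge_pdfs` is a hypothetical name, and the real script may behave differently.

```python
from typing import Sequence


def merge_pdfs(input_paths: Sequence[str], output_path: str) -> int:
    """Hypothetical helper: concatenate PDFs in order; returns the page count."""
    if not input_paths:
        raise ValueError("at least one input PDF is required")

    # Imported lazily; requires `pip install pypdf`.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    page_count = 0
    for path in input_paths:
        for page in PdfReader(path).pages:
            writer.add_page(page)
            page_count += 1
    with open(output_path, "wb") as handle:
        writer.write(handle)
    return page_count
```

Pages are appended in the order the input paths are given, which matches the `doc1.pdf doc2.pdf` ordering of the command above.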

PDF Info

python scripts/main.py info document.pdf

Examples

Example 1: Extract Financial Tables

python scripts/main.py tables annual-report.pdf --output financials.csv

# Output: financials.csv with all tables found
# Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv

Example 2: Batch Convert to Text

python scripts/main.py batch ./pdfs/ --output ./text/

# Converts all PDFs in folder to .txt files
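
The batch step can be pictured as a loop over the folder that writes one `.txt` file per PDF. The sketch below is an assumption about how that might look, not the script's actual code; `batch_convert` is a hypothetical name, and the text extractor is passed in as a function so the loop stays independent of any PDF library.

```python
from pathlib import Path
from typing import Callable, List


def batch_convert(src_dir: str, out_dir: str,
                  extract: Callable[[str], str]) -> List[Path]:
    """Write one .txt file per PDF found directly in src_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # sorted() gives a deterministic processing order.
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        target = out / (pdf_path.stem + ".txt")
        target.write_text(extract(str(pdf_path)), encoding="utf-8")
        written.append(target)
    return written
```

In practice `extract` would be a function that opens the file with pdfplumber and returns its text. Note that `glob("*.pdf")` does not descend into subfolders.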

Example 3: Extract Specific Pages

python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

# Extracts only pages 1, 5-10, and 15
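
A page specification such as `1,5-10,15` has to be expanded into individual page numbers before extraction. One way to do that is sketched below; `parse_pages` is a hypothetical name, and the script's own parsing rules (for example how it treats duplicates or reversed ranges) are not documented here, so the choices in this sketch are assumptions.

```python
from typing import List


def parse_pages(spec: str) -> List[int]:
    """Expand a spec like '1,5-10,15' into sorted, unique 1-based pages."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # ranges are inclusive
        else:
            pages.add(int(part))
    if any(p < 1 for p in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)
```

With this sketch, `parse_pages("1,5-10,15")` returns `[1, 5, 6, 7, 8, 9, 10, 15]`. Because pdfplumber's `pdf.pages` list is 0-indexed, each number would need to be reduced by one before indexing into it.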

Skill Boundaries

What This Skill Does Well

  • Structuring data analysis
  • Identifying patterns and trends
  • Creating visualization frameworks
  • Calculating statistical measures

What This Skill Cannot Do

  • Access your actual data
  • Replace statistical expertise
  • Make business decisions
  • Guarantee prediction accuracy

Skill Metadata

  • Mode: centaur
  • category: automation
  • subcategory: document-processing
  • dependencies: [pdfplumber, pypdf, pandas]
  • difficulty: beginner
  • time_saved: 4+ hours/week

GitHub Repository

guia-matthieu/clawfu-skills
Path: skills/automation/pdf-extractor
Topics: ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server

FAQ

Frequently asked questions

What is the pdf-extractor skill?

pdf-extractor is a Claude Skill by guia-matthieu. Skills package instructions and resources that Claude loads on demand, so Claude can extract text, tables, and images from PDFs without extra prompting.

How do I install pdf-extractor?

Use one of the install commands on this page: run the `npx skills add` command, add pdf-extractor to Claude Code as a plugin, or clone its repository into your skills directory. Then restart Claude so it picks up the skill.

What category does pdf-extractor belong to?

pdf-extractor is in the Documentation category, tagged pdf, powerpoint and data.

Is pdf-extractor free to use?

Yes. pdf-extractor is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.

Related Skills

railway-docs
Documentation

This skill fetches current Railway documentation to answer questions about features, functionality, or specific docs URLs. It ensures developers receive accurate, up-to-date information directly from Railway's official sources. Use it when users ask how Railway works or reference Railway documentation.

n8n-code-python
Documentation

This Claude Skill provides expert guidance for writing Python code in n8n's Code nodes, specifically for using Python's standard library and working with n8n's special syntax like `_input`, `_json`, and `_node`. It helps developers understand Python's limitations within n8n and recommends using JavaScript for most workflows while offering Python solutions for specific data transformation needs.

archon
Documentation

The Archon skill provides RAG-powered semantic search and project management through a REST API. Use it for querying documentation, managing hierarchical projects/tasks, and performing knowledge retrieval with document upload capabilities. Always prioritize Archon first when searching external documentation before using other sources.

n8n-code-javascript
Documentation

This Claude Skill provides expert guidance for writing JavaScript code in n8n's Code nodes. It covers essential n8n-specific syntax like `$input`/`$json` variables, HTTP helpers, and DateTime handling, while troubleshooting common errors. Use it when developing n8n workflows that require custom JavaScript processing in Code nodes.
