pdf-extractor

Name: pdf-extractor
Author: guia-matthieu

Updated 1 month ago

Category: Documentation
Tags: pdf, powerpoint, data

About

pdf-extractor extracts text, tables, and images from PDF files so developers can turn static documents into usable data. It suits report processing, converting tables to CSV, pulling images from presentations, and bulk text conversion. The skill uses pdfplumber to turn PDF content into structured, processable formats.

Quick Install

Claude Code (recommended)

Primary

npx skills add guia-matthieu/clawfu-skills -a claude-code

Plugin command (alternative)

/plugin add https://github.com/guia-matthieu/clawfu-skills

Git clone (alternative)

git clone https://github.com/guia-matthieu/clawfu-skills.git ~/.claude/skills/pdf-extractor

Copy and paste this command into Claude Code to install this skill.

Documentation

PDF Extractor

Extract text, tables, and images from PDF files using pdfplumber, turning static PDFs into usable data.

When to Use This Skill

Report processing - Extract data from PDF reports
Table extraction - Convert PDF tables to CSV
Image collection - Pull images from presentations
Text mining - Bulk convert PDFs to searchable text
Research - Process academic papers and whitepapers

What Claude Does vs What You Decide

Claude Does	You Decide
Structures analysis frameworks	Metric definitions
Identifies patterns in data	Business interpretation
Creates visualization templates	Dashboard design
Suggests optimization areas	Action priorities
Calculates statistical measures	Decision thresholds

Dependencies

pip install pdfplumber pypdf click pandas
# For image extraction:
pip install Pillow

Commands

Extract Text

python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5

Extract Tables

python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3

Extract Images

python scripts/main.py images presentation.pdf --output ./images/

Merge PDFs

python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf

PDF Info

python scripts/main.py info document.pdf

Examples

Example 1: Extract Financial Tables

python scripts/main.py tables annual-report.pdf --output financials.csv

# Output: financials.csv with all tables found
# Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv
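One way the combined file and the per-table files described above could be produced is sketched below. This is a minimal illustration, not the code in scripts/main.py: the function names write_tables and collect_tables are hypothetical, and the sketch assumes pdfplumber's page.extract_tables(), which returns each table as a list of rows with None for empty cells.

```python
import csv
from pathlib import Path


def write_tables(tables_by_page, combined_path, per_table_dir):
    """Write every table to one combined CSV and to its own per-table CSV.

    tables_by_page maps a 1-based page number to a list of tables; each
    table is a list of rows and each row a list of cell values (None for
    empty cells). Returns the per-table file names in the order written.
    """
    per_table_dir = Path(per_table_dir)
    per_table_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(combined_path, "w", newline="", encoding="utf-8") as combined:
        combined_writer = csv.writer(combined)
        for page_no in sorted(tables_by_page):
            for index, table in enumerate(tables_by_page[page_no], start=1):
                # Replace missing cells with empty strings before writing.
                rows = [["" if cell is None else cell for cell in row] for row in table]
                combined_writer.writerows(rows)
                # File naming mirrors the example output (table_page3_1.csv).
                target = per_table_dir / f"table_page{page_no}_{index}.csv"
                with open(target, "w", newline="", encoding="utf-8") as handle:
                    csv.writer(handle).writerows(rows)
                written.append(target.name)
    return written


def collect_tables(pdf_path):
    """Gather tables per page with pdfplumber (must be installed)."""
    import pdfplumber  # imported lazily so write_tables works without it

    found = {}
    with pdfplumber.open(pdf_path) as pdf:
        for page_no, page in enumerate(pdf.pages, start=1):
            tables = page.extract_tables()
            if tables:
                found[page_no] = tables
    return found
```

With these helpers, write_tables(collect_tables("annual-report.pdf"), "financials.csv", ".") would approximate the command in Example 1. Whether the real script separates tables inside the combined file is not stated, so the sketch simply appends their rows.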

Example 2: Batch Convert to Text

python scripts/main.py batch ./pdfs/ --output ./text/

# Converts all PDFs in folder to .txt files
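The batch conversion above can be sketched as a loop over a folder. This is a rough sketch under stated assumptions rather than the script's actual implementation: batch_convert and pdf_to_text are hypothetical names, only the top level of the folder is scanned, and unreadable files are skipped instead of stopping the run.

```python
from pathlib import Path


def pdf_to_text(pdf_path):
    """Return the text of every page (needs pdfplumber installed)."""
    import pdfplumber  # imported lazily so batch_convert can take another extractor

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may yield None or "" for pages without a text layer.
        return "\n\n".join(page.extract_text() or "" for page in pdf.pages)


def batch_convert(src_dir, out_dir, extract=pdf_to_text):
    """Write one .txt per PDF in src_dir; return (converted, failed) file names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        try:
            text = extract(pdf_path)
        except Exception:
            # One unreadable file should not abort the whole batch.
            failed.append(pdf_path.name)
            continue
        (out / f"{pdf_path.stem}.txt").write_text(text, encoding="utf-8")
        converted.append(pdf_path.name)
    return converted, failed
```

Passing the extractor as a parameter keeps the folder loop independent of any one PDF library, which also makes it easy to swap in OCR for scanned documents.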

Example 3: Extract Specific Pages

python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

# Extracts only pages 1, 5-10, and 15
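A page specification such as 1,5-10,15 has to be expanded into individual page numbers before extraction. The sketch below shows one plausible way to do that. The names parse_page_spec and extract_text are hypothetical, the clamping of ranges to the document length is an assumption about desirable behavior, and the real script may parse its --pages option differently.

```python
def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1,5-10,15" into sorted, unique 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Clamp to the document length so "5-999" stops at the last page.
        pages.update(range(start, min(end, page_count) + 1))
    return sorted(pages)


def extract_text(pdf_path: str, spec: str) -> str:
    """Concatenate the text of the selected pages (needs pdfplumber installed)."""
    import pdfplumber  # imported lazily so parse_page_spec works without it

    with pdfplumber.open(pdf_path) as pdf:
        wanted = parse_page_spec(spec, len(pdf.pages))
        # pdf.pages is 0-indexed, while the spec uses 1-based page numbers.
        return "\n\n".join(pdf.pages[n - 1].extract_text() or "" for n in wanted)
```

For a 20-page document, parse_page_spec("1,5-10,15", 20) yields pages 1, 5 through 10, and 15, matching the example above.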

Skill Boundaries

What This Skill Does Well

Structuring data analysis
Identifying patterns and trends
Creating visualization frameworks
Calculating statistical measures

What This Skill Cannot Do

Access your actual data
Replace statistical expertise
Make business decisions
Guarantee prediction accuracy

Related Skills

web-scraper - Scrape web content
content-repurposer - Repurpose extracted content

Skill Metadata

Mode: centaur

category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week

GitHub repository

guia-matthieu/clawfu-skills

Path: skills/automation/pdf-extractor

Topics: ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server

FAQ

What is the pdf-extractor skill?

pdf-extractor is a Claude Skill by guia-matthieu. Skills package instructions and resources that Claude loads on demand, so Claude can extract text, tables, and images from PDFs without extra prompting.

How do I install pdf-extractor?

Use one of the install commands on this page: run the npx command, add the repository to Claude Code as a plugin, or clone it into your skills directory. Then restart Claude so it picks up the skill.

What category does pdf-extractor belong to?

pdf-extractor is in the Documentation category, tagged pdf, powerpoint and data.

Is pdf-extractor free to use?

Yes. pdf-extractor is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.