pdf-extractor

Name: pdf-extractor
Author: guia-matthieu

Updated 1 month ago

Category: Documentation
Tags: pdf, powerpoint, data

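About

pdf-extractor extracts text, tables, and images from PDF files so developers can turn static documents into usable data. It is well suited to report processing, converting tables to CSV, pulling images from presentations, and bulk text conversion. The skill uses pdfplumber to convert PDF content into structured, actionable formats.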

Documentation

PDF Extractor

Extract text, tables, and images from PDF files using pdfplumber - turn static PDFs into usable data.

When to Use This Skill

Report processing - Extract data from PDF reports
Table extraction - Convert PDF tables to CSV
Image collection - Pull images from presentations
Text mining - Bulk convert PDFs to searchable text
Research - Process academic papers and whitepapers

What Claude Does vs What You Decide

Claude Does	You Decide
Structures analysis frameworks	Metric definitions
Identifies patterns in data	Business interpretation
Creates visualization templates	Dashboard design
Suggests optimization areas	Action priorities
Calculates statistical measures	Decision thresholds

Dependencies

pip install pdfplumber pypdf click pandas
# For image extraction:
pip install Pillow

Commands

Extract Text

python scripts/main.py text document.pdf
python scripts/main.py text document.pdf --pages 1-5

Extract Tables

python scripts/main.py tables report.pdf --output tables.csv
python scripts/main.py tables financial.pdf --page 3

Extract Images

python scripts/main.py images presentation.pdf --output ./images/

Merge PDFs

python scripts/main.py merge doc1.pdf doc2.pdf --output combined.pdf
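
The source of scripts/main.py is not shown on this page, so the following is only a minimal sketch of how a merge like the one above could be done with pypdf, which is listed under Dependencies. The function name merge_pdfs and the two-file minimum are assumptions made for illustration, not the skill's actual behavior.

```python
def merge_pdfs(input_paths, output_path):
    """Concatenate the input PDFs, in the order given, into output_path."""
    inputs = [str(path) for path in input_paths]
    # Assumption: a merge with fewer than two inputs is treated as a usage error.
    if len(inputs) < 2:
        raise ValueError("merging needs at least two input PDFs")

    # Imported lazily so the validation above does not require pypdf.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path in inputs:
        writer.append(path)  # appends every page of the file
    with open(output_path, "wb") as handle:
        writer.write(handle)
    writer.close()
    return output_path
```

Calling merge_pdfs(["doc1.pdf", "doc2.pdf"], "combined.pdf") would mirror the command above, assuming both input files exist.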

PDF Info

python scripts/main.py info document.pdf

Examples

Example 1: Extract Financial Tables

python scripts/main.py tables annual-report.pdf --output financials.csv

# Output: financials.csv with all tables found
# Also creates individual CSVs: table_page3_1.csv, table_page5_1.csv
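
As a rough illustration of the per-table files mentioned above, the sketch below writes each extracted table to a file named table_page<N>_<i>.csv. It assumes tables arrive in the shape pdfplumber's page.extract_tables() returns (a list of tables, each a list of rows). The function names write_tables and extract_tables_by_page are hypothetical, and the combined financials.csv output is not reproduced here.

```python
import csv
from pathlib import Path


def write_tables(tables_by_page, out_dir):
    """Write each table to out_dir/table_page<N>_<i>.csv and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for page_number in sorted(tables_by_page):
        for index, rows in enumerate(tables_by_page[page_number], start=1):
            path = out / f"table_page{page_number}_{index}.csv"
            with path.open("w", newline="", encoding="utf-8") as handle:
                # csv.writer renders None cells (empty PDF cells) as empty fields.
                csv.writer(handle).writerows(rows)
            written.append(path)
    return written


def extract_tables_by_page(pdf_path):
    """Map 1-based page numbers to the tables pdfplumber finds on that page."""
    import pdfplumber  # imported lazily so write_tables works without it

    found = {}
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables = page.extract_tables()
            if tables:
                found[page.page_number] = tables
    return found
```

With these helpers, write_tables(extract_tables_by_page("annual-report.pdf"), ".") would produce one CSV per detected table, assuming the PDF exists and pdfplumber is installed.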

Example 2: Batch Convert to Text

python scripts/main.py batch ./pdfs/ --output ./text/

# Converts all PDFs in folder to .txt files
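
A batch conversion like this can be pictured as two steps: pair each PDF with an output .txt path, then extract and write the text. The sketch below is one possible shape under that assumption. plan_batch and convert_batch are hypothetical names, and matching only files ending in .pdf in the top level of the folder is an assumption about how the real command behaves.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir):
    """Pair every .pdf in input_dir with a .txt path of the same stem in output_dir."""
    src, dst = Path(input_dir), Path(output_dir)
    return [(pdf, dst / (pdf.stem + ".txt")) for pdf in sorted(src.glob("*.pdf"))]


def convert_batch(input_dir, output_dir):
    """Extract the text of each planned PDF and write it to its .txt path."""
    import pdfplumber  # imported lazily so plan_batch works without it

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf_path, txt_path in plan_batch(input_dir, output_dir):
        with pdfplumber.open(pdf_path) as pdf:
            # extract_text() may yield nothing for image-only pages.
            text = "\n\n".join(page.extract_text() or "" for page in pdf.pages)
        txt_path.write_text(text, encoding="utf-8")
        converted.append(txt_path)
    return converted
```

Keeping the planning step separate makes it easy to preview which files would be converted before any PDF is opened.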

Example 3: Extract Specific Pages

python scripts/main.py text whitepaper.pdf --pages 1,5-10,15

# Extracts only pages 1, 5-10, and 15
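
The --pages value above combines single pages and ranges. The sketch below shows one way such a spec could be expanded and applied with pdfplumber. The names parse_page_spec and extract_text are hypothetical, and the handling of whitespace, duplicates, and reversed ranges is an assumption rather than documented behavior of the skill.

```python
from typing import List, Optional


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec such as "1,5-10,15" into sorted, unique 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


def extract_text(pdf_path: str, spec: Optional[str] = None) -> str:
    """Return the text of the selected pages, or of every page when spec is None."""
    import pdfplumber  # imported lazily so parse_page_spec works without it

    wanted = set(parse_page_spec(spec)) if spec else None
    chunks = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # page.page_number is 1-based, matching the spec.
            if wanted is None or page.page_number in wanted:
                chunks.append(page.extract_text() or "")
    return "\n\n".join(chunks)
```

For the example above, parse_page_spec("1,5-10,15") expands to pages 1, 5 through 10, and 15.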

Skill Boundaries

What This Skill Does Well

Structuring data analysis
Identifying patterns and trends
Creating visualization frameworks
Calculating statistical measures

What This Skill Cannot Do

Access your actual data
Replace statistical expertise
Make business decisions
Guarantee prediction accuracy

Related Skills

web-scraper - Scrape web content
content-repurposer - Repurpose extracted content

Skill Metadata

Mode: centaur

category: automation
subcategory: document-processing
dependencies: [pdfplumber, pypdf, pandas]
difficulty: beginner
time_saved: 4+ hours/week

GitHub repository

guia-matthieu/clawfu-skills

Path: skills/automation/pdf-extractor

Tags: ai-skills, anthropic, claude-code, claude-skills, marketing, mcp-server

Frequently asked questions

What is the pdf-extractor skill?

pdf-extractor is a Claude Skill by guia-matthieu. Skills package instructions and resources that Claude loads on demand, so Claude can perform pdf-extractor-related tasks without extra prompting.

How do I install pdf-extractor?

Use the install commands on this page: add pdf-extractor to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does pdf-extractor belong to?

pdf-extractor is in the Documentation category, tagged pdf, powerpoint and data.

Is pdf-extractor free to use?

Yes. pdf-extractor is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
