csv-excel-merger

OneWave-AI

更新日 2 days ago

61 閲覧

ドキュメント処理exceldata

について

csv-excel-mergerスキルは、複数のCSVまたはExcelファイルをインテリジェントにマージし、列を自動的に照合して異なるスキーマを処理します。データソースを結合する際の重複排除や競合解決といった主要な機能を提供します。開発者がスプレッドシートを統合したり、データエクスポートを結合したり、複数ファイルを単一のデータセットにまとめる必要がある場合に、このスキルをご利用ください。

クイックインストール

Claude Code

推奨

プラグインコマンド推奨

/plugin add https://github.com/OneWave-AI/claude-skills

Git クローン代替

git clone https://github.com/OneWave-AI/claude-skills.git ~/.claude/skills/csv-excel-merger

このコマンドをClaude Codeにコピー＆ペーストしてスキルをインストールします

ドキュメント

CSV/Excel Merger

Intelligently merge multiple CSV or Excel files with automatic column matching and data deduplication.

Instructions

When a user needs to merge CSV or Excel files:

Analyze Input Files:
- How many files need to be merged?
- What format (CSV, Excel, TSV)?
- Are the files provided or need to be read from disk?
- Do columns have the same names across files?
- What is the primary key (unique identifier)?
Inspect File Structures:
- Read headers from each file
- Identify column names and data types
- Detect encoding (UTF-8, Latin-1, etc.)
- Check for missing columns
- Look for duplicate column names
Create Merge Strategy:

Column Matching:
- Exact name match: "email" = "email"
- Case-insensitive: "Email" = "email"
- Fuzzy match: "E-mail" ≈ "email"
- Common patterns:
  - "first_name", "firstname", "First Name" → "first_name"
  - "phone", "phone_number", "tel" → "phone"
  - "email", "e-mail", "email_address" → "email"
Conflict Resolution (when same record appears in multiple files):
- Keep first: Use value from first file
- Keep last: Use value from last file (most recent)
- Keep longest: Use most complete value
- Manual review: Flag conflicts for user review
- Merge: Combine non-conflicting fields
Deduplication:
- Identify duplicate rows based on primary key
- Options: keep first, keep last, keep all, merge values
- Track source file for each row

Perform Merge:

# Example merge logic
import pandas as pd

# Read files
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')

# Normalize column names
df1.columns = df1.columns.str.lower().str.strip()
df2.columns = df2.columns.str.lower().str.strip()

# Map similar columns
column_mapping = {
    'firstname': 'first_name',
    'e_mail': 'email',
    # ...
}
df2 = df2.rename(columns=column_mapping)

# Merge
merged = pd.concat([df1, df2], ignore_index=True)

# Deduplicate
merged = merged.drop_duplicates(subset=['email'], keep='last')

# Save
merged.to_csv('merged_output.csv', index=False)

Format Output:

📊 CSV/EXCEL MERGER REPORT

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 INPUT FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

File 1: contacts_jan.csv
  Rows: 1,245
  Columns: 8 (name, email, phone, company, ...)

File 2: contacts_feb.csv
  Rows: 987
  Columns: 9 (firstname, lastname, email, mobile, ...)

File 3: leads_export.xlsx
  Rows: 2,103
  Columns: 12 (full_name, email_address, phone, ...)

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔄 COLUMN MAPPING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Unified Schema:
• first_name ← [firstname, first name, fname]
• last_name ← [lastname, last name, lname]
• email ← [email, e-mail, email_address]
• phone ← [phone, mobile, phone_number, tel]
• company ← [company, organization, org]
• title ← [title, job_title, position]
• source ← [file origin tracking]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 MERGE ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Total rows before merge: 4,335
Duplicate records found: 892
Conflicts detected: 47

Deduplication Strategy: Keep most recent (by source file date)
Primary Key: email

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ CONFLICTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Record: [email protected]
  File 1 phone: (555) 123-4567
  File 2 phone: (555) 987-6543
  Resolution: Kept most recent (File 2)

[List top 10 conflicts]

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ MERGE RESULTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Output File: merged_contacts.csv
Total Rows: 3,443
Columns: 7
Duplicates Removed: 892

Breakdown by Source:
• contacts_jan.csv: 1,245 rows (398 unique)
• contacts_feb.csv: 987 rows (521 unique)
• leads_export.xlsx: 2,103 rows (2,524 unique)

Data Quality:
• Email completeness: 98.2%
• Phone completeness: 87.5%
• Company completeness: 91.3%

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

• Review 47 conflict records manually
• Standardize phone number format
• Fill missing company names (8.7% incomplete)
• Export conflicts to: conflicts_review.csv

Handle Special Cases:

Multiple Primary Keys:
- Use compound keys: (email + company)
- Offer options when ambiguous
Different Data Types:
- Convert dates to standard format
- Normalize phone numbers
- Standardize country codes
- Clean whitespace and casing
Missing Columns:
- Fill with empty values
- Flag missing data
- Offer to create new columns
Large Files:
- Use chunking for files > 100MB
- Show progress indicator
- Estimate memory usage
Generate Code: Provide Python/pandas script that:
- Reads all files
- Performs intelligent column matching
- Deduplicates based on strategy
- Resolves conflicts
- Saves merged output
- Generates detailed report
Export Options:
- CSV (UTF-8)
- Excel (.xlsx)
- JSON
- SQL INSERT statements
- Parquet (for large datasets)

Example Triggers

"Merge these three CSV files"
"Combine multiple Excel sheets into one file"
"Deduplicate and merge customer data"
"Join spreadsheets with different column names"
"Consolidate contact lists from different sources"

Best Practices

Column Matching:

Use fuzzy matching for similar names
Maintain original column name mapping report
Allow manual override of auto-matching

Data Quality:

Trim whitespace
Standardize formats (phone, email, dates)
Detect and flag invalid data
Preserve data types

Performance:

Use chunking for large files
Process in batches
Show progress for long operations
Optimize memory usage

Transparency:

Log all merge decisions
Track source file for each row
Report conflicts and resolutions
Generate detailed merge report

Output Quality

Ensure merges:

Intelligently match columns
Handle different schemas
Deduplicate properly
Preserve data integrity
Flag conflicts for review
Generate comprehensive report
Maintain data quality
Track data lineage (source)
Handle edge cases gracefully
Provide validation statistics

Generate clean, deduplicated merged files with full transparency and data quality checks.

GitHub リポジトリ

OneWave-AI/claude-skills

パス: csv-excel-merger

csv-excel-merger

について

クイックインストール

Claude Code

ドキュメント

CSV/Excel Merger

Instructions

Example Triggers

Best Practices

Output Quality

GitHub リポジトリ

関連スキル

content-collections

polymarket

cloudflare-turnstile

hybrid-cloud-networking