スキル一覧に戻る

merge-duplicate-companies

TomGranot
更新日 2 days ago
6 閲覧
33
11
33
GitHubで表示
デザインaiapidesign

について

このスキルは、ドメインと名前に基づいてHubSpot内の重複する企業レコードを特定し、統合プロセスをガイドする監査用CSVをエクスポートします。発見にはHubSpot APIを利用しますが、HubSpotには一括統合APIがないため、実際の統合作業にはサードパーティツールまたは手動でのUI操作が必要です。断片化されたコンタクトや取引データを統合し、データベースの健全性を保つためにご利用ください。

クイックインストール

Claude Code

推奨
メイン
npx skills add TomGranot/hubspot-admin-skills -a claude-code
プラグインコマンド代替
/plugin add https://github.com/TomGranot/hubspot-admin-skills
Git クローン代替
git clone https://github.com/TomGranot/hubspot-admin-skills.git ~/.claude/skills/merge-duplicate-companies

このコマンドをClaude Codeにコピー&ペーストしてスキルをインストールします

ドキュメント

Merge Duplicate Companies

Purpose

Duplicate company records fragment contacts, deals, and engagement history across multiple records for the same real-world company. This leads to inaccurate reporting, broken associations, sales confusion, and workflow failures. This skill identifies duplicates by domain and by name, exports prioritized audit CSVs, and guides the user through merging.

Prerequisites

  • A HubSpot private app access token with crm.objects.companies.read scope
  • Python 3.10+ with uv for package management
  • A .env file containing HUBSPOT_ACCESS_TOKEN
  • Super Admin permissions for merging in the HubSpot UI

Key Constraint

HubSpot has no bulk merge API. Merging must happen one pair at a time through the HubSpot UI or via third-party tools. The API is used for discovery, analysis, and audit trail generation.

HubSpot's built-in Duplicates tool is NOT available on all plan tiers. Check whether the account has access to Settings > Data Management > Duplicates before relying on it.

Execution Pattern

This skill follows a 4-stage execution pattern: Plan -> Before State -> Execute -> After State.

Stage 1: Plan

Before writing any code, confirm with the user:

  1. Confirm intentional duplicates: Ask whether separate records for regional offices of the same company are intentional. If so, exclude those from merging.
  2. Merging is irreversible. Once two company records are merged, they cannot be un-merged. The surviving record inherits all associations, but property values from the deleted record may be lost if both have the same property filled in.
  3. Prioritization strategy: Recommend merging Customer-stage companies first, then Opportunity-stage, then everything else.
  4. Time estimate: This is the most time-consuming process. Budget 2-4 hours for critical duplicates, 8-12 hours total for full cleanup.

Stage 2: Before State

Fetch all companies, identify duplicate groups by domain and name, and export audit CSVs.

"""
Before State: Identify duplicate companies by domain and by name.
Creates CSV audit logs for review before merging.
"""
import os
import csv
import time
import requests
from collections import defaultdict
from dotenv import load_dotenv

load_dotenv()

TOKEN = os.environ["HUBSPOT_ACCESS_TOKEN"]
BASE = "https://api.hubapi.com"
headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

# --- Step 1: Fetch all companies ---
print("Fetching all companies...")

all_companies = []
after = None

while True:
    params = {
        "limit": 100,
        "properties": "name,domain,lifecyclestage,num_associated_contacts,"
                       "num_associated_deals,hubspot_owner_id,createdate",
    }
    if after:
        params["after"] = after

    resp = requests.get(
        f"{BASE}/crm/v3/objects/companies",
        headers=headers, params=params,
    )
    if resp.status_code != 200:
        print(f"Stopped at {len(all_companies)} (status {resp.status_code})")
        break

    data = resp.json()
    for company in data.get("results", []):
        props = company.get("properties", {})
        all_companies.append({
            "id": company["id"],
            "name": (props.get("name") or "").strip(),
            "domain": (props.get("domain") or "").strip().lower(),
            "lifecycle_stage": props.get("lifecyclestage", ""),
            "associated_contacts": props.get("num_associated_contacts", "0"),
            "associated_deals": props.get("num_associated_deals", "0"),
            "owner_id": props.get("hubspot_owner_id", ""),
            "createdate": props.get("createdate", ""),
        })

    paging = data.get("paging", {})
    after = paging.get("next", {}).get("after")
    if not after:
        break
    time.sleep(0.05)

print(f"Total companies fetched: {len(all_companies)}")

# --- Step 2: Find duplicates by domain ---
print("\nAnalyzing duplicates by domain...")

domain_groups = defaultdict(list)
for c in all_companies:
    if c["domain"]:
        domain_groups[c["domain"]].append(c)

dup_domain_groups = {d: cs for d, cs in domain_groups.items() if len(cs) > 1}
dup_domain_records = sum(len(cs) for cs in dup_domain_groups.values())

print(f"Unique domains with duplicates: {len(dup_domain_groups)}")
print(f"Total records in duplicate domain groups: {dup_domain_records}")

# Top offenders
sorted_domains = sorted(dup_domain_groups.items(), key=lambda x: len(x[1]), reverse=True)
print("\nTop duplicate domains:")
for domain, companies in sorted_domains[:15]:
    print(f"  {domain}: {len(companies)} records")

# --- Step 3: Find duplicates by name ---
print("\nAnalyzing duplicates by name...")

name_groups = defaultdict(list)
for c in all_companies:
    if c["name"]:
        name_groups[c["name"].lower()].append(c)

dup_name_groups = {n: cs for n, cs in name_groups.items() if len(cs) > 1}
dup_name_records = sum(len(cs) for cs in dup_name_groups.values())

print(f"Unique names with duplicates: {len(dup_name_groups)}")
print(f"Total records in duplicate name groups: {dup_name_records}")

sorted_names = sorted(dup_name_groups.items(), key=lambda x: len(x[1]), reverse=True)
print("\nTop duplicate names:")
for name_lower, companies in sorted_names[:15]:
    print(f"  {companies[0]['name']}: {len(companies)} records")

# --- Step 4: Save CSV audit logs ---
os.makedirs("data/audit-logs", exist_ok=True)

# Domain duplicates CSV
domain_csv = "data/audit-logs/duplicate-companies-by-domain.csv"
with open(domain_csv, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[
        "domain", "duplicate_count", "id", "name", "lifecycle_stage",
        "associated_contacts", "associated_deals", "owner_id", "createdate",
    ])
    writer.writeheader()
    for domain, companies in sorted_domains:
        for c in companies:
            writer.writerow({
                "domain": domain,
                "duplicate_count": len(companies),
                **{k: c[k] for k in [
                    "id", "name", "lifecycle_stage", "associated_contacts",
                    "associated_deals", "owner_id", "createdate",
                ]},
            })

print(f"\nDomain duplicates CSV: {domain_csv}")

# Name duplicates CSV
name_csv = "data/audit-logs/duplicate-companies-by-name.csv"
with open(name_csv, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=[
        "duplicate_name", "duplicate_count", "id", "name", "domain",
        "lifecycle_stage", "associated_contacts", "associated_deals",
        "owner_id", "createdate",
    ])
    writer.writeheader()
    for name_lower, companies in sorted_names:
        for c in companies:
            writer.writerow({
                "duplicate_name": name_lower,
                "duplicate_count": len(companies),
                **{k: c[k] for k in [
                    "id", "name", "domain", "lifecycle_stage",
                    "associated_contacts", "associated_deals",
                    "owner_id", "createdate",
                ]},
            })

print(f"Name duplicates CSV: {name_csv}")

Present findings to the user. Key data points:

  • Total duplicate domain groups and affected records
  • Total duplicate name groups and affected records
  • Top offenders by domain and name
  • CSVs for manual review

Stage 3: Execute

This stage is primarily manual. Guide the user through the merging process.

Option A: HubSpot Built-In Duplicates Tool (if available)

  1. Navigate to Settings > Data Management > Duplicates > Companies
  2. HubSpot shows suggested duplicate pairs ranked by confidence
  3. For each pair, click Review to see side-by-side comparison
  4. Select the "primary" (surviving) record based on:
    • More associated contacts
    • More associated deals
    • More recent activity
    • Has a company owner
    • More complete property data
  5. Click Merge
  6. Process ~50 pairs at a time; HubSpot loads the next batch automatically

Prioritization order:

  1. Customer-stage company duplicates (highest value data)
  2. Opportunity-stage company duplicates
  3. Everything else (Leads, Subscribers)

Option B: Manual search-and-merge for top offenders

For companies with many duplicates (4+ records):

  1. Search for the company by name in Contacts > Companies
  2. Identify the "winner" record (most associations, deals, activity)
  3. Open the winner record > Actions > Merge
  4. Search for the duplicate > select it > choose property values > Merge
  5. Repeat until only one record remains

Option C: Third-party deduplication tools

For large-scale merging, recommend:

  • Dedupely (dedupely.com) -- HubSpot-native integration, bulk merge
  • Insycle (insycle.com) -- Data management platform with dedup
  • Koalify (koalify.com) -- HubSpot duplicate management

These tools can automate bulk merges that would take hours manually.

Prevention: Configure auto-association after merging

Settings > Data Management > Companies (or Settings > Objects > Companies)
Enable: "Create and associate companies with contacts"
Set unique identifier: Company domain name

This prevents future duplicates by using domain-based matching instead of name-based.

Stage 4: After State

Re-run the Before State analysis and compare duplicate counts.

"""
After State: Verify duplicate reduction.
"""
# Re-fetch all companies and re-run duplicate analysis
# Compare:
#   - Number of duplicate domain groups (should decrease)
#   - Number of duplicate name groups (should decrease)
#   - Top offenders (should be resolved)

# Also verify merged records:
# For each known duplicate that was merged, search for the company
# and confirm only one record exists with all expected associations.

Manual verification:

  1. Search for top offenders by name (should show only 1 record each)
  2. Open merged records and verify contacts and deals from both originals appear
  3. Check Settings > Data Management > Duplicates -- count should be significantly lower

Safety Mechanisms

MechanismDetail
CSV audit trailComplete export of all companies with duplicate group annotations before any merging.
Prioritized approachCustomer and Opportunity companies merged first to protect highest-value data.
Review before mergeCSVs enable team review before any irreversible merges happen.
Confirmation promptPresent duplicate analysis to the user and wait for explicit confirmation before instructing merges.
No auto-mergeThis skill never merges automatically. All merges require manual human decision.

Technical Gotchas

  1. HubSpot has no bulk merge API. There is no programmatic way to merge companies. All merges happen through the UI or third-party tools.

  2. Merging is irreversible. Once merged, records cannot be split apart. When in doubt, skip a pair and revisit later.

  3. Property conflicts: When both records have a value for the same property, HubSpot keeps the value from the "primary" record. Review important properties (phone, address, industry) before confirming.

  4. Companies endpoint uses GET, not POST/search. To list all companies, use GET /crm/v3/objects/companies with pagination, not the Search API. The Search API works too but is slower for full exports.

  5. Domain normalization: Always lowercase and strip whitespace from domains before grouping. Example.com and example.com are the same company.

  6. Name-based duplicates have higher false-positive rates. "State University" might match multiple genuinely different institutions. Domain-based duplicates are more reliable.

  7. Contact reassociation: After merging, verify that contacts from both original records appear under the surviving record. HubSpot should handle this automatically, but spot-check.

  8. The Duplicates tool is plan-tier dependent. Not all HubSpot plans include it. Check availability before instructing the user to navigate there.

Package Setup

uv init hubspot-cleanup
cd hubspot-cleanup
uv add requests python-dotenv

Create a .env file:

HUBSPOT_ACCESS_TOKEN=pat-na1-xxxxxxxx

GitHub リポジトリ

TomGranot/hubspot-admin-skills
パス: skills/merge-duplicate-companies
0

関連スキル

executing-plans

デザイン

executing-plansスキルは、完全な実装計画があり、それを管理されたバッチでレビューチェックポイントを設けながら実行する場合に使用します。このスキルは計画を読み込んで批判的にレビューした後、小さなバッチ(デフォルトは3タスク)でタスクを実行し、各バッチの間に進捗状況を報告してアーキテクトのレビューを受けます。これにより、品質管理チェックポイントが組み込まれた体系的な実装が保証されます。

スキルを見る

requesting-code-review

デザイン

このスキルは、コードレビュアーサブエージェントを起動し、処理を進める前に要件に対してコード変更を分析します。タスク完了後、主要な機能の実装後、またはmainブランチへのマージ前などに使用すべきです。このレビューは、現在の実装と元の計画を比較することで、問題を早期に発見するのに役立ちます。

スキルを見る

connect-mcp-server

デザイン

このスキルは、開発者がHTTP、stdio、またはSSEトランスポートを使用してMCPサーバーをClaude Codeに接続するための包括的なガイドを提供します。GitHub、Notion、カスタムAPIなどの外部サービスを統合するためのインストール、設定、認証、セキュリティについて解説しています。MCP統合のセットアップ、外部ツールの設定、またはClaudeのModel Context Protocolを扱う際にご利用ください。

スキルを見る

web-cli-teleport

デザイン

このスキルは、タスク分析に基づいて開発者がClaude Code WebとCLIインターフェースの選択を支援し、これらの環境間でのシームレスなセッションテレポーテーションを可能にします。Web、CLI、モバイル環境を切り替える際のセッション状態とコンテキストを管理することで、ワークフローを最適化します。様々な段階で異なるツールを必要とする複雑なプロジェクトにご活用ください。

スキルを見る