data-sourcing

majiayu000

Updated: Yesterday

About

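This skill helps developers optimize data enrichment by intelligently selecting and routing requests across 150+ providers, maximizing data quality while minimizing credit costs. It is best suited to building or tuning provider waterfalls, auditing credit usage, and designing enrichment logic for GTM (go-to-market) and RevOps (revenue operations) teams. The framework routes requests based on input type and success probability, and implements waterfall sequences for maximum coverage.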

Quick Install

Claude Code

Plugin command (recommended)

/plugin add https://github.com/majiayu000/claude-skill-registry

Git clone (alternative)

git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/data-sourcing

Copy and paste this command into Claude Code to install the skill.

Documentation

Data Sourcing & Provider Optimization Skill

When to Use

Selecting provider stacks for email, phone, company, or intent enrichment
Building or tuning waterfall sequences to improve success rates
Auditing credit consumption or provider performance
Designing enrichment logic for GTM ops, RevOps, or data engineering teams

Framework

You are an expert at selecting and optimizing data providers from 150+ available options to maximize data quality while minimizing credit costs. Use this layered framework to keep enrichment predictable and efficient.

Core Principles

Quality-Cost Balance: Optimize for highest data quality within budget constraints
Smart Routing: Route requests to providers based on input type and success probability
Waterfall Logic: Try providers in sequence and stop at the first success to maximize coverage
Caching Strategy: Leverage cached data to reduce redundant API calls
Bulk Optimization: Process similar requests together for volume discounts

Provider Selection Matrix

For Email Discovery

Best Input Scenarios:

Have LinkedIn URL: ContactOut → RocketReach → Apollo
Have Name + Company: Apollo → Hunter → RocketReach → FindyMail
Have Domain Only: Hunter → Apollo → Clearbit
Have Email (need validation): ZeroBounce → NeverBounce → Debounce
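
The input scenarios above can be expressed as a small routing table. The following is a minimal sketch, not a definitive implementation: the record field names (`email`, `linkedin_url`, `name`, `company`, `domain`) and the precedence order among inputs are assumptions made for illustration, while the provider orderings are copied from the list above.

```python
from typing import Mapping

# Provider orderings copied from the input scenarios listed above.
EMAIL_ROUTES = {
    "email": ["ZeroBounce", "NeverBounce", "Debounce"],
    "linkedin_url": ["ContactOut", "RocketReach", "Apollo"],
    "name_company": ["Apollo", "Hunter", "RocketReach", "FindyMail"],
    "domain": ["Hunter", "Apollo", "Clearbit"],
}


def select_email_route(record: Mapping[str, str]) -> list[str]:
    """Return the provider order for the most specific input available.

    Assumed precedence: existing email (validate only), then LinkedIn URL,
    then name + company, then bare domain.
    """
    if record.get("email"):
        return EMAIL_ROUTES["email"]
    if record.get("linkedin_url"):
        return EMAIL_ROUTES["linkedin_url"]
    if record.get("name") and record.get("company"):
        return EMAIL_ROUTES["name_company"]
    if record.get("domain"):
        return EMAIL_ROUTES["domain"]
    return []  # nothing usable to route on


route = select_email_route({"name": "Ada Lovelace", "company": "Acme"})
```

Keeping the routes in a plain dictionary means a sequence can be retuned without touching the selection logic.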

Quality Tiers:

Premium (90%+ success): ZoomInfo, BetterContact waterfall
Standard (75%+ success): Apollo, Hunter, RocketReach
Budget (60%+ success): Snov.io, Prospeo, ContactOut

For Company Intelligence

Data Type Priority:

Basic Firmographics: Clearbit (fastest) → Ocean.io → Apollo
Financial Data: Crunchbase → PitchBook → Dealroom
Technology Stack: BuiltWith → HG Insights → Clearbit
Intent Signals: B2D AI → ZoomInfo Intent → 6sense
News & Social: Google News → Social platforms → Owler

Industry Specialization:

Startups: Crunchbase, Dealroom, AngelList
Enterprise: ZoomInfo, D&B, HG Insights
E-commerce: Store Leads, BuiltWith, Shopify data
Healthcare: Definitive Healthcare + compliance providers
Financial Services: PitchBook, S&P Capital IQ

Credit Optimization Strategies

Cost Tiers

Tier 0 (Free): Native operations, cached data, manual inputs
Tier 1 (0.5 credits): Validation, verification, basic lookups
Tier 2 (1-2 credits): Standard enrichments (Apollo, Hunter, Clearbit)
Tier 3 (2-3 credits): Premium data (ZoomInfo, technographics, intent)
Tier 4 (3-5 credits): Enterprise intelligence (PitchBook, custom AI)
Tier 5 (5-10 credits): Specialized services (video generation, deep AI research)

Optimization Tactics

1. Cache Everything

Email: 30-day cache
Company: 90-day cache
Intent: 7-day cache
Static data: Indefinite cache
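
As an illustration of the cache windows above, here is a minimal in-memory sketch. The `TTLCache` class, its method names, and the injectable clock are hypothetical; a production setup would more likely use a shared store such as Redis or a database table.

```python
import time

SECONDS_PER_DAY = 86400

# TTLs in days, taken from the list above; None means the entry never expires.
CACHE_TTL_DAYS = {"email": 30, "company": 90, "intent": 7, "static": None}


class TTLCache:
    """In-memory cache that expires entries according to their data type."""

    def __init__(self, clock=time.time):
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}     # (data_type, key) -> (stored_at, value)

    def put(self, data_type, key, value):
        self._store[(data_type, key)] = (self._clock(), value)

    def get(self, data_type, key):
        entry = self._store.get((data_type, key))
        if entry is None:
            return None
        stored_at, value = entry
        ttl_days = CACHE_TTL_DAYS[data_type]
        expired = (
            ttl_days is not None
            and self._clock() - stored_at > ttl_days * SECONDS_PER_DAY
        )
        if expired:
            del self._store[(data_type, key)]  # drop stale data so it gets re-fetched
            return None
        return value
```

A `None` return means the caller should fall through to a paid provider and then call `put` with the fresh result.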

2. Batch Processing

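# Process in batches for volume discounts.
# use_provider, use_parallel_processing, and use_standard_processing are
# placeholders for your own orchestration functions.
def process_records(record_count):
    if record_count > 1000:
        return use_provider("apollo_bulk")  # 10-30% discount
    elif record_count > 100:
        return use_parallel_processing()
    else:
        return use_standard_processing()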

3. Smart Waterfalls

waterfall_sequence = [
    {"provider": "cache", "credits": 0},
    {"provider": "apollo", "credits": 1.5, "stop_if_success": True},
    {"provider": "hunter", "credits": 1.2, "stop_if_success": True},
    {"provider": "bettercontact", "credits": 3, "stop_if_success": True},
    {"provider": "ai_research", "credits": 5, "last_resort": True}
]
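
One way to execute a sequence like the one above is sketched below. This is an illustration under stated assumptions: the `lookup` callable is a hypothetical stand-in for real provider clients, credits are assumed to be charged per attempt (many providers bill only on success), and any non-empty result is treated as terminal rather than interpreting the `stop_if_success` and `last_resort` flags individually.

```python
WATERFALL = [
    {"provider": "cache", "credits": 0},
    {"provider": "apollo", "credits": 1.5, "stop_if_success": True},
    {"provider": "hunter", "credits": 1.2, "stop_if_success": True},
    {"provider": "bettercontact", "credits": 3, "stop_if_success": True},
    {"provider": "ai_research", "credits": 5, "last_resort": True},
]


def run_waterfall(sequence, lookup, record):
    """Try each step in order and stop at the first non-empty result.

    Returns (result, credits_spent, provider); result and provider are None
    when every step misses.
    """
    credits_spent = 0.0
    for step in sequence:
        credits_spent += step["credits"]  # assumption: charged per attempt
        result = lookup(step["provider"], record)
        if result:
            return result, credits_spent, step["provider"]
    return None, credits_spent, None


def fake_lookup(provider, record):
    # Hypothetical stub: only "hunter" finds anything for this record.
    if provider == "hunter":
        return {"email": "ada@acme.com"}
    return None


result, spent, provider = run_waterfall(WATERFALL, fake_lookup, {"name": "Ada"})
```

Returning the credits spent alongside the result makes it straightforward to log cost per record for later audits.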

Provider-Specific Optimizations

Apollo.io

Strengths: US B2B, LinkedIn data, phone numbers
Weaknesses: International coverage, personal emails
Tips: Use bulk API for 10%+ discount, batch similar companies

ZoomInfo

Strengths: Enterprise data, org charts, intent signals
Weaknesses: Expensive, SMB coverage
Tips: Reserve for high-value accounts, negotiate enterprise deals

Hunter

Strengths: Domain searches, email patterns, API reliability
Weaknesses: Phone numbers, detailed contact info
Tips: Best for initial domain exploration, use pattern detection

Clearbit

Strengths: Real-time API, company data, speed
Weaknesses: Email discovery rates, phone numbers
Tips: Great for instant enrichment, combine with others for contacts

BuiltWith

Strengths: Technology detection, historical data, e-commerce
Weaknesses: Contact information, company financials
Tips: Filter accounts by technology before enrichment

Waterfall Strategies

Maximum Success Waterfall

Priority: Success rate over cost
Sequence:
  1. BetterContact (aggregates 10+ sources)
  2. ZoomInfo (if enterprise)
  3. Apollo + Hunter + RocketReach
  4. AI web research
Expected Success: 95%+
Average Cost: 8-12 credits

Balanced Waterfall

Priority: Good success with reasonable cost
Sequence:
  1. Apollo.io
  2. Hunter (if domain match)
  3. RocketReach (if name match)
  4. Stop or continue based on confidence
Expected Success: 80%
Average Cost: 3-5 credits

Budget Waterfall

Priority: Minimize cost
Sequence:
  1. Cache check
  2. Hunter (domain only)
  3. Free sources (Google, LinkedIn public)
  4. Stop at first result
Expected Success: 60%
Average Cost: 1-2 credits
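
To sanity-check figures such as the expected success and average cost above, the expected spend of a waterfall can be estimated from per-step hit rates. The sketch below assumes steps succeed independently and that credits are charged per attempt; the hit rates in the example are hypothetical, not measured values.

```python
def waterfall_expectation(steps):
    """Estimate expected credits and overall success for a waterfall.

    steps: list of (credits, hit_rate) tuples in execution order.
    A step only runs when every earlier step missed, so its cost is
    weighted by the probability of reaching it.
    """
    p_reach = 1.0            # probability that all earlier steps missed
    expected_credits = 0.0
    for credits, hit_rate in steps:
        expected_credits += p_reach * credits
        p_reach *= 1.0 - hit_rate
    return expected_credits, 1.0 - p_reach


# Hypothetical hit rates for a three-step sequence costing 1.5, 1.2, and 3 credits.
expected_credits, success = waterfall_expectation([(1.5, 0.6), (1.2, 0.4), (3, 0.5)])
```

With these illustrative numbers the estimate comes to about 2.7 credits per record at an 88% overall success rate, because the more expensive steps run only for the records that earlier steps missed.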

Quality Scoring Framework

def calculate_data_quality_score(data, sources):
    """Score a record from 0 to 100.

    get_data_age is an assumed helper that returns the record's age in days.
    """
    score = 0

    # Multi-source validation (30 points): 20 for two sources, 30 for three or more
    if len(sources) > 1:
        score += min(len(sources) * 10, 30)

    # Data completeness (30 points): 7.5 per populated required field
    required_fields = ["email", "phone", "title", "company"]
    score += sum(7.5 for field in required_fields if data.get(field))

    # Verification status (20 points)
    if data.get("email_verified"):
        score += 10
    if data.get("phone_verified"):
        score += 10

    # Recency (20 points)
    days_old = get_data_age(data)
    if days_old < 30:
        score += 20
    elif days_old < 90:
        score += 10

    return score

Industry-Specific Provider Selection

SaaS/Technology

Primary: Apollo, Clearbit, BuiltWith
Secondary: ZoomInfo, HG Insights
Intent: G2, TrustRadius, 6sense

Financial Services

Primary: PitchBook, ZoomInfo
Compliance: LexisNexis, D&B
News: Bloomberg, Reuters

Healthcare

Primary: Definitive Healthcare
Compliance: NPPES, state boards
Standard: ZoomInfo with healthcare filters

E-commerce

Primary: Store Leads, BuiltWith
Platform-specific: Shopify, Amazon seller data
Standard: Clearbit with e-commerce signals

Troubleshooting Common Issues

Low Email Discovery Rate

Check email patterns with Hunter
Try personal email providers
Use AI research for executives
Consider LinkedIn outreach instead

High Credit Usage

Audit waterfall sequences
Increase cache TTL
Negotiate volume deals
Use native operations first
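
An audit of waterfall sequences usually starts from a per-provider summary of calls, hits, and credits. The sketch below assumes a hypothetical log schema (`provider`, `credits`, `success`); adapt the field names to whatever your enrichment platform actually exports.

```python
from collections import defaultdict


def audit_credit_usage(log):
    """Summarize success rate and credits per successful result by provider.

    log: iterable of dicts with 'provider', 'credits', and 'success' keys
    (hypothetical schema).
    """
    stats = defaultdict(lambda: {"calls": 0, "hits": 0, "credits": 0.0})
    for entry in log:
        s = stats[entry["provider"]]
        s["calls"] += 1
        s["hits"] += 1 if entry["success"] else 0
        s["credits"] += entry["credits"]

    report = {}
    for provider, s in stats.items():
        report[provider] = {
            "calls": s["calls"],
            "success_rate": s["hits"] / s["calls"],
            # Infinite cost per hit flags providers that never returned data.
            "credits_per_hit": s["credits"] / s["hits"] if s["hits"] else float("inf"),
        }
    return report


sample_log = [
    {"provider": "apollo", "credits": 1.5, "success": True},
    {"provider": "apollo", "credits": 1.5, "success": False},
    {"provider": "hunter", "credits": 1.2, "success": True},
]
report = audit_credit_usage(sample_log)
```

Providers with a high `credits_per_hit` are candidates for moving later in a sequence or dropping from it.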

Poor Data Quality

Add verification steps
Cross-reference multiple sources
Set minimum confidence thresholds
Implement human review for critical data
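
Cross-referencing and a minimum confidence threshold can be combined in a simple majority vote, sketched below. The confidence definition (share of queried providers that agree on the normalized value) and the 0.6 default threshold are assumptions chosen for illustration, not values from any provider.

```python
from collections import Counter


def cross_reference(candidates, min_confidence=0.6):
    """Pick the value most providers agree on, if agreement is high enough.

    candidates: dict mapping provider name to a returned value or None.
    Returns (value, confidence); value is None when confidence is below
    the threshold, which signals that the record needs human review.
    """
    values = [v.strip().lower() for v in candidates.values() if v]
    if not values:
        return None, 0.0
    value, votes = Counter(values).most_common(1)[0]
    # Providers that returned nothing count against confidence.
    confidence = votes / len(candidates)
    return (value if confidence >= min_confidence else None), confidence


email, confidence = cross_reference({
    "apollo": "Ada@acme.com",
    "hunter": "ada@acme.com ",
    "rocketreach": "a.lovelace@acme.com",
})
```

Here two of three providers agree after normalization, so the shared address is accepted; a single unconfirmed answer out of three would fall below the threshold.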

Advanced Techniques

Hybrid Enrichment

# Combine AI and traditional providers
def hybrid_enrichment(company):
    # Fast, cheap base data
    base = clearbit_lookup(company)
    
    # AI for missing pieces
    if not base.get("description"):
        base["description"] = ai_generate_description(company)
    
    # Premium for high-value
    if is_enterprise_account(base):
        base.update(zoominfo_enrich(company))
    
    return base

Progressive Enrichment

# Enrich in stages based on engagement
def progressive_enrichment(lead):
    # Stage 1: Basic (on import)
    if lead.stage == "new":
        return basic_enrichment(lead)  # 1-2 credits

    # Stage 2: Engaged (opened email)
    if lead.stage == "engaged":
        return standard_enrichment(lead)  # 3-5 credits

    # Stage 3: Qualified (booked meeting)
    if lead.stage == "qualified":
        return comprehensive_enrichment(lead)  # 10+ credits

    # Unknown stage: spend no credits and return the lead unchanged
    return lead

Templates

Provider Cheat Sheet: See references/provider_cheat_sheet.md for provider selection.
Cost Calculator: See scripts/cost_calculator.py for estimating credit usage.
Integration Code Templates:

// JavaScript/Node.js template
// checkCache, saveToCache, callProvider, and aiResearch are placeholders for your own helpers.
const enrichContact = async (name, company) => {
  // Check cache first
  const cached = await checkCache(name, company);
  if (cached) return cached;

  // Try providers in sequence
  const providers = ['apollo', 'hunter', 'rocketreach'];

  for (const provider of providers) {
    try {
      const result = await callProvider(provider, {name, company});
      // Guard against providers that return null or undefined on a miss
      if (result?.email) {
        await saveToCache(result);
        return result;
      }
    } catch (error) {
      // A failed provider should not abort the waterfall
      console.warn(`${provider} failed (${error.message}), trying next...`);
    }
  }

  // Fallback to AI research
  return aiResearch(name, company);
};

Tips

Pre-build waterfalls per motion so GTM teams can call a single orchestration command rather than juggling providers.
Instrument cache hit rates; alert RevOps when cache effectiveness drops below target to avoid a spike in credit usage.
Rotate premium providers each quarter to negotiate better volume discounts and close coverage gaps.
Pair enrichment with QA hooks (e.g., verification APIs, sampling) before syncing into CRM to prevent bad data cascades.
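
The cache instrumentation tip can be sketched as a small counter with an alert condition. The class name, the 50% target, and the minimum sample size are hypothetical; in practice these counters would feed whatever metrics or alerting system the team already uses.

```python
class CacheHitMonitor:
    """Tracks cache hits and misses and flags a hit rate below target."""

    def __init__(self, target_hit_rate=0.5, min_samples=100):
        self.target_hit_rate = target_hit_rate
        self.min_samples = min_samples  # avoid alerting on tiny samples
        self.hits = 0
        self.misses = 0

    def record(self, hit):
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    def should_alert(self):
        total = self.hits + self.misses
        return total >= self.min_samples and self.hit_rate < self.target_hit_rate


monitor = CacheHitMonitor(target_hit_rate=0.5, min_samples=4)
for hit in (True, False, False, False):
    monitor.record(hit)
```

Requiring a minimum number of samples keeps a handful of early misses from triggering a false alarm.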

Progressive disclosure: Load full provider details and code examples only when actively optimizing enrichment workflows.

GitHub Repository

majiayu000/claude-skill-registry

Path: skills/data-sourcing