
cloudflare-vectorize

jezweb

About

This skill provides comprehensive knowledge for Cloudflare Vectorize, a globally distributed vector database. Use it when building semantic search, RAG systems, or AI applications that involve creating indexes, inserting/querying vectors, and configuring metadata filtering. It covers integration with Workers AI and OpenAI embeddings while helping resolve common issues like dimension mismatches and filter syntax errors.

Quick Install

Claude Code

Plugin Command (Recommended)
/plugin add https://github.com/jezweb/claude-skills
Git Clone (Alternative)
git clone https://github.com/jezweb/claude-skills.git ~/.claude/skills/cloudflare-vectorize

Copy and paste one of the commands above into Claude Code to install this skill

Documentation

Cloudflare Vectorize

Complete implementation guide for Cloudflare Vectorize - a globally distributed vector database for building semantic search, RAG (Retrieval Augmented Generation), and AI-powered applications with Cloudflare Workers.

Status: Production Ready ✅
Last Updated: 2025-10-21
Dependencies: cloudflare-worker-base (for Worker setup), cloudflare-workers-ai (for embeddings)
Latest Versions: [email protected], @cloudflare/[email protected]
Token Savings: ~65% · Errors Prevented: 8 · Dev Time Saved: ~3 hours

What This Skill Provides

Core Capabilities

  • Index Management: Create, configure, and manage vector indexes
  • Vector Operations: Insert, upsert, query, delete, and list vectors
  • Metadata Filtering: Advanced filtering with up to 10 metadata indexes per index
  • Semantic Search: Find similar vectors using cosine, euclidean, or dot-product metrics
  • RAG Patterns: Complete retrieval-augmented generation workflows
  • Workers AI Integration: Native embedding generation with @cf/baai/bge-base-en-v1.5
  • OpenAI Integration: Support for text-embedding-3-small/large models
  • Document Processing: Text chunking and batch ingestion pipelines

Templates Included

  1. basic-search.ts - Simple vector search with Workers AI
  2. rag-chat.ts - Full RAG chatbot with context retrieval
  3. document-ingestion.ts - Document chunking and embedding pipeline
  4. metadata-filtering.ts - Advanced filtering examples

Critical Setup Rules

⚠️ MUST DO BEFORE INSERTING VECTORS

# 1. Create the index with FIXED dimensions and metric
npx wrangler vectorize create my-index \
  --dimensions=768 \
  --metric=cosine

# 2. Create metadata indexes IMMEDIATELY (before inserting vectors!)
npx wrangler vectorize create-metadata-index my-index \
  --property-name=category \
  --type=string

npx wrangler vectorize create-metadata-index my-index \
  --property-name=timestamp \
  --type=number

Why: Metadata indexes MUST exist before vectors are inserted. Vectors added before a metadata index was created won't be filterable on that property.

Index Configuration (Cannot Be Changed Later)

# Dimensions MUST match your embedding model output:
# - Workers AI @cf/baai/bge-base-en-v1.5: 768 dimensions
# - OpenAI text-embedding-3-small: 1536 dimensions
# - OpenAI text-embedding-3-large: 3072 dimensions

# Metrics determine similarity calculation:
# - cosine: Best for normalized embeddings (most common)
# - euclidean: Absolute distance between vectors
# - dot-product: For non-normalized vectors

Wrangler Configuration

wrangler.jsonc:

{
  "name": "my-vectorize-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-10-21",
  "vectorize": [
    {
      "binding": "VECTORIZE_INDEX",
      "index_name": "my-index"
    }
  ],
  "ai": {
    "binding": "AI"
  }
}

TypeScript Types

export interface Env {
  VECTORIZE_INDEX: VectorizeIndex;
  AI: Ai;
}

interface VectorizeVector {
  id: string;
  values: number[] | Float32Array | Float64Array;
  namespace?: string;
  metadata?: Record<string, string | number | boolean | string[]>;
}

interface VectorizeMatches {
  matches: Array<{
    id: string;
    score: number;
    values?: number[];
    metadata?: Record<string, any>;
    namespace?: string;
  }>;
  count: number;
}

Common Operations

1. Insert vs Upsert

// INSERT: Keeps first insertion if ID exists
await env.VECTORIZE_INDEX.insert([
  {
    id: "doc-1",
    values: [0.1, 0.2, 0.3, ...],
    metadata: { title: "First version" }
  }
]);

// UPSERT: Overwrites with latest if ID exists (use this for updates)
await env.VECTORIZE_INDEX.upsert([
  {
    id: "doc-1",
    values: [0.1, 0.2, 0.3, ...],
    metadata: { title: "Updated version" }
  }
]);

2. Query with Filters

// Generate embedding for query
const queryEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: "What is Cloudflare Workers?"
});

// Search with metadata filtering
const results = await env.VECTORIZE_INDEX.query(
  queryEmbedding.data[0],
  {
    topK: 5,
    filter: {
      category: "documentation",
      timestamp: { $gte: 1704067200 }  // After Jan 1, 2024
    },
    returnMetadata: 'all',
    returnValues: false,
    namespace: 'prod'
  }
);
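
Match scores depend on the index metric: with cosine, higher scores mean closer matches. A minimal post-filter that drops weak matches before using them — a sketch; the 0.7 cutoff is an assumption to tune for your data:

// Drop weak matches before building context.
// MIN_SCORE is an assumed cutoff for cosine similarity; tune per dataset.
const MIN_SCORE = 0.7;
const strongMatches = results.matches.filter(m => m.score >= MIN_SCORE);

if (strongMatches.length === 0) {
  // Better to report "no relevant results" than to use irrelevant context
  console.warn('No matches above threshold');
}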

3. Metadata Filter Operators

// Equality (implicit $eq)
{ category: "docs" }

// Explicit operators
{ status: { $ne: "archived" } }

// In array
{ category: { $in: ["docs", "tutorials", "guides"] } }

// Not in array
{ category: { $nin: ["deprecated", "draft"] } }

// Range queries (numbers)
{
  timestamp: {
    $gte: 1704067200,  // >= Jan 1, 2024
    $lt: 1735689600    // < Jan 1, 2025
  }
}

// Range queries (strings) - prefix searching
{
  url: {
    $gte: "/docs/workers",
    $lt: "/docs/workersz"  // Matches all /docs/workers/*
  }
}

// Nested metadata with dot notation
{ "author.id": "user123" }

// Multiple conditions (implicit AND)
{
  category: "docs",
  language: "en",
  "metadata.published": true
}

4. Namespace Filtering

// Insert with namespace (partition key)
await env.VECTORIZE_INDEX.upsert([
  {
    id: "1",
    values: embedding,
    namespace: "customer-123",
    metadata: { type: "support_ticket" }
  }
]);

// Query only within namespace
const results = await env.VECTORIZE_INDEX.query(queryVector, {
  topK: 5,
  namespace: "customer-123"  // Only search this customer's data
});

5. List and Delete Vectors

// List vector IDs (paginated)
const vectors = await env.VECTORIZE_INDEX.listVectors({
  cursor: null,
  limit: 100
});

// Get specific vectors by ID
const retrieved = await env.VECTORIZE_INDEX.getByIds([
  "doc-1", "doc-2", "doc-3"
]);

// Delete vectors
await env.VECTORIZE_INDEX.deleteByIds([
  "doc-1", "doc-2"
]);

Embedding Generation

Workers AI (Recommended - Free)

const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: ["Document 1 content", "Document 2 content"]
});

// embeddings.data is number[][] (array of 768-dim vectors)
const vectors = embeddings.data.map((values, i) => ({
  id: `doc-${i}`,
  values,
  metadata: { source: 'batch-import' }
}));

await env.VECTORIZE_INDEX.upsert(vectors);

OpenAI Embeddings

import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: env.OPENAI_API_KEY });

const response = await openai.embeddings.create({
  model: "text-embedding-3-small",  // 1536 dimensions
  input: "Text to embed"
});

await env.VECTORIZE_INDEX.upsert([{
  id: "doc-1",
  values: response.data[0].embedding,
  metadata: { model: "openai-3-small" }
}]);

Metadata Best Practices

1. Cardinality Considerations

Low Cardinality (Good for $eq filters):

// Few unique values - efficient filtering
metadata: {
  category: "docs",        // ~10 categories
  language: "en",          // ~5 languages
  published: true          // 2 values (boolean)
}

High Cardinality (Avoid in range queries):

// Many unique values - avoid large range scans
metadata: {
  user_id: "uuid-v4...",         // Millions of unique values
  timestamp_ms: 1704067200123    // Use seconds instead
}

2. Metadata Limits

  • Max 10 metadata indexes per Vectorize index
  • Max 10 KiB metadata per vector
  • String indexes: First 64 bytes (UTF-8)
  • Number indexes: Float64 precision
  • Filter size: Max 2048 bytes (compact JSON)
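
A minimal pre-flight check for these limits — a sketch that assumes compact JSON.stringify output approximates the sizes Vectorize enforces:

// Hypothetical guard: validate sizes before calling insert/upsert/query
const encoder = new TextEncoder();

function jsonByteLength(value: unknown): number {
  return encoder.encode(JSON.stringify(value)).length;
}

function assertWithinLimits(
  metadata?: Record<string, unknown>,
  filter?: Record<string, unknown>
): void {
  if (metadata && jsonByteLength(metadata) > 10 * 1024) {
    throw new Error('Metadata exceeds the 10 KiB per-vector limit');
  }
  if (filter && jsonByteLength(filter) > 2048) {
    throw new Error('Filter exceeds the 2048-byte limit (compact JSON)');
  }
}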

3. Key Restrictions

// ❌ INVALID metadata keys
metadata: {
  "": "value",              // Empty key
  "user.name": "John",      // Contains dot (reserved for nesting)
  "$admin": true,           // Starts with $
  "key\"with\"quotes": 1    // Contains quotes
}

// ✅ VALID metadata keys
metadata: {
  "user_name": "John",
  "isAdmin": true,
  "nested": { "allowed": true }  // Access as "nested.allowed" in filters
}

RAG Pattern (Full Example)

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { question } = await request.json<{ question: string }>();

    // 1. Generate embedding for user question
    const questionEmbedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
      text: question
    });

    // 2. Search vector database for similar content
    const results = await env.VECTORIZE_INDEX.query(
      questionEmbedding.data[0],
      {
        topK: 3,
        returnMetadata: 'all',
        filter: { type: "documentation" }
      }
    );

    // 3. Build context from retrieved documents
    const context = results.matches
      .map(m => m.metadata?.content ?? '')
      .join('\n\n---\n\n');

    // 4. Generate answer with LLM using context
    const answer = await env.AI.run('@cf/meta/llama-3-8b-instruct', {
      messages: [
        {
          role: "system",
          content: `Answer based on this context:\n\n${context}`
        },
        {
          role: "user",
          content: question
        }
      ]
    });

    return Response.json({
      answer: answer.response,
      sources: results.matches.map(m => m.metadata?.title)
    });
  }
};

Document Chunking Strategy

function chunkText(text: string, maxChunkSize = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+/g) || [text];
  const chunks: string[] = [];
  let currentChunk = '';

  for (const sentence of sentences) {
    if ((currentChunk + sentence).length > maxChunkSize && currentChunk) {
      chunks.push(currentChunk.trim());
      currentChunk = sentence;
    } else {
      currentChunk += sentence;
    }
  }

  if (currentChunk) chunks.push(currentChunk.trim());
  return chunks;
}

// Usage (assumes `docId` and `longDocument` are defined by the caller)
const chunks = chunkText(longDocument, 500);
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
  text: chunks
});

const vectors = embeddings.data.map((values, i) => ({
  id: `doc-${docId}-chunk-${i}`,
  values,
  metadata: {
    doc_id: docId,
    chunk_index: i,
    total_chunks: chunks.length,
    content: chunks[i]
  }
}));

await env.VECTORIZE_INDEX.upsert(vectors);

Common Errors & Solutions

Error 1: Metadata Index Created After Vectors Inserted

Problem: Filtering doesn't work on existing vectors
Solution: Delete and re-insert vectors OR create metadata indexes BEFORE inserting

Error 2: Dimension Mismatch

Problem: "Vector dimensions do not match index configuration"
Solution: Ensure embedding model output matches index dimensions:
  - Workers AI bge-base: 768
  - OpenAI small: 1536
  - OpenAI large: 3072
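
A cheap guard that fails fast before the API call — a sketch; EXPECTED_DIMENSIONS is an assumption and must match the value passed to `wrangler vectorize create`:

// Assumed constant: must equal the --dimensions used at index creation
const EXPECTED_DIMENSIONS = 768; // @cf/baai/bge-base-en-v1.5

function assertDimensions(values: number[]): void {
  if (values.length !== EXPECTED_DIMENSIONS) {
    throw new Error(
      `Embedding has ${values.length} dimensions; index expects ${EXPECTED_DIMENSIONS}`
    );
  }
}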

Error 3: Invalid Metadata Keys

Problem: "Invalid metadata key"
Solution: Keys cannot:
  - Be empty
  - Contain . (dot)
  - Contain " (quote)
  - Start with $ (dollar sign)

Error 4: Filter Too Large

Problem: "Filter exceeds 2048 bytes"
Solution: Simplify filter or split into multiple queries
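
One way to split: partition an oversized $in list across several smaller queries and merge the per-query matches — a sketch assuming a cosine index (higher score = more similar) and that each partition fits the byte limit:

// Hypothetical helper: split a large $in list into smaller queries,
// then merge the per-query matches and keep the global top K.
async function queryWithLargeInFilter(
  index: VectorizeIndex,
  vector: number[],
  categories: string[],
  topK = 5,
  batchSize = 20
): Promise<VectorizeMatches['matches']> {
  const batches: string[][] = [];
  for (let i = 0; i < categories.length; i += batchSize) {
    batches.push(categories.slice(i, i + batchSize));
  }

  const results = await Promise.all(
    batches.map(batch =>
      index.query(vector, { topK, filter: { category: { $in: batch } } })
    )
  );

  return results
    .flatMap(r => r.matches)
    .sort((a, b) => b.score - a.score) // cosine: higher score wins
    .slice(0, topK);
}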

Error 5: Range Query on High Cardinality

Problem: Slow queries or reduced accuracy
Solution: Use lower cardinality fields for range queries, or use seconds instead of milliseconds for timestamps

Error 6: Insert vs Upsert Confusion

Problem: Updates not reflecting in index
Solution: Use upsert() to overwrite existing vectors, not insert()

Error 7: Missing Bindings

Problem: "VECTORIZE_INDEX is not defined"
Solution: Add the "vectorize" binding array to wrangler.jsonc (as shown above), or [[vectorize]] if using wrangler.toml

Error 8: Namespace vs Metadata Confusion

Problem: Unclear when to use namespace vs metadata filtering
Solution:
  - Namespace: Partition key, applied BEFORE metadata filters
  - Metadata: Flexible key-value filtering within namespace

Wrangler CLI Reference

# Create index (dimensions and metric cannot be changed later!)
npx wrangler vectorize create <name> \
  --dimensions=768 \
  --metric=cosine

# List indexes
npx wrangler vectorize list

# Get index details
npx wrangler vectorize get <name>

# Get index info (vector count, mutations)
npx wrangler vectorize info <name>

# Delete index
npx wrangler vectorize delete <name>

# Create metadata index (BEFORE inserting vectors!)
npx wrangler vectorize create-metadata-index <name> \
  --property-name=category \
  --type=string

# List metadata indexes
npx wrangler vectorize list-metadata-index <name>

# Delete metadata index
npx wrangler vectorize delete-metadata-index <name> \
  --property-name=category

# Insert vectors from file
npx wrangler vectorize insert <name> \
  --file=vectors.ndjson

# Query vectors
npx wrangler vectorize query <name> \
  --vector="[0.1, 0.2, ...]" \
  --top-k=5 \
  --return-metadata=all

# List vector IDs
npx wrangler vectorize list-vectors <name> \
  --count=100

# Get vectors by IDs
npx wrangler vectorize get-vectors <name> \
  --ids="id1,id2,id3"

# Delete vectors by IDs
npx wrangler vectorize delete-vectors <name> \
  --ids="id1,id2,id3"
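
The --file input is NDJSON: one JSON object per line with id, values, and optional metadata/namespace. A minimal vectors.ndjson sketch (values truncated for readability; each line needs the full dimension count):

{"id": "doc-1", "values": [0.1, 0.2, 0.3], "metadata": {"category": "docs"}}
{"id": "doc-2", "values": [0.4, 0.5, 0.6], "namespace": "prod"}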

Performance Tips

  1. Batch Operations: Insert/upsert in batches of 100-1000 vectors (see the batching sketch after this list)
  2. Selective Return: Only use returnValues: true when needed (saves bandwidth)
  3. Metadata Cardinality: Keep indexed metadata fields low cardinality for range queries
  4. Namespace Filtering: Apply namespace filter before metadata filters (processed first)
  5. Query Optimization: Use topK=3-10 for best latency (larger values increase search time)
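
A small helper for tip 1 — a sketch assuming sequential batches are acceptable; BATCH_SIZE is an assumption within the 100-1000 range above:

// Upsert a large vector set in fixed-size batches.
const BATCH_SIZE = 500; // assumed; stay within the recommended range

async function upsertInBatches(
  index: VectorizeIndex,
  vectors: VectorizeVector[]
): Promise<void> {
  for (let i = 0; i < vectors.length; i += BATCH_SIZE) {
    await index.upsert(vectors.slice(i, i + BATCH_SIZE));
  }
}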

When to Use This Skill

Use Vectorize when:

  • Building semantic search over documents, products, or content
  • Implementing RAG chatbots with context retrieval
  • Creating recommendation engines based on similarity
  • Building multi-tenant applications (use namespaces)
  • Need global distribution and low latency

Don't use Vectorize for:

  • Traditional relational data (use D1)
  • Key-value lookups (use KV)
  • Large file storage (use R2)
  • Real-time collaborative state (use Durable Objects)

Templates Location

All working code examples are in ./templates/:

  • basic-search.ts - Simple vector search implementation
  • rag-chat.ts - Complete RAG chatbot
  • document-ingestion.ts - Document processing pipeline
  • metadata-filtering.ts - Advanced filtering patterns

Reference Documentation

Detailed guides in ./references/:

  • wrangler-commands.md - Complete CLI reference
  • index-operations.md - Index creation and management
  • vector-operations.md - Insert, query, delete operations
  • metadata-guide.md - Metadata indexes and filtering
  • embedding-models.md - Model configurations

Integration Examples

Complete integration guides in ./references/:

  • integration-workers-ai-bge-base.md - Workers AI integration (@cf/baai/bge-base-en-v1.5)
  • integration-openai-embeddings.md - OpenAI embeddings integration

Version: 1.0.0 · Status: Production Ready ✅

GitHub Repository

jezweb/claude-skills
Path: skills/cloudflare-vectorize
Tags: ai, automation, claude-code, claude-skills, cloudflare, react

Related Skills

sglang

Meta

SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.

View skill

evaluating-llms-harness

Testing

This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.

View skill

llamaguard

Other

LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.

View skill

langchain

Meta

LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.

View skill