スキル一覧に戻る

qdrant-hybrid-search-prefetches

qdrant
更新日 5 days ago
154
18
154
GitHubで表示
その他general

について

このスキルは、開発者がQdrantで語彙検索(キーワード/BM25)と意味検索(密ベクトル)を単一クエリ内で組み合わせたハイブリッド検索を実装するのに役立ちます。異なるベクトル表現やクエリで並列検索を実行する`prefetch` APIの使用方法や、ペイロードインデックスの適切な設定方法についてガイドします。スパース/密検索手法を統合する必要がある場合や、複数フィールド検索戦略を最適化する際にご利用ください。

クイックインストール

Claude Code

推奨
メイン
npx skills add qdrant/skills -a claude-code
プラグインコマンド代替
/plugin add https://github.com/qdrant/skills
Git クローン代替
git clone https://github.com/qdrant/skills.git ~/.claude/skills/qdrant-hybrid-search-prefetches

このコマンドをClaude Codeにコピー&ペーストしてスキルをインストールします

ドキュメント

Different Searches in One Query API Request

Each prefetch runs exactly one search per one query.

Understand if user wants to run several parallel searches on:

  1. The same vector representations but different queries or filters.
  2. Different vector representations but the same raw query.

If first, help user to design logic of constructing query or/and filters on application side and then check Combining Searches. Don't forget to create indices on filterable payload fields, immediately after collection creation, prior to building HNSW, so filterable HNSW could be constructed.

If second, use named vectors, which allow to store multiple vector types per point in one collection. Beware that named vectors currently can be configured only at collection creation. To choose vectors, check following recommendations.

Missed Keyword Matches

Use when: pure vector search misses exact term or keyword matches and you need lexical retrieval alongside semantic search.

Most likely you need a sparse vector for exact text search alongside the dense one. Qdrant uses sparse vectors for lexical searches, as payload filtering doesn't provide any ranking score.

Choose a Sparse Vector for Text

  • BM25 statistical representations, built into Qdrant core (computed server-side). Good baseline, works out-of-domain, usually for long texts. Can be used for non-English content, but needs to be configured per language (tokenization, stemming, stopwords, etc) at indexing and retrieval time. More in Text Search Guide
  • BM42 learned sparse, based on BM25, but better for small chunks of text & with meaning understanding. Works only on English. Requires fine-tuning for domain-specific retrieval. Requires FastEmbed (Python/REST only, not available in all SDKs). Not maintained.
  • miniCOIL learned sparse, BM25 with additional understanding of words meaning in context. Works only on English. Requires fine-tuning for domain-specific retrieval. Requires FastEmbed. Usage shown in FastEmbed miniCOIL documentation.
  • SPLADE++ learned sparse with term expansion. Heavier inference and resources usage but better performance due to term expansion. Requires fine-tuning for domain-specific retrieval. Provided in Qdrant Cloud Inference and FastEmbed versions work only on English. To use with FastEmbed, check FastEmbed SPLADE documentation.
  • External learned sparse embeddings, for example BAAI/bge-m3.

What to remember when using sparse vectors for lexical search:

  • tokenization and stemming affect exact matches, especially on custom codes, terms, etc.

What to remember when using Qdrant BM25 and miniCOIL (based on BM25):

  • avg_len in formula is not computed server-side, it is a user responsibility and passed as a parameter
  • BM25 might be not good for small chunks of text, as BM25 algorithm was initially created for search on long documents; consider adjusting document statistics in sparse vectors (TF & IDF, k, b).
  • Qdrant BM25 vectors are configured per language, so consider customizing stop words, stemming & tokenization when users documents mix several languages or carefully configure vectors per point when they are monolingual.

More on Sparse Vectors for Text Search

Need to Combine Multiple Representations of the Same Item

Use when: the same item is embedded in multiple ways (e.g. different models, languages, or modalities) and you want to search across different representations in one request (don't have to be all of them, can be even one).

Use multiple named vector prefetches, each prefetch covers one representation.

  • If you have groups and subgroups of representations (document -> chunk, image -> patch), you could use searching in groups. To not store identical payloads several times, check Lookup in Groups

You can also search directly on multivectors, a matrix of dense vectors, in a prefetch.

However, it comes with several considerations, as multivectors were designed to support late interaction models using max similarity metric, so it's impossible to retrieve the list of individual max similarity scores for each query vector.

Moreover, multivectors are rarely a good pick for prefetch:

There are ways to make multivector retrieval cheaper (MUVERA, pooling), you can see more in "Evaluating Tradeoffs of Multi-stage Multi-vector Search"

What NOT to Do

  • Choose any search method (for example, BM25) without evaluation of its quality & resources used.
  • Use any search method (for example, BM25) without paying attention to the specifics of their configuration and applicability to the use case.

GitHub リポジトリ

qdrant/skills
パス: skills/qdrant-search-quality/search-strategies/hybrid-search/search-types
0
agent-skillsai-agentsclaude-codecodexcursorembeddings

関連スキル

llamaguard

その他

LlamaGuardは、暴力やヘイトスピーチなど6つの安全性カテゴリーにおいて、LLMの入力と出力をモデレートするMetaの70-80億パラメータモデルです。94〜95%の精度を提供し、vLLM、Hugging Face、Amazon SageMakerを使用してデプロイ可能です。このスキルを使用して、AIアプリケーションにコンテンツフィルタリングと安全策を簡単に統合できます。

スキルを見る

cost-optimization

その他

このClaudeスキルは、リソースの適正サイジング、タグ付け戦略、支出分析を通じて、開発者がクラウドコストを最適化することを支援します。AWS、Azure、GCPにわたるクラウド支出の削減とコストガバナンスの実施のためのフレームワークを提供します。インフラコストの分析、リソースの適正サイジング、または予算制約への対応が必要な際にご利用ください。

スキルを見る

quantizing-models-bitsandbytes

その他

このスキルは、bitsandbytesを使用してLLMを8ビットまたは4ビット精度に量子化し、精度の低下を最小限に抑えつつ50〜75%のメモリ削減を実現します。限られたGPUメモリでより大規模なモデルを実行したり、推論を高速化するのに理想的で、INT8、NF4、FP4などのフォーマットをサポートしています。HuggingFace Transformersと統合され、QLoRAトレーニングや8ビットオプティマイザーを可能にします。

スキルを見る

dispatching-parallel-agents

その他

このClaudeスキルは、複数のエージェントを配備し、3つ以上の独立した問題を並行して調査・修正します。共有状態や依存関係がなく解決可能な、無関係な障害が発生するシナリオ向けに設計されています。中核となる機能は並列問題解決であり、効率を最大化するために独立した問題領域ごとに1つのエージェントを割り当てます。

スキルを見る