qdrant-search-quality-diagnosis

qdrant

About

This skill diagnoses search quality issues in Qdrant vector databases, helping developers troubleshoot problems like low recall, irrelevant results, or performance degradation after quantization. It provides methodologies for establishing baselines using exact KNN, comparing approximate HNSW search, and measuring recall@k. Use it when search results degrade unexpectedly or when you need to build a ground truth dataset for quality assessment.

Quick install

Claude Code

Primary method (recommended)
npx skills add qdrant/skills -a claude-code
Plugin command (alternative)
/plugin add https://github.com/qdrant/skills
Git clone (alternative)
git clone https://github.com/qdrant/skills.git ~/.claude/skills/qdrant-search-quality-diagnosis

Copy and paste this command into Claude Code to install the skill.

Skill documentation

How to Diagnose Bad Search Quality

Before tuning, establish baselines: use exact KNN as ground truth and compare approximate HNSW results against it. Target >95% recall@k for production.

Don't Know What's Wrong Yet

Use when: results are irrelevant or missing expected matches and you need to isolate the cause.

  • For a no-code quick check, use the Web UI's ANN Recall tab to compare approximate vs exact recall@k (see "Web UI ANN Recall").
  • For the same comparison in code (CI gating, regression tests), run each query twice (once approximate, once with exact=true) and compute recall@k from the overlap (see "ANN recall in CI").
  • If exact search is also bad, the problem is the model or the search pipeline. If exact search is good but approximate search is bad, tune HNSW.
  • Check whether quantization degrades quality (compare results with and without it).
  • Check whether filters are too restrictive (if so, you might need to use ACORN).
  • If chunked documents produce duplicate results, use the Grouping API to deduplicate (see "Grouping").
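The approximate-versus-exact comparison in the list above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the skill's own tooling: `recall_at_k`, `mean_recall_at_k`, and `fetch_ids` are hypothetical helper names, and `fetch_ids` assumes the `qdrant-client` Python package, where `query_points` takes a `SearchParams(exact=...)` argument.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple


def recall_at_k(approx_ids: Sequence[Hashable], exact_ids: Sequence[Hashable], k: int) -> float:
    """Fraction of the exact top-k that the approximate top-k also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    return len(truth & set(approx_ids[:k])) / len(truth)


def mean_recall_at_k(pairs: Iterable[Tuple[Sequence[Hashable], Sequence[Hashable]]], k: int) -> float:
    """Average recall@k over (approximate_ids, exact_ids) pairs, one pair per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def fetch_ids(client, collection: str, vector: List[float], k: int, exact: bool) -> list:
    """Run one query and return point IDs (assumes the qdrant-client package)."""
    from qdrant_client import models  # imported lazily so the metric helpers run without it

    response = client.query_points(
        collection_name=collection,
        query=vector,
        limit=k,
        search_params=models.SearchParams(exact=exact),  # exact=True bypasses HNSW
    )
    return [point.id for point in response.points]
```

For each test query, call `fetch_ids` once with `exact=False` and once with `exact=True`, then pass the two ID lists to `recall_at_k`. In CI, the mean over all queries can be compared against the recall target stated at the top of this document.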

Payload filtering and sparse vector search are different things. Metadata (dates, categories, tags) goes in payload for filtering. Text content goes in sparse vectors for search.
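To make the distinction concrete, the sketch below builds a request body for Qdrant's REST query endpoint (`POST /collections/{collection_name}/points/query`) in which the sparse vector carries the text signal and the payload filter carries the metadata constraint. The vector name `text-sparse`, the payload key `category`, and the helper name `build_query_body` are assumptions made for illustration only.

```python
from typing import Dict, List


def build_query_body(sparse_indices: List[int], sparse_values: List[float],
                     category: str, limit: int = 10) -> Dict:
    """Combine a sparse-vector query (text relevance) with a payload filter (metadata)."""
    return {
        # Text content: scored by the sparse vector named "text-sparse" (hypothetical name).
        "query": {"indices": list(sparse_indices), "values": list(sparse_values)},
        "using": "text-sparse",
        # Metadata: restricts candidates, does not contribute to the score.
        "filter": {"must": [{"key": "category", "match": {"value": category}}]},
        "limit": limit,
    }


body = build_query_body([7, 42], [0.8, 0.3], category="docs")
```

The filter only narrows which points are eligible; ranking among them still comes from the sparse vector.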

Approximate Search Worse Than Exact

Use when: exact search returns good results but HNSW approximation misses them.

Binary quantization requires rescoring; without it, quality loss is severe. Use oversampling (3-5x minimum for binary) to recover recall. Always test the impact of quantization on your own data before production (see "Quantization").
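One way to test quantization impact is to send the same query twice: once against quantized vectors with rescoring and oversampling, and once with quantization ignored. The sketch below builds both REST request bodies using the `params.quantization` fields (`ignore`, `rescore`, `oversampling`). The helper names and the default of 3.0 are assumptions chosen to match the 3-5x guidance above, not values taken from the skill's source.

```python
from typing import Dict, List


def quantization_params(rescore: bool = True, oversampling: float = 3.0) -> Dict:
    """Search params that use quantized vectors, then rescore with the originals."""
    return {"quantization": {"ignore": False, "rescore": rescore, "oversampling": oversampling}}


def build_comparison_bodies(vector: List[float], limit: int = 10,
                            oversampling: float = 3.0) -> Dict[str, Dict]:
    """Two request bodies for the same query: quantized+rescored vs original vectors."""
    base = {"query": list(vector), "limit": limit}
    return {
        "quantized_rescored": {**base, "params": quantization_params(True, oversampling)},
        # ignore=True skips the quantized vectors entirely, giving the no-quantization baseline.
        "original_vectors": {**base, "params": {"quantization": {"ignore": True}}},
    }


bodies = build_comparison_bodies([0.1, 0.2, 0.3], limit=5)
```

Comparing the result IDs of the two requests (for example with recall@k) shows how much quality the quantized path loses on your data at a given oversampling factor.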

Wrong Embedding Model

Use when: exact search also returns bad results.

Check Qdrant team recommendations on how to choose an embedding model.

Test the top 3 MTEB models on 100-1000 sample queries (see "Hosted Qdrant inference"). Score them against the same labeled set so the comparison is apples to apples (see "Measuring Retrieval Relevance").

Unoptimized Search Pipeline

Use when: exact search also returns bad results and the user has confirmed the model choice.

Optimize search according to the advanced search-strategies skill.

Need a Labeled Baseline to Score Recall, MRR, or NDCG

Use when: user has no golden set, asks "how do I know if my search is good?", or needs to gate releases on a retrieval metric.

  • Build a labeled query set (human, log-based, or LLM-synthetic) and score retrieval with ranx (see "Measuring Retrieval Relevance")
  • Pick the metric by usage: Recall@k for RAG, MRR/Hits@1 for single-answer, NDCG@k for re-ranking (see "Choosing the metric")
  • For full RAG pipelines, also score generation with Ragas and use the retrieval-vs-generation 2x2 to isolate regressions (see "Pipeline Output Quality")
  • Gate CI on a per-metric threshold to catch regressions from embedding-model swaps, prompt changes, or index config changes
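To show what these metrics and the CI gate compute, here is a dependency-free sketch with binary relevance labels. It is an illustration only, not a replacement for ranx: the function names are hypothetical, and the thresholds in the usage lines are placeholder values, not recommendations from the skill.

```python
import math
from typing import Dict, Hashable, List, Sequence, Set


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """NDCG@k with binary gains: DCG of the ranking divided by DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


def failing_metrics(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Names of metrics that fall below their minimum; a missing score counts as 0.0."""
    return [name for name, minimum in thresholds.items() if scores.get(name, 0.0) < minimum]


# Placeholder scores and thresholds for illustration.
failed = failing_metrics({"recall@10": 0.90, "mrr": 0.80}, {"recall@10": 0.95, "mrr": 0.70})
```

A CI job could exit with a non-zero status whenever `failed` is non-empty, which blocks a release on any single metric regression rather than on an average across metrics.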

What NOT to Do

  • Tune Qdrant before verifying the model is right for the task (most quality issues are model issues)
  • Use binary quantization without rescore (severe quality loss)
  • Set hnsw_ef lower than results requested (guaranteed bad recall)
  • Skip payload indexes on filtered fields then blame quality (HNSW can't traverse filtered-out nodes, and filterable HNSW is built only if payload indexes were set up prior)
  • Deploy without baseline recall or other search relevance metrics (no way to measure regressions)
  • Confuse payload filtering with sparse vector search (different things, different config)
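Two of the mistakes above can be caught mechanically before a query is sent. The following is a minimal sketch of such a pre-flight check; `preflight_warnings` and its parameters are hypothetical names, and the check only flags settings that were set explicitly rather than guessing at server defaults.

```python
from typing import List, Optional


def preflight_warnings(limit: int, hnsw_ef: Optional[int] = None,
                       binary_quantization: bool = False,
                       rescore: Optional[bool] = None) -> List[str]:
    """Return human-readable warnings for search settings this document advises against."""
    warnings = []
    if hnsw_ef is not None and hnsw_ef < limit:
        warnings.append(f"hnsw_ef ({hnsw_ef}) is lower than the requested limit ({limit})")
    if binary_quantization and rescore is False:
        warnings.append("binary quantization is used with rescore explicitly disabled")
    return warnings


issues = preflight_warnings(limit=50, hnsw_ef=32, binary_quantization=True, rescore=False)
```

Running this check in tests or in a request wrapper surfaces misconfigurations early, before they show up as unexplained recall loss.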

GitHub repository

qdrant/skills
Path: skills/qdrant-search-quality/diagnosis
Tags: agent-skills, ai-agents, claude-code, codex, cursor, embeddings
