index-manager
About
The index-manager skill handles the complete lifecycle of MPEP search indexes, including downloading PDFs, extracting content, generating embeddings, and building FAISS/BM25 indexes. It provides automated tools for maintenance, optimization, and troubleshooting when rebuilding indexes or addressing corruption issues. Developers should use it for initial index creation, adding new content, or when diagnostic checks indicate index problems.
Quick Install
Claude Code
Recommended: /plugin add https://github.com/RobThePCGuy/Claude-Patent-Creator
git clone https://github.com/RobThePCGuy/Claude-Patent-Creator.git ~/.claude/skills/index-manager
Copy and paste this command into Claude Code to install the skill.
Skill Documentation
Index Manager Skill
Expert system for managing the MPEP search index lifecycle: PDF downloads, index building, maintenance, updates, and optimization.
FOR CLAUDE: All dependencies installed, system operational.
- Go directly to appropriate phase
- Scripts/tools in mcp_server/
- Use patent-creator CLI when available
- Only run diagnostics if operations fail
When to Use
Building/rebuilding MPEP index, corruption/missing files, optimization, adding content, troubleshooting.
Index Lifecycle
PDFs Not Present -> Download (2-5 min, 500MB)
-> Extract & Parse (500MB data)
-> Generate Embeddings (5-10 min GPU, 35-65 min CPU)
-> Build FAISS + BM25 Indexes
-> Index Ready (mcp_server/index/)
-> Maintenance (Verify -> Optimize -> Update)
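The lifecycle above can be read as a decision about where to start. A minimal sketch, assuming the file names listed under Phase 1 and Phase 3 and paths relative to the project root; `next_phase` is a hypothetical helper, not a function in the repository:

```python
from pathlib import Path

# Assumed names, taken from the Phase 1 and Phase 3 listings.
REQUIRED_PDFS = ("consolidated_laws.pdf", "consolidated_rules.pdf")
INDEX_FILES = ("mpep_index.faiss", "mpep_metadata.json", "mpep_bm25.pkl")


def next_phase(root: Path) -> str:
    """Return "download", "build", or "maintenance" for the project at root."""
    pdf_dir = root / "pdfs"
    index_dir = root / "mcp_server" / "index"

    # Download is needed unless at least one MPEP PDF and both consolidated files exist.
    has_mpep = any(pdf_dir.glob("mpep-*.pdf"))
    has_required = all((pdf_dir / name).is_file() for name in REQUIRED_PDFS)
    if not (has_mpep and has_required):
        return "download"

    # The index counts as present only if every file exists and is non-empty.
    index_ok = all(
        (index_dir / name).is_file() and (index_dir / name).stat().st_size > 0
        for name in INDEX_FILES
    )
    return "maintenance" if index_ok else "build"
```

For example, `next_phase(Path("."))` returns `"download"` when `pdfs/` is missing or incomplete. The check is deliberately shallow (presence and non-zero size only); `patent-creator health` is still the check to rely on.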
Phase 1: PDF Management
Check Status:
ls pdfs/ # Should show mpep-*.pdf, consolidated_laws.pdf, consolidated_rules.pdf
Download PDFs:
patent-creator download-mpep
# Or: python install.py (Select "Download MPEP PDFs")
Verify Integrity:
python -c "
import fitz
from pathlib import Path
# Open each PDF with PyMuPDF and report its page count
for pdf in Path('pdfs').glob('*.pdf'):
    try:
        doc = fitz.open(pdf)
        print(f'[OK] {pdf.name}: {len(doc)} pages')
        doc.close()
    except Exception as e:
        print(f'[X] {pdf.name}: ERROR - {e}')
"
Phase 2: Index Building
patent-creator rebuild-index
# Or: python mcp_server/server.py --rebuild-index
Timeline:
- Load PDFs: 30s
- Extract text: 1-2 min
- Chunk text (500 tokens): 30s
- Generate embeddings: 5-10 min (GPU) or 35-65 min (CPU)
- Build FAISS/BM25: 1 min
- Save to disk: 10s
Total: roughly 8-14 min (GPU) or 38-69 min (CPU)
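Summing the per-stage estimates is a quick way to check the quoted total. A small sketch, treating single figures such as 30s as both the lower and upper bound:

```python
# Per-stage estimates from the timeline, as (min_seconds, max_seconds).
STAGES_GPU = {
    "load_pdfs": (30, 30),
    "extract_text": (60, 120),
    "chunk_text": (30, 30),
    "generate_embeddings": (5 * 60, 10 * 60),
    "build_faiss_bm25": (60, 60),
    "save_to_disk": (10, 10),
}
# On CPU only the embedding stage changes (35-65 min).
STAGES_CPU = {**STAGES_GPU, "generate_embeddings": (35 * 60, 65 * 60)}


def total_minutes(stages: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Sum lower and upper bounds separately and convert to minutes."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return round(low / 60, 1), round(high / 60, 1)


print("GPU:", total_minutes(STAGES_GPU))  # GPU: (8.2, 14.2)
print("CPU:", total_minutes(STAGES_CPU))  # CPU: (38.2, 69.2)
```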
Custom Build:
from mcp_server.mpep_search import MPEPIndex
index = MPEPIndex(use_hyde=False)
index.build_index(
chunk_size=500,
overlap=50,
batch_size=32 # Reduce to 16/8 if OOM
)
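`chunk_size=500` and `overlap=50` describe a sliding window: each chunk repeats the last 50 tokens of the previous one, so text near a boundary appears in both neighboring chunks. The sketch below illustrates only that idea; it assumes pre-tokenized input (for example whitespace-split words) and is not the tokenizer or chunker that `MPEPIndex.build_index` actually uses. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(
    tokens: list[str], chunk_size: int = 500, overlap: int = 50
) -> list[list[str]]:
    """Split tokens into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_tokens("a b c d e f".split(), chunk_size=4, overlap=1))
# [['a', 'b', 'c', 'd'], ['d', 'e', 'f']]
```

A larger overlap improves recall for passages that straddle a boundary at the cost of more chunks (and therefore more embeddings to generate).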
Phase 3: Verification
# Check files
ls -lh mcp_server/index/
# Expected: mpep_index.faiss (~150MB), mpep_metadata.json (~80MB), mpep_bm25.pkl (~60MB)
# Verify health
patent-creator health
# Should show: [OK] MPEP Index: Ready (12,543 chunks)
# Manual test
python -c "
from mcp_server.mpep_search import MPEPIndex
index = MPEPIndex()
print(f'Chunks: {len(index.chunks)}')
results = index.search('claim definiteness', top_k=3)
print(f'Search results: {len(results)}')
"
Phase 4: Maintenance
When to Rebuild:
- MPEP updates (check uspto.gov quarterly)
- Index corruption
- After adding new PDFs
- Performance degradation
- Machine migration
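To spot missing or truncated index files before deciding whether a rebuild is needed, the expected sizes listed under Phase 3 can be turned into a quick scripted check. A hypothetical sketch (`check_index_files` is not part of the repository): the 50% tolerance is an arbitrary assumption, and sizes will grow when PDFs are added, so treat an "unusual size" result as a prompt to run `patent-creator health`, not as proof of corruption.

```python
from pathlib import Path

MIB = 1024 * 1024  # ls -lh reports sizes in powers of 1024

# Approximate sizes from the Phase 3 listing; rough guides, not hard limits.
EXPECTED_MIB = {
    "mpep_index.faiss": 150,
    "mpep_metadata.json": 80,
    "mpep_bm25.pkl": 60,
}


def check_index_files(index_dir: Path, tolerance: float = 0.5) -> dict[str, str]:
    """Classify each expected index file as ok, missing, empty, or unusual size."""
    report: dict[str, str] = {}
    for name, expected in EXPECTED_MIB.items():
        path = index_dir / name
        if not path.is_file():
            report[name] = "missing"
            continue
        size = path.stat().st_size
        if size == 0:
            report[name] = "empty"
            continue
        size_mib = size / MIB
        # Flag sizes more than `tolerance` (50% by default) away from the estimate.
        if abs(size_mib - expected) > tolerance * expected:
            report[name] = f"unusual size ({size_mib:.1f} MiB, expected ~{expected} MiB)"
        else:
            report[name] = "ok"
    return report
```

Usage: `check_index_files(Path("mcp_server/index"))`.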
Rebuild Process:
# Backup (optional)
cp -r mcp_server/index mcp_server/index_backup_$(date +%Y%m%d)
# Rebuild
patent-creator rebuild-index
# Verify
patent-creator health
# Remove backup if successful
rm -rf mcp_server/index_backup_*
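The sequence above keeps a backup but leaves restoring it on failure to you. A sketch that automates the rollback, assuming the backup naming used by the `cp` command (`index_backup_YYYYMMDD` next to the index); `rebuild_with_backup` is hypothetical, and `rebuild` is any callable you supply that rebuilds and verifies the index and returns True on success:

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Callable


def rebuild_with_backup(index_dir: Path, rebuild: Callable[[], bool]) -> bool:
    """Back up index_dir, run rebuild(), and roll back if it fails.

    `rebuild` is a caller-supplied placeholder that rebuilds and verifies
    the index and returns True on success (hypothetical interface).
    """
    # Same naming as the cp command: index_backup_YYYYMMDD next to the index.
    backup = index_dir.with_name(f"{index_dir.name}_backup_{date.today():%Y%m%d}")
    had_index = index_dir.is_dir()
    if had_index:
        if backup.exists():
            shutil.rmtree(backup)  # replace an earlier backup from the same day
        shutil.copytree(index_dir, backup)

    try:
        ok = rebuild()
    except Exception:
        ok = False  # treat a crash like a failed health check

    if had_index:
        if ok:
            shutil.rmtree(backup)  # "Remove backup if successful"
        else:
            # Discard whatever the failed rebuild left behind, restore the backup.
            if index_dir.exists():
                shutil.rmtree(index_dir)
            backup.rename(index_dir)
    return ok
```

For example, `rebuild` could run `subprocess.run(["patent-creator", "rebuild-index"])` and then the health command, returning whether both succeeded; whether `patent-creator health` reports failure through its exit code is an assumption to verify first.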
Phase 5: Content Updates
# Download new PDF
wget https://www.uspto.gov/web/offices/pac/mpep/mpep-2900.pdf -O pdfs/mpep-2900.pdf
# Rebuild (includes new section)
patent-creator rebuild-index
Note: Incremental updates not supported. Full rebuild required.
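Before rebuilding, it is worth confirming the download actually produced a PDF: with `-O`, wget creates the output file even when the request fails, which can leave an empty file behind, and a moved URL may return an HTML page instead. A lightweight sketch that checks for the `%PDF-` signature at the start of the file; `looks_like_pdf` is hypothetical, the 1 KiB minimum size is an arbitrary assumption, and the PyMuPDF check from Phase 1 remains the more thorough test.

```python
from pathlib import Path


def looks_like_pdf(path: Path, min_bytes: int = 1024) -> bool:
    """Cheap pre-rebuild check: file exists, is not tiny, and starts with %PDF-."""
    if not path.is_file() or path.stat().st_size < min_bytes:
        return False  # missing, empty, or suspiciously small download
    with path.open("rb") as f:
        return f.read(5) == b"%PDF-"
```

Usage: `looks_like_pdf(Path("pdfs/mpep-2900.pdf"))` before running `patent-creator rebuild-index`.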
Troubleshooting
- OOM errors during build
- Build taking too long
- Corrupted index files
- Search returning no results
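For OOM errors, the Custom Build comment suggests stepping `batch_size` down from 32 to 16 or 8. A sketch of that retry loop, assuming the build can be wrapped in a callable that takes the batch size; `build_with_fallback` is hypothetical. Which exception signals OOM depends on the backend (recent PyTorch versions raise `torch.cuda.OutOfMemoryError` for GPU OOM), so the exception types are a parameter.

```python
from typing import Callable, Sequence


def build_with_fallback(
    build: Callable[[int], None],
    batch_sizes: Sequence[int] = (32, 16, 8),
    oom_errors: tuple[type[BaseException], ...] = (MemoryError,),
) -> int:
    """Call build(batch_size) with progressively smaller batches after OOM errors.

    Returns the batch size that succeeded; raises RuntimeError if all fail.
    """
    last_error = None
    for batch_size in batch_sizes:
        try:
            build(batch_size)
            return batch_size
        except oom_errors as err:
            last_error = err  # out of memory: retry with the next, smaller size
    raise RuntimeError(
        f"out of memory at every batch size in {tuple(batch_sizes)}"
    ) from last_error
```

Following the Custom Build call in Phase 2, this might be used as `build_with_fallback(lambda bs: MPEPIndex(use_hyde=False).build_index(chunk_size=500, overlap=50, batch_size=bs))`.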
Performance Tuning
- Embedding generation speed (GPU vs CPU)
- Search latency optimization
- Index size reduction
- Batch size tuning
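Search latency optimization starts with a repeatable measurement. A minimal timing harness, assuming only that search is a callable taking a query string (for example `lambda q: index.search(q, top_k=3)`, as in the Phase 3 manual test); `measure_latency` is hypothetical and reports the median and a nearest-rank 95th percentile in milliseconds.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def measure_latency(
    search: Callable[[str], object], queries: Sequence[str], repeats: int = 5
) -> dict[str, float]:
    """Run every query `repeats` times and report median and p95 latency (ms)."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile.
    p95 = samples[math.ceil(0.95 * len(samples)) - 1]
    return {"median_ms": statistics.median(samples), "p95_ms": p95}
```

If the index loads lazily, run one untimed query first so start-up cost is not counted as search latency.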
Quick Reference
| Command | Purpose |
|---|---|
| patent-creator download-mpep | Download MPEP PDFs |
| patent-creator rebuild-index | Build/rebuild search index |
| patent-creator health | Check index health |
| ls -lh mcp_server/index/ | View index files |
Best Practices:
- Backup before rebuild
- Verify PDFs before building
- Use a GPU (embedding generation: 5-10 min vs 35-65 min on CPU)
- Test after rebuild
- Keep PDFs until verified
- Weekly health checks
GitHub Repository
Related Skills
content-collections
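Meta: Content Collections is a TypeScript-first build tool that turns local Markdown/MDX files into type-safe data collections. It is designed for building blogs, documentation sites, and content-heavy Vite + React apps, and provides automatic schema validation based on Zod. It covers the full workflow from Vite plugin configuration and MDX compilation to production deployment.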
creating-opencode-plugins
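Meta: This skill guides developers through creating OpenCode plugins, covering 25+ event types including commands, files, and LSP. It details plugin structure, the event API specification, and JavaScript/TypeScript implementation patterns to help developers build event-driven modules. It suits plugin development that needs to intercept operations, extend functionality, or customize AI assistant behavior.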
sglang
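Meta: SGLang is a high-performance inference framework designed for LLMs, particularly suited to scenarios that require structured output. Its RadixAttention prefix caching enables very fast generation in complex workflows with repeated prefixes, such as JSON, regular expressions, and tool calls. If you are building agents or multi-turn dialogue systems and want inference performance far beyond vLLM, SGLang is an ideal choice.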
evaluating-llms-harness
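Testing: This skill evaluates large language model quality with 60+ academic benchmarks (such as MMLU and GSM8K), suited to model comparison, academic research, and tracking training progress. It supports HuggingFace, vLLM, and API interfaces and is widely used by leading organizations such as EleutherAI. Developers can quickly run batch multi-task evaluations of models from a simple command line.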
