SKILL·2321F0

serving-llms-vllm

Name: serving-llms-vllm
Author: zechenzhangAGI

zechenzhangAGI

更新于 2 months ago

373 次查看

开发aiapi

关于

This Claude Skill serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. It's ideal for deploying production LLM APIs, optimizing inference performance, or serving models with limited GPU memory. The skill supports OpenAI-compatible endpoints, multiple quantization methods, and tensor parallelism.

快速安装

Claude Code

GitHub 仓库

zechenzhangAGI/AI-research-SKILLs

路径: 12-inference-serving/vllm

aiai-researchclaudeclaude-codeclaude-skillscodex

FAQ

Frequently asked questions

What is the serving-llms-vllm skill?

serving-llms-vllm is a Claude Skill by zechenzhangAGI. Skills package instructions and resources that Claude loads on demand, so Claude can perform serving-llms-vllm-related tasks without extra prompting.

How do I install serving-llms-vllm?

Use the install commands on this page: add serving-llms-vllm to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does serving-llms-vllm belong to?

serving-llms-vllm is in the Development category, tagged ai and api.

Is serving-llms-vllm free to use?

Yes. serving-llms-vllm is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.