llm-inference

dave1010

Updated: Today


Category: Design. Tags: ai, design

About

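This skill provides LLM inference through a Cloudflare Pages Function that exposes an OpenAI-compatible endpoint. It gives access to several models, including high-performance options such as gpt-oss-120b and specialized models for particular tasks. Use it when you need to integrate LLM capabilities into an application and want the agent to choose the most suitable model for the requirements.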

Quick install

Claude Code

Plugin command (recommended)

/plugin add https://github.com/dave1010/tools

Git clone (alternative)

git clone https://github.com/dave1010/tools.git ~/.claude/skills/llm-inference

Copy and paste this command into Claude Code to install the skill.

Documentation

LLM Inference

The Cloudflare Pages Function in functions/cerebras-chat.ts provides OpenAI-compatible LLM inference. See tools/cerebras-llm-inference/index.html for a working example.
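A minimal sketch of a client call, assuming the function is served at /cerebras-chat and accepts the usual OpenAI chat-completions body (model, messages), replying with choices[0].message.content. The route and the helper names buildChatRequest and chat are illustrative assumptions, not taken from the repository; the working example in tools/cerebras-llm-inference/index.html remains the reference.

```typescript
// Hypothetical route: check functions/cerebras-chat.ts for the real path.
const ENDPOINT = "/cerebras-chat";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Build the fetch options for an OpenAI-style chat completion request.
function buildChatRequest(model: string, messages: ChatMessage[]): ChatRequestInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages }),
  };
}

// Send the request and return the assistant's reply text.
// Assumes the OpenAI response shape: choices[0].message.content.
async function chat(model: string, messages: ChatMessage[]): Promise<string> {
  const res = await fetch(ENDPOINT, buildChatRequest(model, messages));
  if (!res.ok) {
    throw new Error(`LLM request failed: ${res.status} ${await res.text()}`);
  }
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

Usage would look like `await chat("gpt-oss-120b", [{ role: "user", content: "Hello" }])`. Keeping the request builder separate from the network call makes the payload easy to inspect without hitting the endpoint.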

Available models

Model	Max context tokens	Requests / minute	Tokens / minute
gpt-oss-120b	65,536	30	64,000
llama-3.3-70b	65,536	30	64,000
llama3.1-8b	8,192	30	60,000
qwen-3-235b-a22b-instruct-2507	65,536	30	64,000
qwen-3-235b-a22b-thinking-2507	65,536	30	60,000
qwen-3-32b	65,536	30	64,000
zai-glm-4.6	64,000	10	150,000
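The per-minute limits in the table can be respected on the client side with a small budget tracker. This is only a sketch: MinuteBudget is a hypothetical helper, it assumes a sliding 60-second window, and it relies on the caller's own token estimate. How the server actually counts requests and tokens is not stated here.

```typescript
// Per-minute limits copied from the table above.
const LIMITS: Record<string, { requestsPerMinute: number; tokensPerMinute: number }> = {
  "gpt-oss-120b": { requestsPerMinute: 30, tokensPerMinute: 64_000 },
  "llama-3.3-70b": { requestsPerMinute: 30, tokensPerMinute: 64_000 },
  "llama3.1-8b": { requestsPerMinute: 30, tokensPerMinute: 60_000 },
  "qwen-3-235b-a22b-instruct-2507": { requestsPerMinute: 30, tokensPerMinute: 64_000 },
  "qwen-3-235b-a22b-thinking-2507": { requestsPerMinute: 30, tokensPerMinute: 60_000 },
  "qwen-3-32b": { requestsPerMinute: 30, tokensPerMinute: 64_000 },
  "zai-glm-4.6": { requestsPerMinute: 10, tokensPerMinute: 150_000 },
};

// Hypothetical helper: tracks requests and tokens in a sliding 60 s window.
class MinuteBudget {
  private events: { at: number; tokens: number }[] = [];
  private requestsPerMinute: number;
  private tokensPerMinute: number;

  constructor(requestsPerMinute: number, tokensPerMinute: number) {
    this.requestsPerMinute = requestsPerMinute;
    this.tokensPerMinute = tokensPerMinute;
  }

  // Returns true and records the request if it fits in the current window;
  // returns false (recording nothing) if either limit would be exceeded.
  tryAcquire(tokens: number, now: number = Date.now()): boolean {
    const windowStart = now - 60_000;
    this.events = this.events.filter((e) => e.at > windowStart);
    const usedTokens = this.events.reduce((sum, e) => sum + e.tokens, 0);
    if (this.events.length >= this.requestsPerMinute) return false;
    if (usedTokens + tokens > this.tokensPerMinute) return false;
    this.events.push({ at: now, tokens });
    return true;
  }
}

// Create a budget for one of the listed models.
function budgetFor(model: string): MinuteBudget {
  const limit = LIMITS[model];
  if (!limit) throw new Error(`Unknown model: ${model}`);
  return new MinuteBudget(limit.requestsPerMinute, limit.tokensPerMinute);
}
```

A caller would check `budget.tryAcquire(estimatedTokens)` before each request and wait or switch models when it returns false.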

llama3.1-8b is the fastest option.
zai-glm-4.6 is the most powerful option.
gpt-oss-120b remains the best all-rounder.
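The three recommendations above can be turned into a simple selection rule. The sketch below is one possible reading, not the skill's own logic: pickModel and the Priority type are hypothetical names, and the fallback to gpt-oss-120b when the input is too long for the preferred model is an assumption.

```typescript
type Priority = "speed" | "power" | "balanced";

// Context limits for the three recommended models, copied from the table above.
const MAX_CONTEXT: Record<string, number> = {
  "llama3.1-8b": 8_192,
  "zai-glm-4.6": 64_000,
  "gpt-oss-120b": 65_536,
};

// Recommended model for each priority, following the notes above.
const PREFERRED: Record<Priority, string> = {
  speed: "llama3.1-8b",
  power: "zai-glm-4.6",
  balanced: "gpt-oss-120b",
};

// Pick the recommended model for a priority, falling back to the all-rounder
// when the estimated token count exceeds the preferred model's context window.
function pickModel(priority: Priority, estimatedTokens: number): string {
  const preferred = PREFERRED[priority];
  if (estimatedTokens <= MAX_CONTEXT[preferred]) return preferred;
  const fallback = PREFERRED.balanced;
  if (estimatedTokens <= MAX_CONTEXT[fallback]) return fallback;
  throw new Error(`No listed model fits ${estimatedTokens} tokens`);
}
```

For example, a short classification task with priority "speed" would get llama3.1-8b, while the same priority with a 20,000-token input would fall back to gpt-oss-120b because it exceeds the 8,192-token window.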

LLMs are not just for chat: they can process any string in arbitrary ways. When building a tool that needs the LLM to respond in a specific way or format, be clear and explicit in its system prompt, e.g. what to include or exclude, plain text or markdown formatting, and length.
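As an illustration of an explicit system prompt for a non-chat tool, here is a sketch of a keyword extractor. The prompt wording, the five-keyword limit, and the helper names buildKeywordMessages and parseKeywords are all invented for this example. The parser is deliberately defensive, since a model may still deviate from the requested format.

```typescript
interface ToolMessage {
  role: "system" | "user";
  content: string;
}

// An explicit system prompt: it states what to include, what to exclude,
// the output format, and the length limit.
const KEYWORD_SYSTEM_PROMPT = [
  "You extract keywords from the user's text.",
  "Output at most 5 keywords, lowercase, separated by commas, on a single line.",
  "Plain text only: no markdown, no numbering, no quotes, no explanation.",
  "If the text has no meaningful keywords, output the single word: none",
].join("\n");

// Build the messages array for an OpenAI-style chat completion request.
function buildKeywordMessages(text: string): ToolMessage[] {
  return [
    { role: "system", content: KEYWORD_SYSTEM_PROMPT },
    { role: "user", content: text },
  ];
}

// Parse the reply: use only the first line, normalise case and whitespace,
// drop empty entries, and cap the result at 5 keywords.
function parseKeywords(reply: string): string[] {
  const line = reply.trim().split("\n")[0].trim().toLowerCase();
  if (line === "" || line === "none") return [];
  return line
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .slice(0, 5);
}
```

The messages from buildKeywordMessages would be sent as the `messages` field of the request body, and the reply text passed through parseKeywords before use.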

GitHub repository

dave1010/tools

Path: .skills/llm-inference
