Back to Skills

huggingface-tokenizers

davila7
Updated 28 days ago
409 views
18,478
1,685
18,478
View on GitHub
DocumentsTokenizationHuggingFaceBPEWordPieceUnigramFast TokenizationRustCustom TokenizerAlignment TrackingProduction

About

This skill provides high-performance tokenization using HuggingFace's Rust-based library, processing 1GB of text in under 20 seconds. It supports BPE, WordPiece, and Unigram algorithms while enabling custom tokenizer training and alignment tracking. Use it when you need production-fast tokenization or to build custom tokenizers integrated with the transformers ecosystem.

Quick Install

Claude Code

Recommended
Primary
npx skills add davila7/claude-code-templates -a claude-code
Plugin CommandAlternative
/plugin add https://github.com/davila7/claude-code-templates
Git CloneAlternative
git clone https://github.com/davila7/claude-code-templates.git ~/.claude/skills/huggingface-tokenizers

Copy and paste this command in Claude Code to install this skill

GitHub Repository

davila7/claude-code-templates
Path: cli-tool/components/skills/ai-research/tokenization-huggingface-tokenizers
0
anthropicanthropic-claudeclaudeclaude-code

Related Skills