context-engineering
About
This skill helps developers monitor and optimize Claude's context window usage to prevent failures and reduce costs. It provides tools for checking usage limits, debugging issues, and implementing efficient memory or agent architectures. Use it when building LLM pipelines where context constraints impact performance or latency.
Quick Install
Claude Code
Recommended (Claude Code plugin):
```
/plugin add https://github.com/majiayu000/claude-skill-registry
```
Or clone the registry manually:
```
git clone https://github.com/majiayu000/claude-skill-registry.git ~/.claude/skills/context-engineering
```
Paste the plugin command into Claude Code, or run the clone command from a shell, to install this skill.
Documentation
Context Engineering
Context engineering curates the smallest high-signal token set for LLM tasks. The goal: maximize reasoning quality while minimizing token usage.
When to Activate
- Designing/debugging agent systems
- Context limits constrain performance
- Optimizing cost/latency
- Building multi-agent coordination
- Implementing memory systems
- Evaluating agent performance
- Developing LLM-powered pipelines
Core Principles
- Context quality > quantity - High-signal tokens beat exhaustive content
- Attention is finite - U-shaped curve favors beginning/end positions (see the sketch after this list)
- Progressive disclosure - Load information just-in-time
- Isolation prevents degradation - Partition work across sub-agents
- Measure before optimizing - Know your baseline
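Because attention follows a U-shaped curve, ordering matters when assembling context. A minimal sketch, assuming context arrives as plain-text blocks; `assemble_context` is an illustrative name, not part of this skill:

```python
def assemble_context(critical: list[str], supporting: list[str]) -> str:
    """Place critical blocks at the edges of the prompt, where U-shaped
    attention is strongest; supporting detail goes in the middle."""
    if not critical:
        return "\n\n".join(supporting)
    head, tail = critical[0], critical[1:]
    return "\n\n".join([head, *supporting, *tail])
```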
IMPORTANT:
- Sacrifice grammar for the sake of concision.
- Ensure token efficiency while maintaining high quality.
- Pass these rules to subagents.
Quick Reference
| Topic | When to Use | Reference |
|---|---|---|
| Fundamentals | Understanding context anatomy, attention mechanics | context-fundamentals.md |
| Degradation | Debugging failures, lost-in-middle, poisoning | context-degradation.md |
| Optimization | Compaction, masking, caching, partitioning | context-optimization.md |
| Compression | Long sessions, summarization strategies | context-compression.md |
| Memory | Cross-session persistence, knowledge graphs | memory-systems.md |
| Multi-Agent | Coordination patterns, context isolation | multi-agent-patterns.md |
| Evaluation | Testing agents, LLM-as-Judge, metrics | evaluation.md |
| Tool Design | Tool consolidation, description engineering | tool-design.md |
| Pipelines | Project development, batch processing | project-development.md |
| Runtime Awareness | Usage limits, context window monitoring | runtime-awareness.md |
Key Metrics
- Token utilization: warning at 70%, trigger optimization at 80% (see the sketch after this list)
- Token usage: explains 80% of agent performance variance
- Multi-agent cost: ~15x single agent baseline
- Compaction target: 50-70% reduction, <5% quality loss
- Cache hit target: 70%+ for stable workloads
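A minimal sketch of wiring these thresholds into a monitor; the 90% critical level comes from the Runtime Awareness section below, and the function and constant names are illustrative:

```python
WARN, OPTIMIZE, CRITICAL = 0.70, 0.80, 0.90  # thresholds from the metrics above

def utilization_status(tokens_used: int, window_size: int) -> str:
    """Classify context-window utilization against the thresholds."""
    ratio = tokens_used / window_size
    if ratio >= CRITICAL:
        return "critical"   # immediate action needed
    if ratio >= OPTIMIZE:
        return "optimize"   # trigger compaction now
    if ratio >= WARN:
        return "warning"    # plan compaction before the next large step
    return "ok"
```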
Four-Bucket Strategy
- Write: Save context externally (scratchpads, files)
- Select: Pull only relevant context (retrieval, filtering)
- Compress: Reduce tokens while preserving info (summarization)
- Isolate: Split across sub-agents (partitioning); all four buckets are sketched below
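A minimal sketch of the four buckets, assuming a JSON scratchpad on disk; the summarizer is a truncation stand-in for a real summarization model, and `run_agent` is a caller-supplied callable:

```python
import json
from pathlib import Path
from typing import Callable

def write_scratchpad(path: Path, notes: dict) -> None:
    """Write: save working context externally instead of keeping it in-window."""
    path.write_text(json.dumps(notes))

def select_relevant(path: Path, keywords: set[str]) -> dict:
    """Select: pull back only entries that mention the current task's keywords."""
    notes = json.loads(path.read_text())
    return {k: v for k, v in notes.items()
            if keywords & set(str(v).lower().split())}

def compress(text: str, max_chars: int = 2000) -> str:
    """Compress: reduce tokens while preserving info (real systems summarize)."""
    return text if len(text) <= max_chars else text[:max_chars] + " [truncated]"

def isolate(subtasks: list[str], run_agent: Callable[[str], str]) -> list[str]:
    """Isolate: give each sub-agent its own partition of the task and context."""
    return [run_agent(task) for task in subtasks]
```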
Anti-Patterns
- Exhaustive context over curated context
- Critical info in middle positions
- No compaction triggers before limits
- Single agent for parallelizable tasks
- Tools without clear descriptions
Guidelines
- Place critical info at beginning/end of context
- Implement compaction at 70-80% utilization
- Use sub-agents for context isolation, not role-play
- Design tools with the 4-question framework (what, when, inputs, returns); see the docstring sketch after this list
- Optimize for tokens-per-task, not tokens-per-request
- Validate with probe-based evaluation
- Monitor KV-cache hit rates in production
- Start minimal, add complexity only when proven necessary
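A tool description can answer all four questions of the framework in its docstring. A hypothetical tool, sketched for illustration only:

```python
def search_tickets(query: str, max_results: int = 10) -> list[dict]:
    """Search the support-ticket index. (Hypothetical tool.)

    What:    full-text search over ticket titles and bodies.
    When:    use before filing a new ticket, to surface duplicates.
    Inputs:  query, keywords or a ticket ID; max_results, a cap on hits.
    Returns: a list of {"id", "title", "status"} dicts, best match first.
    """
    raise NotImplementedError  # the model only ever sees the description
```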
Runtime Awareness
The system automatically injects usage awareness via PostToolUse hook:
```
<usage-awareness>
Claude Usage Limits: 5h=45%, 7d=32%
Context Window Usage: 67%
</usage-awareness>
```
Thresholds:
- 70%: WARNING - consider optimization/compaction
- 90%: CRITICAL - immediate action needed
Data Sources:
- Usage limits: Anthropic OAuth API (https://api.anthropic.com/api/oauth/usage)
- Context window: statusline temp file (/tmp/ck-context-{session_id}.json), read by the sketch below
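A minimal sketch of reading the statusline temp file; the JSON key name is an assumption, since the file's schema is not documented here:

```python
import json
from pathlib import Path
from typing import Optional

def read_context_usage(session_id: str) -> Optional[float]:
    """Return context-window utilization from the statusline temp file,
    or None if the file is missing or unreadable."""
    path = Path(f"/tmp/ck-context-{session_id}.json")
    try:
        data = json.loads(path.read_text())
    except (OSError, json.JSONDecodeError):
        return None
    return data.get("context_usage")  # assumed key; adapt to the real schema
```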
Scripts
- context_analyzer.py - Context health analysis, degradation detection
- compression_evaluator.py - Compression quality evaluation
Related Skills
sglang
SGLang is a high-performance LLM serving framework that specializes in fast, structured generation for JSON, regex, and agentic workflows using its RadixAttention prefix caching. It delivers significantly faster inference, especially for tasks with repeated prefixes, making it ideal for complex, structured outputs and multi-turn conversations. Choose SGLang over alternatives like vLLM when you need constrained decoding or are building applications with extensive prefix sharing.
evaluating-llms-harness
This Claude Skill runs the lm-evaluation-harness to benchmark LLMs across 60+ standardized academic tasks like MMLU and GSM8K. It's designed for developers to compare model quality, track training progress, or report academic results. The tool supports various backends including HuggingFace and vLLM models.
langchain
LangChain is a framework for building LLM applications using agents, chains, and RAG pipelines. It supports multiple LLM providers, offers 500+ integrations, and includes features like tool calling and memory management. Use it for rapid prototyping and deploying production systems like chatbots, autonomous agents, and question-answering services.
llamaguard
LlamaGuard is Meta's 7-8B parameter model for moderating LLM inputs and outputs across six safety categories like violence and hate speech. It offers 94-95% accuracy and can be deployed using vLLM, Hugging Face, or Amazon SageMaker. Use this skill to easily integrate content filtering and safety guardrails into your AI applications.
