pufferlib

Name: pufferlib
Author: K-Dense-AI

Updated 1 month ago

About

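PufferLib is a high-performance reinforcement learning framework optimized for speed and scalability, delivering a 2-10x performance improvement over standard implementations. Use it when you need fast parallel training, vectorized environments, or multi-agent systems, especially for game environments such as Atari and NetHack. If you need rapid prototyping or standard algorithms with detailed documentation, consider stable-baselines3.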

Documentation

PufferLib - High-Performance Reinforcement Learning

Overview

PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It trains at millions of steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and integrates with Gymnasium, PettingZoo, and specialized RL frameworks.

When to Use This Skill

Use this skill when:

Training RL agents with PPO on any environment (single or multi-agent)
Creating custom environments using the PufferEnv API
Optimizing performance for parallel environment simulation (vectorization)
Integrating existing environments from Gymnasium, PettingZoo, Atari, Procgen, etc.
Developing policies with CNN, LSTM, or custom architectures
Scaling RL to millions of steps per second for faster experimentation
Running multi-agent RL with native multi-agent environment support

Core Capabilities

1. High-Performance Training (PuffeRL)

PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.

Quick start training:

# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4

# Distributed training
torchrun --nproc_per_node=4 train.py

Python training loop:

import pufferlib
from pufferlib import PuffeRL

# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Create trainer (my_policy is a placeholder for your own torch.nn.Module policy)
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768,
)

# Training loop
num_iterations = 1000  # placeholder value; choose one that suits your task
for iteration in range(num_iterations):
    trainer.evaluate()  # Collect rollouts
    trainer.train()     # Train on batch
    trainer.mean_and_log()  # Log results
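
The batch_size and num_envs values in the loop above are linked by simple arithmetic. As a rough sketch, assuming each rollout phase fills one batch spread evenly across the environments (an assumption; the text above does not say how the batch is filled), the per-environment rollout length can be checked like this. The helper name steps_per_env is hypothetical and not part of PufferLib.

```python
def steps_per_env(batch_size: int, num_envs: int) -> int:
    """Steps each environment contributes to one batch (hypothetical helper)."""
    if batch_size % num_envs != 0:
        raise ValueError("batch_size should be a multiple of num_envs")
    return batch_size // num_envs


# Values taken from the training example: 32768 samples over 256 environments
print(steps_per_env(32768, 256))  # 128
```

Checking divisibility up front avoids a batch that cannot be split evenly across environments.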

For comprehensive training guidance, see references/training.md, which covers:

Complete training workflow and CLI options
Hyperparameter tuning with Protein
Distributed multi-GPU/multi-node training
Logger integration (Weights & Biases, Neptune)
Checkpointing and resume training
Performance optimization tips
Curriculum learning patterns

2. Environment Development (PufferEnv)

Create custom high-performance environments with the PufferEnv API.

Basic environment structure:

import numpy as np
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)

        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)

        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

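    def step(self, action):
        # Execute action, compute reward, check done.
        # _get_observation, _compute_reward, and _is_done are placeholders
        # for task-specific logic that you implement yourself.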
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}

        return obs, reward, done, info
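
The helper methods called in step() above are left for you to write. A minimal sketch of how they might be filled in, using plain Python and a hypothetical 1-D task so that it runs without PufferLib installed. LineWalkEnv does not inherit from PufferEnv, so it only illustrates the reset()/step() shape and is not a drop-in PufferLib environment.

```python
class LineWalkEnv:
    """Hypothetical 1-D task: start at cell 0 and reach cell `goal`."""

    def __init__(self, goal: int = 3, max_steps: int = 10):
        self.goal = goal
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # Reset state and return the initial observation
        self.pos = 0
        self.t = 0
        return self._get_observation()

    def _get_observation(self):
        # Observation: current position and remaining distance to the goal
        return [float(self.pos), float(self.goal - self.pos)]

    def _compute_reward(self):
        return 1.0 if self.pos == self.goal else 0.0

    def _is_done(self):
        return self.pos == self.goal or self.t >= self.max_steps

    def step(self, action):
        # Action 1 moves right; any other action moves left, clamped at cell 0
        self.pos = self.pos + 1 if action == 1 else max(self.pos - 1, 0)
        self.t += 1
        return self._get_observation(), self._compute_reward(), self._is_done(), {}


env = LineWalkEnv(goal=3)
obs = env.reset()
done = False
total = 0.0
while not done:
    obs, reward, done, info = env.step(1)  # always move right
    total += reward
```

Running a fixed policy like this to the end of an episode is a quick way to validate environment logic before scaling up.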

Use the template script scripts/env_template.py, which provides complete single-agent and multi-agent environment templates with examples of:

Different observation space types (vector, image, dict)
Action space variations (discrete, continuous, multi-discrete)
Multi-agent environment structure
Testing utilities

For complete environment development, see references/environments.md, which covers:

PufferEnv API details and in-place operation patterns
Observation and action space definitions
Multi-agent environment creation
Ocean suite (20+ pre-built environments)
Performance optimization (Python to C workflow)
Environment wrappers and best practices
Debugging and validation techniques

3. Vectorization and Performance

Achieve maximum throughput with optimized parallel simulation.

Vectorization setup:

import pufferlib

# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)

# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS

Key optimizations:

Shared memory buffers for zero-copy observation passing
Busy-wait flags instead of pipes/queues
Surplus environments for async returns
Multiple environments per worker
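
The shared-memory idea in the list above can be illustrated with the Python standard library. This is only a sketch of the general pattern, not PufferLib's implementation: the buffer layout, the constants, and the function names are assumptions made for illustration, and the "worker" runs in the same process to keep the example short. Each environment owns a fixed slice of one shared buffer and writes its observation there directly, so nothing is pickled or sent through a pipe.

```python
import struct
from multiprocessing import shared_memory

NUM_ENVS, OBS_DIM, FLOAT_BYTES = 4, 3, 4  # illustrative sizes (float32 observations)


def write_obs(shm_name: str, env_idx: int, obs) -> None:
    """Worker side: attach to the buffer by name and write one observation in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        offset = env_idx * OBS_DIM * FLOAT_BYTES
        struct.pack_into(f"{OBS_DIM}f", shm.buf, offset, *obs)
    finally:
        shm.close()  # detach only; the creator owns the segment


def demo():
    size = NUM_ENVS * OBS_DIM * FLOAT_BYTES
    shm = shared_memory.SharedMemory(create=True, size=size)
    try:
        shm.buf[:size] = bytes(size)  # zero the observation buffer
        write_obs(shm.name, 2, (1.0, 2.0, 3.0))  # stand-in for the worker of env 2
        # Trainer side: read every environment's observation from the same memory
        return struct.unpack_from(f"{NUM_ENVS * OBS_DIM}f", shm.buf, 0)
    finally:
        shm.close()
        shm.unlink()  # free the segment once nobody needs it


print(demo())
```

The read in demo() copies values out only so they can be printed; a real pipeline would more likely wrap the same buffer in an array view and hand it straight to the policy.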

For vectorization optimization, see references/vectorization.md, which covers:

Architecture and performance characteristics
Worker and batch size configuration
Serial vs multiprocessing vs async modes
Shared memory and zero-copy patterns
Hierarchical vectorization for large scale
Multi-agent vectorization strategies
Performance profiling and troubleshooting

4. Policy Development

Build policies as standard PyTorch modules with optional utilities.

Basic policy structure:

import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
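    def __init__(self, observation_space, action_space):
        super().__init__()

        # Assumes a flat vector observation space and a discrete action space
        obs_dim = observation_space.shape[0]
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),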
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
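
The forward() method above returns raw action logits alongside a value estimate. For a discrete action space, the logits are typically converted to a categorical distribution and sampled. The following dependency-free sketch shows that step on plain lists; the function names are illustrative, and with real tensors something like torch.distributions.Categorical would normally do this work.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Draw one action index from the categorical distribution defined by logits."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = softmax([0.0, 0.0, 0.0, 0.0])  # equal logits give a uniform distribution
action = sample_action([0.0, 0.0, 0.0, 10.0], random.Random(0))
```

Initializing the actor head with a small std, as in the policy above, keeps the initial logits close together, so early action sampling is close to uniform.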

For complete policy development, see references/policies.md, which covers:

CNN policies for image observations
Recurrent policies with optimized LSTM (3x faster inference)
Multi-input policies for complex observations
Continuous action policies
Multi-agent policies (shared vs independent parameters)
Advanced architectures (attention, residual)
Observation normalization and gradient clipping
Policy debugging and testing

5. Environment Integration

Seamlessly integrate environments from popular RL frameworks.

Gymnasium integration:

import gymnasium as gym
import pufferlib

# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)

# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)

PettingZoo multi-agent:

# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)

Supported frameworks:

Gymnasium / OpenAI Gym
PettingZoo (parallel and AEC)
Atari (ALE)
Procgen
NetHack / MiniHack
Minigrid
Neural MMO
Crafter
GPUDrive
MicroRTS
Griddly
And more...

For integration details, see references/integration.md, which covers:

Complete integration examples for each framework
Custom wrappers (observation, reward, frame stacking, action repeat)
Space flattening and unflattening
Environment registration
Compatibility patterns
Performance considerations
Integration debugging
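
Space flattening and unflattening, mentioned in the list above, means packing a structured observation into one flat vector for the policy and restoring the structure when needed. A minimal sketch of the idea on plain dictionaries, assuming each entry is a list of floats; the flatten and unflatten functions here are hypothetical and are not PufferLib's own utilities.

```python
def flatten(obs: dict):
    """Flatten a dict of float lists into one list, in sorted key order."""
    keys = sorted(obs)
    layout = [(k, len(obs[k])) for k in keys]  # remembered for unflattening
    flat = [x for k in keys for x in obs[k]]
    return flat, layout


def unflatten(flat, layout):
    """Rebuild the dict from the flat list and the (key, length) layout."""
    out, i = {}, 0
    for key, n in layout:
        out[key] = flat[i:i + n]
        i += n
    return out


obs = {"position": [1.0, 2.0], "inventory": [0.0, 1.0, 0.0]}
flat, layout = flatten(obs)
print(flat)  # [0.0, 1.0, 0.0, 1.0, 2.0]
```

Sorting the keys gives a deterministic layout, so every environment copy packs its observation the same way.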

Quick Start Workflow

For Training Existing Environments

1. Choose an environment from the Ocean suite or a compatible framework
2. Use scripts/train_template.py as a starting point
3. Configure hyperparameters for your task
4. Run training with the CLI or a Python script
5. Monitor with Weights & Biases or Neptune
6. Refer to references/training.md for optimization

For Creating Custom Environments

1. Start with scripts/env_template.py
2. Define observation and action spaces
3. Implement reset() and step() methods
4. Test environment locally
5. Vectorize with pufferlib.emulate() or make()
6. Refer to references/environments.md for advanced patterns
7. Optimize with references/vectorization.md if needed

For Policy Development

1. Choose architecture based on observations:
   - Vector observations → MLP policy
   - Image observations → CNN policy
   - Sequential tasks → LSTM policy
   - Complex observations → Multi-input policy
2. Use layer_init for proper weight initialization
3. Follow patterns in references/policies.md
4. Test with environment before full training

For Performance Optimization

1. Profile current throughput (steps per second)
2. Check vectorization configuration (num_envs, num_workers)
3. Optimize environment code (in-place ops, numpy vectorization)
4. Consider C implementation for critical paths
5. Use references/vectorization.md for systematic optimization
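
Profiling throughput, the first step above, needs nothing more than a timer around the step call. A minimal sketch using only the standard library; DummyEnv is a hypothetical stand-in for a real environment, and measure_sps is an illustrative helper rather than a PufferLib function.

```python
import time


def measure_sps(step_fn, num_steps: int = 10_000) -> float:
    """Call step_fn num_steps times and return steps per second."""
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return num_steps / max(elapsed, 1e-9)  # guard against a zero interval


class DummyEnv:
    """Stand-in environment; replace with a real environment's step call."""

    def __init__(self):
        self.t = 0

    def step(self):
        self.t += 1


env = DummyEnv()
sps = measure_sps(env.step, num_steps=10_000)
print(f"{sps:,.0f} steps/second")
```

For a vectorized environment, one step call advances every sub-environment, so the figure should be multiplied by the number of environments stepped per call.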

Resources

scripts/

train_template.py - Complete training script template with:

Environment creation and configuration
Policy initialization
Logger integration (WandB, Neptune)
Training loop with checkpointing
Command-line argument parsing
Multi-GPU distributed training setup

env_template.py - Environment implementation templates:

Single-agent PufferEnv example (grid world)
Multi-agent PufferEnv example (cooperative navigation)
Multiple observation/action space patterns
Testing utilities

references/

training.md - Comprehensive training guide:

Training workflow and CLI options
Hyperparameter configuration
Distributed training (multi-GPU, multi-node)
Monitoring and logging
Checkpointing
Protein hyperparameter tuning
Performance optimization
Common training patterns
Troubleshooting

environments.md - Environment development guide:

PufferEnv API and characteristics
Observation and action spaces
Multi-agent environments
Ocean suite environments
Custom environment development workflow
Python to C optimization path
Third-party environment integration
Wrappers and best practices
Debugging

vectorization.md - Vectorization optimization:

Architecture and key optimizations
Vectorization modes (serial, multiprocessing, async)
Worker and batch configuration
Shared memory and zero-copy patterns
Advanced vectorization (hierarchical, custom)
Multi-agent vectorization
Performance monitoring and profiling
Troubleshooting and best practices

policies.md - Policy architecture guide:

Basic policy structure
CNN policies for images
LSTM policies with optimization
Multi-input policies
Continuous action policies
Multi-agent policies
Advanced architectures (attention, residual)
Observation processing and unflattening
Initialization and normalization
Debugging and testing

integration.md - Framework integration guide:

Gymnasium integration
PettingZoo integration (parallel and AEC)
Third-party environments (Procgen, NetHack, Minigrid, etc.)
Custom wrappers (observation, reward, frame stacking, etc.)
Space conversion and unflattening
Environment registration
Compatibility patterns
Performance considerations
Debugging integration

Tips for Success

Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments
Profile early: Measure steps per second from the start to identify bottlenecks
Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points
Read references as needed: Each reference file is self-contained and focused on a specific capability
Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed
Leverage vectorization: PufferLib's vectorization is key to achieving high throughput
Monitor training: Use WandB or Neptune to track experiments and identify issues early
Test environments: Validate environment logic before scaling up training
Check existing environments: Ocean suite provides 20+ pre-built environments
Use proper initialization: Always use layer_init from pufferlib.pytorch for policies

Common Use Cases

Training on Standard Benchmarks

# Atari
env = pufferlib.make('atari-pong', num_envs=256)

# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)

# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)

Multi-Agent Learning

# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)

# Shared policy for all agents
# (create_policy is a placeholder for your own policy factory)
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)

Custom Task Development

# Create custom environment
class MyTask(PufferEnv):
    ...  # implement reset() and step() here

# Vectorize and train (my_policy is a placeholder for your policy module)
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)

High-Performance Optimization

# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,      # Large batch
    num_workers=16,     # Many workers
    envs_per_worker=64  # Optimize per worker
)
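
The three numbers in the call above are related: 1024 environments over 16 workers gives 64 environments per worker. As a hedged sketch, assuming environments must divide evenly across workers (an assumption; the text above does not state this requirement), a small hypothetical helper can list the worker counts that fit a given num_envs.

```python
def valid_worker_counts(num_envs: int, max_workers: int):
    """Worker counts up to max_workers that split num_envs evenly (hypothetical helper)."""
    return [w for w in range(1, max_workers + 1) if num_envs % w == 0]


num_envs = 1024
for workers in valid_worker_counts(num_envs, max_workers=16):
    print(f"num_workers={workers:2d} -> envs_per_worker={num_envs // workers}")
```

Capping max_workers at the number of physical CPU cores is a common starting point before profiling.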

Installation

uv pip install pufferlib

Documentation

Official docs: https://puffer.ai/docs.html
GitHub: https://github.com/PufferAI/PufferLib
Discord: Community support available

GitHub repository

K-Dense-AI/claude-scientific-skills

Path: skills/pufferlib

Tags: agent-skills, ai-scientist, bioinformatics, chemoinformatics, claude, claude-skills

FAQ

What is the pufferlib skill?

pufferlib is a Claude Skill by K-Dense-AI. Skills package instructions and resources that Claude loads on demand, so Claude can perform pufferlib-related tasks without extra prompting.

How do I install pufferlib?

Use the install commands on this page: add pufferlib to Claude Code as a plugin, or clone its repository into your skills directory, then restart Claude so it picks up the skill.

What category does pufferlib belong to?

pufferlib is in the Design category, tagged word, ai and design.

Is pufferlib free to use?

Yes. pufferlib is listed on AIMCP and free to install. It runs inside Claude, so no separate service account is required to use the skill itself.
