simulate-cpu-architecture
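About
This skill lets developers design and simulate a minimal CPU from scratch, including defining the instruction set and building the datapath and control unit. It guides the implementation of a complete fetch-decode-execute cycle and verifies the processor by tracing a small program clock cycle by clock cycle. It is a capstone exercise that combines combinational and sequential circuit blocks into a working "computer within a computer".
Quick Install
Claude Code
- Recommended: npx skills add pjt222/agent-almanac -a claude-code
- /plugin add https://github.com/pjt222/agent-almanac
- git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/simulate-cpu-architecture
Copy and paste this command into Claude Code to install the skill.
Documentation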
Simulate CPU Architecture
Design min but complete CPU: ISA → ALU+regfile → datapath → control unit → fetch-decode-exec cycle → simulate small prog → verify each cycle vs expected reg+mem.
Use When
- Learn/teach computer arch from first principles
- Design custom CPU → FPGA | educational sim
- Verify understanding of inst exec at gate + RTL level
- Build sw sim (Python, JS, walkthrough) of CPU
- Compose combinational (design-logic-circuit) + sequential (build-sequential-circuit) blocks → working system
In
- Required: Complexity target — 4/8/16-bit data; reg count (2-16)
- Required: Min ISA — load, store, add, sub, AND/OR, branch, halt
- Optional: Addressing modes beyond direct (immediate, reg-indirect, indexed)
- Optional: Extra instr (mul, shift, cmp, jump-and-link)
- Optional: Mem size, word size
- Optional: Pipeline stages (single, multi, pipelined) — default multi
- Optional: Medium — sw sim (Py/JS), HDL (Verilog/VHDL), paper
Do
Step 1: Define ISA
Spec everything programmer needs for machine code.
- Data width: bit width of data (ALU operand) + addr. Common: 8-bit data / 8-bit addr (256B), 16/16.
- Reg file: count GP + special-purpose.
  - GP: R0-R(N-1). R0 hardwired zero? (simplifies encoding)
  - Special: PC, IR, Status/Flags (Z, C, N, V).
- Inst format: fixed-width word. Bit fields:
  - Opcode: K bits → 2^K instr
  - Reg fields: src + dst. N regs → ceil(log2(N)) bits each
  - Imm/offset: constants | branch offsets. Remaining bits.
- Inst catalog: each w/ mnemonic, opcode, operand fields, RTL op, flags affected.
- Addressing:
  - Reg: operand in reg
  - Immediate: operand embedded in inst
  - Direct: addr in inst
  - Reg-indirect: addr in reg
## ISA Specification
- **Data width**: [N] bits
- **Address width**: [M] bits
- **Registers**: [count] general-purpose + [list of special-purpose]
- **Instruction width**: [W] bits
### Instruction Format
| Field | Bits | Width |
|----------|-----------|-------|
| Opcode | [W-1:X] | [n] |
| Rd | [X-1:Y] | [n] |
| Rs | [Y-1:Z] | [n] |
| Imm | [Z-1:0] | [n] |
### Instruction Catalog
| Mnemonic | Opcode | Format | RTL Operation | Flags |
|----------|--------|-----------|------------------------|-------|
| LOAD | 0000 | Rd, [addr]| Rd <- MEM[addr] | - |
| STORE | 0001 | Rs, [addr]| MEM[addr] <- Rs | - |
| ADD | 0010 | Rd, Rs | Rd <- Rd + Rs | Z,C,N |
| ... | ... | ... | ... | ... |
| HALT | 1111 | - | Stop execution | - |
Got: Complete ISA. Each instr = unique opcode, well-defined operand fields, unambiguous RTL, doc'd flag effects. Decodable w/o ambiguity.
If err: Inst word too narrow → widen | reduce regs | var-length (more complex decode) | split into sub-ops. Opcode collisions → reassign.
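The field sizing and packing rules from this step can be sketched in Python. This is only an illustration under assumed parameters (16-bit word, 16 opcodes, 8 regs, field order opcode | Rd | Rs | imm); the names `encode` and `decode` are hypothetical, not part of the skill.

```python
from math import ceil, log2

# Assumed parameters for illustration only.
WORD_BITS = 16
NUM_OPCODES = 16
NUM_REGS = 8

OPCODE_BITS = ceil(log2(NUM_OPCODES))              # K bits -> 2^K instr
REG_BITS = ceil(log2(NUM_REGS))                    # ceil(log2(N)) bits per reg field
IMM_BITS = WORD_BITS - OPCODE_BITS - 2 * REG_BITS  # remaining bits


def encode(opcode, rd=0, rs=0, imm=0):
    """Pack fields MSB-first as [opcode | rd | rs | imm]."""
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= rd < NUM_REGS and 0 <= rs < NUM_REGS
    imm &= (1 << IMM_BITS) - 1  # keep the low IMM_BITS (two's complement)
    word = opcode
    word = (word << REG_BITS) | rd
    word = (word << REG_BITS) | rs
    word = (word << IMM_BITS) | imm
    return word


def decode(word):
    """Unpack a word into (opcode, rd, rs, sign-extended imm)."""
    imm = word & ((1 << IMM_BITS) - 1)
    word >>= IMM_BITS
    rs = word & ((1 << REG_BITS) - 1)
    word >>= REG_BITS
    rd = word & ((1 << REG_BITS) - 1)
    word >>= REG_BITS
    opcode = word
    if imm & (1 << (IMM_BITS - 1)):  # sign bit set -> negative immediate
        imm -= 1 << IMM_BITS
    return opcode, rd, rs, imm
```

If `IMM_BITS` comes out zero or negative for the chosen parameters, the inst word is too narrow and one of the remedies listed above applies.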
Step 2: Design Datapath
Build RTL hardware moving + transforming data.
- ALU (via design-logic-circuit). Two N-bit operands + op select → N-bit result + flags (Z, C, N, V).
  - Ops: ADD, SUB (2's comp), AND, OR, XOR, NOT, SHL, SHR, PASS-THROUGH (moves, loads).
  - Select width fits all ops.
- Reg file (via build-sequential-circuit). Bank w/:
  - 2 read ports (combinational, addr in)
  - 1 write port (clocked, RegWrite enable)
  - R0 hardwired zero → writes to R0 ignored
- PC: M-bit reg (addr width) w/:
  - Increment logic (PC + 1 if word-addressed, PC + inst width/8 if byte-addressed → next sequential)
  - Load input → branch/jump targets
  - Mux select increment | branch (PCsrc)
- Mem interface: separate | unified inst+data.
  - Harvard: separate inst (RO) + data (RW). Simpler, simultaneous fetch + data.
  - Von Neumann: shared. Sequence fetch + data in diff cycles.
- Interconnect: muxes + buses:
  - ALU A mux: regA | PC (PC-relative)
  - ALU B mux: regB | sign-extended imm
  - RegWrite mux: ALU result | mem read (loads)
  - Mem addr mux: PC (fetch) | ALU result (load/store)
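A behavioral sketch of the ALU described in the list above, in Python. The 8-bit width, the string op names, and the "carry set means no borrow" convention for SUB are assumptions made for this example; a real design fixes these in the ISA.

```python
N_BITS = 8                    # assumed data width
MASK = (1 << N_BITS) - 1
SIGN = 1 << (N_BITS - 1)


def alu(op, a, b):
    """Return (result, flags) for two N-bit operands; flags has Z, C, N, V."""
    a &= MASK
    b &= MASK
    carry = False
    overflow = False
    if op == "ADD":
        full = a + b
        carry = full > MASK
        result = full & MASK
        # Signed overflow: operands share a sign that the result does not.
        overflow = bool(~(a ^ b) & (a ^ result) & SIGN)
    elif op == "SUB":
        full = a + (~b & MASK) + 1   # a - b via two's complement
        carry = full > MASK          # assumed convention: carry set = no borrow
        result = full & MASK
        overflow = bool((a ^ b) & (a ^ result) & SIGN)
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "NOT":
        result = ~a & MASK
    elif op == "SHL":
        carry = bool(a & SIGN)       # bit shifted out
        result = (a << 1) & MASK
    elif op == "SHR":
        carry = bool(a & 1)          # bit shifted out
        result = a >> 1
    elif op == "PASS":
        result = b                   # pass operand B through (moves, loads)
    else:
        raise ValueError(f"unknown ALU op {op!r}")
    flags = {"Z": result == 0, "C": carry,
             "N": bool(result & SIGN), "V": overflow}
    return result, flags
```

For example, `alu("ADD", 0x7F, 1)` yields `0x80` with N and V set and C clear, the classic signed-overflow case.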
## Datapath Components
| Component | Width | Ports / Signals |
|----------------|--------|------------------------------------|
| ALU | [N]-bit| OpA, OpB, ALUop -> Result, Flags |
| Register File | [N]-bit| RdAddrA, RdAddrB, WrAddr, WrData, RegWrite -> RdDataA, RdDataB |
| PC | [M]-bit| PCnext, PCwrite -> PCout |
| Instruction Mem| [W]-bit| Addr -> Instruction |
| Data Memory | [N]-bit| Addr, WrData, MemRead, MemWrite -> RdData |
## Datapath Multiplexers
| Mux Name | Inputs | Select Signal | Output |
|-------------|----------------------|---------------|-------------|
| ALU_B_mux | RegDataB, Immediate | ALUsrc | ALU OpB |
| WrData_mux | ALU Result, MemData | MemToReg | Reg WrData |
| PC_mux | PC+1, BranchTarget | PCsrc | PC next |
Got: Complete datapath (component + mux table). Every ISA instr has viable path src → dst through ALU, regfile, mem.
If err: Inst can't exec on datapath (e.g. no path mem→reg for LOAD) → add missing mux/path. Walk each instr RTL, trace signal flow.
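To check that a path exists, it can help to model the regfile and muxes in software. The sketch below assumes 4 regs, 8-bit width, and R0 hardwired to zero; `RegisterFile` and `mux2` are hypothetical names for this example.

```python
class RegisterFile:
    """Two combinational read ports, one clocked write port."""

    def __init__(self, count=4, width=8, r0_zero=True):
        self.regs = [0] * count
        self.mask = (1 << width) - 1
        self.r0_zero = r0_zero

    def read(self, addr_a, addr_b):
        """Combinational read of both ports."""
        return self.regs[addr_a], self.regs[addr_b]

    def clock(self, wr_addr, wr_data, reg_write):
        """Model one clock edge: write only if RegWrite is asserted."""
        if reg_write and not (self.r0_zero and wr_addr == 0):
            self.regs[wr_addr] = wr_data & self.mask


def mux2(select, in0, in1):
    """2-to-1 mux: select=0 -> in0, select=1 -> in1."""
    return in1 if select else in0


# LOAD write-back path: MemToReg=1 routes memory data to the write port.
rf = RegisterFile()
alu_result, mem_data = 0x10, 0x2A
rf.clock(wr_addr=1, wr_data=mux2(1, alu_result, mem_data), reg_write=True)
```

After the last line, `rf.read(1, 0)` returns `(0x2A, 0)`, which confirms the mem→reg path needed by LOAD.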
Step 3: Design Control Unit
Logic orchestrating datapath per instr.
- ID control signals: every mux select, RegWrite, MemRead/Write, ALU op select.
- Single-cycle (simplest): combinational. All signals from opcode in 1 clock.
- Multi-cycle (recommended for learning): FSM (build-sequential-circuit) phases:
  - Fetch: read inst from mem at PC; → IR; PC++.
  - Decode: read regfile (IR fields); sign-extend imm.
  - Execute: ALU op | compute mem addr.
  - Mem access (load/store only): read | write data mem.
  - Write-back: result → dst reg.
- Control signal table: per instr, per phase, each signal value.
- Hardwired vs microprogrammed:
  - Hardwired: gates + flip-flops. Faster, less flexible.
  - Microprogrammed: ROM stores signals per state. Each microinst = signals + next-state. Slower, easy to modify.
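For the single-cycle case, the control unit reduces to a lookup from opcode to signal values. The sketch below uses the four opcodes given in the Step 1 catalog; the signal values are assumptions for direct-addressing LOAD/STORE and a two-operand ADD, and `control` is a hypothetical helper.

```python
# Default: every enable deasserted, ALU passes operand B through.
DEFAULT = dict(ALUop="PASS", ALUsrc=0, RegWrite=0, MemRead=0,
               MemWrite=0, MemToReg=0, PCsrc=0)

# Opcode -> control signals (assumed values; other opcodes left undefined).
CONTROL = {
    0b0000: dict(DEFAULT, ALUsrc=1, RegWrite=1, MemRead=1, MemToReg=1),  # LOAD
    0b0001: dict(DEFAULT, ALUsrc=1, MemWrite=1),                         # STORE
    0b0010: dict(DEFAULT, ALUop="ADD", RegWrite=1),                      # ADD
    0b1111: dict(DEFAULT),                                               # HALT
}


def control(opcode):
    """Return the control signals for an opcode, rejecting undefined ones."""
    try:
        signals = CONTROL[opcode]
    except KeyError:
        raise ValueError(f"undefined opcode {opcode:04b}") from None
    # Sanity check: never read and write data memory at the same time.
    assert not (signals["MemRead"] and signals["MemWrite"]), "mem r/w conflict"
    return signals
```

A microprogrammed control store has the same shape, with one such row per FSM state plus a next-state field.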
## Control Signals
| Signal | Width | Function |
|-----------|-------|---------------------------------------|
| ALUop | [k] | Selects ALU operation |
| ALUsrc | 1 | 0=register, 1=immediate for ALU B |
| RegWrite | 1 | Enable register file write |
| MemRead | 1 | Enable data memory read |
| MemWrite | 1 | Enable data memory write |
| MemToReg | 1 | 0=ALU result, 1=memory data to register |
| PCsrc | 1 | 0=PC+1, 1=branch target |
| IRwrite | 1 | Enable instruction register load |
## Multi-Cycle Control FSM
| State   | Active Signals                         | Next State                        |
|---------|----------------------------------------|-----------------------------------|
| FETCH   | MemRead, IRwrite, PC <- PC+1           | DECODE                            |
| DECODE  | Read registers, sign-extend immediate  | EXECUTE (HALT: stop)              |
| EXECUTE | ALUop=[per instruction], ALUsrc=[...]  | MEM_ACC, WB, or FETCH (branch)    |
| MEM_ACC | MemRead or MemWrite                    | WB (load) or FETCH (store)        |
| WB      | RegWrite, MemToReg=[...]               | FETCH                             |
Got: Control unit (combinational | FSM) generates correct signals per instr per phase. No conflicts (e.g. MemRead+MemWrite simultaneous on same mem).
If err: Signal conflict → phases not separated. Load + store access mem in diff phases | mem supports separate r/w ports. Too many states → check shared phases, merge.
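The multi-cycle FSM can be prototyped as a next-state function before committing to gates or a ROM. This sketch assumes that stores and branches return to FETCH without a write-back and that HALT stops after decode, matching the cycle counts used in Step 4; the instruction kinds and the `HALTED` state name are hypothetical labels.

```python
def next_state(state, kind):
    """Next FSM state given the current state and instruction kind.

    kind is one of "alu", "load", "store", "branch", "halt".
    """
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return "HALTED" if kind == "halt" else "EXECUTE"
    if state == "EXECUTE":
        if kind in ("load", "store"):
            return "MEM_ACC"
        if kind == "alu":
            return "WB"
        return "FETCH"  # branch: PC already updated in EXECUTE
    if state == "MEM_ACC":
        return "WB" if kind == "load" else "FETCH"
    if state == "WB":
        return "FETCH"
    if state == "HALTED":
        return "HALTED"  # terminal state
    raise ValueError(f"unknown state {state!r}")


def phases(kind):
    """States visited by one instruction, starting at FETCH."""
    visited, state = [], "FETCH"
    while True:
        visited.append(state)
        state = next_state(state, kind)
        if state in ("FETCH", "HALTED"):
            return visited
```

`phases("load")` returns the full five-state sequence, while `phases("branch")` stops after EXECUTE.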
Step 4: Implement Fetch-Decode-Execute Cycle
Connect datapath + control → working CPU.
- Clock: → all flip-flops (PC, IR, regfile, FSM state). All updates same edge.
- Phase sequencing: FSM outs → datapath signals. FSM advances 1 state/clock → Fetch → Decode → Exec → Mem → WB.
- Inst fetch: FETCH → PC drives instMem addr. Fetched → IR. PC += 1 instr width.
- Inst decode: DECODE → opcode field of IR → control unit → instr type. Reg addrs from IR → regfile read ports.
- Exec + beyond: per instr type:
  - ALU: Exec (ALU computes), WB (→ reg).
  - Load: Exec (addr), Mem (read), WB (→ reg).
  - Store: Exec (addr), Mem (write).
  - Branch: Exec (cmp | flags), conditional PC update.
  - Halt: FSM → terminal state, stops.
- Interrupts/exceptions (optional): save PC + jump to handler. Needs extra states + cause reg.
## Cycle Execution Summary
| Instruction Type | Phases | Cycles |
|-----------------|--------------------------------|--------|
| ALU (reg-reg) | Fetch, Decode, Execute, WB | 4 |
| Load | Fetch, Decode, Execute, Mem, WB| 5 |
| Store | Fetch, Decode, Execute, Mem | 4 |
| Branch (taken) | Fetch, Decode, Execute | 3 |
| Branch (not taken)| Fetch, Decode, Execute | 3 |
| Halt | Fetch, Decode | 2 |
Got: Fully connected CPU. FSM drives datapath through correct sequence. State transitions sync on clock edge.
If err: Hangs (no HALT) | wrong results → likely control signal err in 1 specific phase. Use Step 5 trace → isolate failing cycle. PC not incrementing → check FETCH wiring. Wrong reg written → check addr field extraction from IR.
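The per-type cycle counts in the summary table give a quick estimate of total cycles and CPI for a program. A small sketch, where the instruction mix is a made-up example and `total_cycles` / `cpi` are hypothetical helpers:

```python
# Cycles per instruction type, taken from the Cycle Execution Summary table.
CYCLES = {"alu": 4, "load": 5, "store": 4, "branch": 3, "halt": 2}


def total_cycles(counts):
    """Total clock cycles for a dict of {instruction type: executed count}."""
    return sum(CYCLES[kind] * n for kind, n in counts.items())


def cpi(counts):
    """Average cycles per instruction for the given mix."""
    return total_cycles(counts) / sum(counts.values())


# Hypothetical dynamic instruction mix, for illustration only.
mix = {"alu": 10, "load": 3, "store": 2, "branch": 4, "halt": 1}
print(total_cycles(mix), round(cpi(mix), 2))  # 77 3.85
```

Comparing this estimate with the cycle count from the actual trace is a cheap way to spot an FSM that takes an extra or missing state.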
Step 5: Simulate Small Program + Verify
Exec concrete prog, verify each cycle vs expected.
- Test prog: small enough to trace fully (5-15 instr), complex enough to exercise multiple types. Fibonacci ideal: load-imm, add, branch, halt.
- Init: regs = 0. Prog → instMem at addr 0. PC = 0. FSM = FETCH.
- Cycle trace: per cycle record:
  - FSM state + phase
  - PC + inst fetched/exec'd
  - ALU ins, op, result
  - Reg reads + writes
  - Mem reads + writes
  - Flag values
  - All control signal values
- Verify vs hand computation: independently compute expected reg+mem state after each instr (not each cycle; one instr spans multiple cycles). Compare.
- Edge cases:
  - Branch not taken (PC++)
  - Branch taken (PC = target)
  - Load → immediate use (WB completes before next decode reads?)
  - Write to R0 if hardwired (no effect)
  - HALT (clean stop)
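For the hand computation, an instruction-level reference model is often enough. The sketch below runs the Fibonacci test program from this step under several assumptions: 8-bit regs, `LOADI` and `SUBI` as hypothetical names for the immediate forms, BNZ testing a Z flag set by ADD/SUB only, and PC incremented during fetch.

```python
def run(program, num_regs=5, max_steps=1000):
    """Execute one instruction per step; return (regs, pc) at HALT."""
    regs = [0] * num_regs
    pc = 0
    zero = False  # Z flag, updated by ADD and SUBI only
    for _ in range(max_steps):
        op, *args = program[pc]
        pc += 1  # PC++ happens in fetch, before execute
        if op == "LOADI":
            rd, imm = args
            regs[rd] = imm & 0xFF
        elif op == "ADD":
            rd, ra, rb = args
            regs[rd] = (regs[ra] + regs[rb]) & 0xFF
            zero = regs[rd] == 0
        elif op == "SUBI":
            rd, ra, imm = args
            regs[rd] = (regs[ra] - imm) & 0xFF
            zero = regs[rd] == 0
        elif op == "MOV":
            rd, rs = args
            regs[rd] = regs[rs]
        elif op == "BNZ":
            (target,) = args
            if not zero:
                pc = target  # absolute target, as in the test program
        elif op == "HALT":
            return regs, pc
        else:
            raise ValueError(f"unknown op {op!r}")
    raise RuntimeError("no HALT reached")


FIB = [
    ("LOADI", 1, 1),     # 0x00: R1 = 1
    ("LOADI", 2, 1),     # 0x01: R2 = 1
    ("LOADI", 3, 6),     # 0x02: R3 = 6 (loop count)
    ("ADD", 4, 1, 2),    # 0x03: R4 = R1 + R2
    ("MOV", 1, 2),       # 0x04: R1 = R2
    ("MOV", 2, 4),       # 0x05: R2 = R4
    ("SUBI", 3, 3, 1),   # 0x06: R3 = R3 - 1
    ("BNZ", 0x03),       # 0x07: loop while R3 != 0
    ("HALT",),           # 0x08
]

regs, pc = run(FIB)
```

Under these assumptions the run ends with R1 = 13, R2 = 21, R3 = 0, R4 = 21 and PC = 0x09, which the cycle-accurate trace should reproduce.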
## Test Program: Fibonacci (first 8 terms)
| Addr | Instruction | Mnemonic | Comment |
|------|---------------|------------------|----------------------|
| 0x00 | [encoding] | LOAD R1, #1 | R1 = 1 (F1) |
| 0x01 | [encoding] | LOAD R2, #1 | R2 = 1 (F2) |
| 0x02 | [encoding] | LOAD R3, #6 | R3 = 6 (loop count) |
| 0x03 | [encoding] | ADD R4, R1, R2 | R4 = R1 + R2 |
| 0x04 | [encoding] | MOV R1, R2 | R1 = R2 |
| 0x05 | [encoding] | MOV R2, R4 | R2 = R4 |
| 0x06 | [encoding] | SUB R3, R3, #1 | R3 = R3 - 1 |
| 0x07 | [encoding] | BNZ 0x03 | Branch if R3 != 0 |
| 0x08 | [encoding] | HALT | Stop |
## Cycle-by-Cycle Trace (excerpt)
| Cycle | Phase | PC | IR | ALU Op | Result | RegWrite | Flags |
|-------|---------|-----|----------|----------|--------|----------|-------|
| 1 | FETCH | 0x00| LOAD R1,1| - | - | No | - |
| 2 | DECODE | 0x01| LOAD R1,1| - | - | No | - |
| 3 | EXECUTE | 0x01| LOAD R1,1| PASS #1 | 1 | No | - |
| 4 | WB | 0x01| LOAD R1,1| - | - | R1 <- 1 | - |
| ... | ... | ... | ... | ... | ... | ... | ... |
## Expected Final State
| Register | Value | Description |
|----------|-------|---------------------|
| R1 | [val] | Second-to-last Fib |
| R2 | [val] | Last computed Fib |
| R3 | 0 | Loop counter done |
| R4 | [val] | Same as R2 |
| PC | 0x09 | One past HALT |
Got: Trace matches expected final state. Every instr → correct reg+mem updates. Prog terminates at HALT w/ correct Fib values.
If err: Compare first divergence expected vs actual. Common: (1) ALU op select wrong for instr type → check control table. (2) Branch offset off-by-one → verify PC-relative from current | next instr. (3) WB writes wrong reg → check reg addr extraction. (4) Flags not updated → trace ALU flag logic for operands causing mismatch.
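Finding the first divergence is easy to automate once both traces are lists of per-instruction snapshots. A minimal sketch, assuming each snapshot is a dict such as `{"PC": 1, "R1": 1}`; the function name and snapshot layout are hypothetical.

```python
def first_divergence(expected, actual):
    """Return (index, field, expected_value, actual_value) or None if equal."""
    for i, (exp, act) in enumerate(zip(expected, actual)):
        for field in exp:
            if exp[field] != act.get(field):
                return i, field, exp[field], act.get(field)
    if len(expected) != len(actual):
        # One trace ended early (e.g. missing HALT or premature stop).
        shorter = min(len(expected), len(actual))
        return shorter, "length", len(expected), len(actual)
    return None


expected = [{"PC": 1, "R1": 1}, {"PC": 2, "R2": 1}]
actual = [{"PC": 1, "R1": 1}, {"PC": 2, "R2": 0}]
print(first_divergence(expected, actual))  # (1, 'R2', 1, 0)
```

The returned index points at the first instruction to debug; everything after it is likely a knock-on effect.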
Check
- ISA has load, store, add, sub, AND, OR, branch, halt min
- Each instr unique opcode + unambiguous encoding
- Datapath valid signal path for every instr RTL
- ALU supports all req ops + correct flag gen
- Regfile sufficient r/w ports for inst format
- Control unit correct signals per instr per phase
- No signal conflicts (simultaneous r/w same mem port)
- Fetch-decode-exec fully connected + clocked
- Test prog runs to completion w/ correct final state
- Cycle trace verified vs hand computation
- Branch taken + not-taken both verified
- HALT stops cleanly
Traps
- Branch offset off-by-one: Branches relative to current PC | PC+1 | inst after. Define convention in ISA + implement consistently. One of the most common CPU design bugs.
- WB/decode hazard (pipelined only): Inst I writes reg in WB while I+1 reads in decode → may get old value. Multi-cycle (one at a time) = fine. Pipelined → forwarding | stalling.
- Forget PC++ in fetch: PC not incremented in FETCH → executes same inst forever. Common, easy-to-miss wiring err.
- ALU flags latching: Update only on ALU instr, not loads/stores/branches. Unconditional → load between cmp + branch corrupts comparison.
- Unsigned vs signed: Decide at ISA time → 2's comp signed | unsigned. C flag = unsigned overflow; V flag = signed overflow.
- Mem alignment: Data + inst widths differ | multi-byte instr → align rules. 16-bit instr in byte-addressable → 2 addrs; PC += 2 not 1.
- Overcomplicate first design: Start simplest (8-bit, 4 regs, 8 instr, single | multi-cycle, no pipeline). Working simple > broken complex.
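The branch offset trap is easiest to see with numbers. The sketch below assumes a word-addressed machine (`inst_size=1`) and two hypothetical convention names, "current" (offset from the branch's own addr) and "next" (offset from the inst after it).

```python
def branch_base(branch_addr, relative_to, inst_size):
    """Address the offset is added to under the chosen convention."""
    if relative_to == "current":
        return branch_addr
    if relative_to == "next":
        return branch_addr + inst_size  # PC already incremented in fetch
    raise ValueError(f"unknown convention {relative_to!r}")


def encode_offset(branch_addr, target, relative_to="next", inst_size=1):
    """Offset the assembler must place in the inst word."""
    return target - branch_base(branch_addr, relative_to, inst_size)


def branch_target(branch_addr, offset, relative_to="next", inst_size=1):
    """Target the hardware computes from the encoded offset."""
    return branch_base(branch_addr, relative_to, inst_size) + offset


# Branch at 0x07 back to 0x03:
off_current = encode_offset(0x07, 0x03, "current")  # -4
off_next = encode_offset(0x07, 0x03, "next")        # -5

# Assembler uses "current", hardware uses "next" -> lands one inst too far.
wrong = branch_target(0x07, off_current, "next")    # 0x04, not 0x03
```

With byte addressing and 16-bit instr, passing `inst_size=2` shows the same mismatch scaled to two addrs.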
→
- design-logic-circuit: ALU, muxes, decoders, combinational
- build-sequential-circuit: regfile, PC, control FSM, sequential
- evaluate-boolean-expression: simplify control logic for hardwired
- derive-theoretical-result: perf analysis (CPI, throughput, Amdahl)
