simulate-cpu-architecture
About
This skill enables developers to design and simulate a minimal CPU from scratch, including defining its instruction set and building the datapath and control unit. It guides you through implementing the complete fetch-decode-execute cycle and verifying the processor by tracing a small program clock cycle by cycle. It's a capstone exercise for composing combinational and sequential logic blocks into a functional "computer inside a computer."
Quick Install
Claude Code
- Recommended: npx skills add pjt222/agent-almanac -a claude-code
- /plugin add https://github.com/pjt222/agent-almanac
- git clone https://github.com/pjt222/agent-almanac.git ~/.claude/skills/simulate-cpu-architecture
Copy and paste the command into Claude Code to install the skill.
Skill Documentation
Simulate CPU Architecture
Design min but complete CPU: ISA → ALU+regfile → datapath → control unit → fetch-decode-exec cycle → simulate small prog → verify each cycle vs expected reg+mem.
Use When
- Learn/teach computer arch from first principles
- Design custom CPU → FPGA | educational sim
- Verify understanding of inst exec at gate + RTL level
- Build sw sim (Python, JS, walkthrough) of CPU
- Compose combinational (design-logic-circuit) + sequential (build-sequential-circuit) blocks → working system
In
- Required: Complexity target — 4/8/16-bit data; reg count (2-16)
- Required: Min ISA — load, store, add, sub, AND/OR, branch, halt
- Optional: Addressing modes beyond direct (immediate, reg-indirect, indexed)
- Optional: Extra instr (mul, shift, cmp, jump-and-link)
- Optional: Mem size, word size
- Optional: Implementation style (single-cycle, multi-cycle, pipelined); default multi-cycle
- Optional: Medium — sw sim (Py/JS), HDL (Verilog/VHDL), paper
Do
Step 1: Define ISA
Spec everything programmer needs for machine code.
- Data width: bit width of data (ALU operand) + addr. Common: 8/8 (256B), 16/16.
- Reg file: count GP + special-purpose.
  - GP: R0-R(N-1). R0 hardwired zero? (simplifies encoding)
  - Special: PC, IR, Status/Flags (Z, C, N, V).
- Inst format: fixed-width word. Bit fields:
  - Opcode: K bits → 2^K instr
  - Reg fields: src + dst. N regs → ceil(log2(N)) bits each
  - Imm/offset: constants | branch offsets. Remaining bits.
- Inst catalog: each w/ mnemonic, opcode, operand fields, RTL op, flags affected.
- Addressing:
  - Reg: operand in reg
  - Immediate: operand embedded in inst
  - Direct: addr in inst
  - Reg-indirect: addr in reg
## ISA Specification
- **Data width**: [N] bits
- **Address width**: [M] bits
- **Registers**: [count] general-purpose + [list of special-purpose]
- **Instruction width**: [W] bits
### Instruction Format
| Field | Bits | Width |
|----------|-----------|-------|
| Opcode | [W-1:X] | [n] |
| Rd | [X-1:Y] | [n] |
| Rs | [Y-1:Z] | [n] |
| Imm | [Z-1:0] | [n] |
### Instruction Catalog
| Mnemonic | Opcode | Format | RTL Operation | Flags |
|----------|--------|-----------|------------------------|-------|
| LOAD | 0000 | Rd, [addr]| Rd <- MEM[addr] | - |
| STORE | 0001 | Rs, [addr]| MEM[addr] <- Rs | - |
| ADD | 0010 | Rd, Rs | Rd <- Rd + Rs | Z,C,N |
| ... | ... | ... | ... | ... |
| HALT | 1111 | - | Stop execution | - |
Got: Complete ISA. Each instr = unique opcode, well-defined operand fields, unambiguous RTL, doc'd flag effects. Decodable w/o ambiguity.
If err: Inst word too narrow → widen | reduce regs | var-length (more complex decode) | split into sub-ops. Opcode collisions → reassign.
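The bit-field layout above can be sketched as a pair of encode/decode helpers. This is a minimal sketch, not the skill's own encoding: the 16-bit word and the field widths (4-bit opcode, two 3-bit reg fields, 6-bit signed imm) are assumptions picked for illustration, and `encode`, `decode`, and `Fields` are hypothetical names.

```python
from typing import NamedTuple

# Assumed layout (hypothetical): opcode[15:12] | rd[11:9] | rs[8:6] | imm[5:0]
OPCODE_BITS, REG_BITS, IMM_BITS = 4, 3, 6
RS_SHIFT = IMM_BITS
RD_SHIFT = RS_SHIFT + REG_BITS
OP_SHIFT = RD_SHIFT + REG_BITS


class Fields(NamedTuple):
    opcode: int
    rd: int
    rs: int
    imm: int  # already sign-extended


def encode(opcode: int, rd: int = 0, rs: int = 0, imm: int = 0) -> int:
    """Pack the fields into one instruction word; imm is stored in two's complement."""
    assert 0 <= opcode < (1 << OPCODE_BITS), "opcode does not fit"
    assert 0 <= rd < (1 << REG_BITS) and 0 <= rs < (1 << REG_BITS), "reg index does not fit"
    assert -(1 << (IMM_BITS - 1)) <= imm < (1 << (IMM_BITS - 1)), "imm does not fit"
    imm_field = imm & ((1 << IMM_BITS) - 1)
    return (opcode << OP_SHIFT) | (rd << RD_SHIFT) | (rs << RS_SHIFT) | imm_field


def decode(word: int) -> Fields:
    """Split an instruction word into fields and sign-extend the immediate."""
    opcode = (word >> OP_SHIFT) & ((1 << OPCODE_BITS) - 1)
    rd = (word >> RD_SHIFT) & ((1 << REG_BITS) - 1)
    rs = (word >> RS_SHIFT) & ((1 << REG_BITS) - 1)
    imm = word & ((1 << IMM_BITS) - 1)
    if imm & (1 << (IMM_BITS - 1)):  # top bit of the field set: negative value
        imm -= 1 << IMM_BITS
    return Fields(opcode, rd, rs, imm)


word = encode(0b0010, rd=1, rs=2, imm=-3)
print(hex(word), decode(word))
```

A round trip through `decode(encode(...))` is a cheap check that no two fields overlap; the asserts in `encode` surface the "inst word too narrow" case early.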
Step 2: Design Datapath
Build RTL hardware moving + transforming data.
- ALU (via design-logic-circuit). Two N-bit operands + op select → N-bit result + flags (Z, C, N, V).
  - Ops: ADD, SUB (2's comp), AND, OR, XOR, NOT, SHL, SHR, PASS-THROUGH (moves, loads).
  - Select width fits all ops.
- Reg file (via build-sequential-circuit). Bank w/:
  - 2 read ports (combinational, addr in)
  - 1 write port (clocked, RegWrite enable)
  - R0 hardwired zero → override writes
- PC: N-bit reg w/:
  - Increment logic (PC + 1 instr; + inst width/8 if byte-addressed → next sequential)
  - Load input → branch/jump targets
  - Mux select increment | branch (PCsrc)
- Mem interface: separate | unified inst+data.
  - Harvard: separate inst (RO) + data (RW). Simpler, simultaneous fetch + data.
  - Von Neumann: shared. Sequence fetch + data in diff cycles.
- Interconnect: muxes + buses:
  - ALU A mux: regA | PC (PC-relative)
  - ALU B mux: regB | sign-extended imm
  - RegWrite mux: ALU result | mem read (loads)
  - Mem addr mux: PC (fetch) | ALU result (load/store)
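The reg file described above (2 read ports, 1 gated write port, R0 hardwired zero) might look like this in a software sim. A minimal sketch under assumptions: the class name `RegisterFile`, the default of 8 regs, and the 8-bit width are hypothetical choices, not part of the skill.

```python
class RegisterFile:
    """Software model of a small reg file; R0 always reads as zero."""

    def __init__(self, count: int = 8, width: int = 8):
        self._regs = [0] * count
        self._mask = (1 << width) - 1

    def read(self, addr_a: int, addr_b: int):
        """Two combinational read ports: returns (RdDataA, RdDataB)."""
        return self._regs[addr_a], self._regs[addr_b]

    def write(self, addr: int, data: int, reg_write: bool) -> None:
        """Single write port, gated by RegWrite; writes to R0 are discarded."""
        if reg_write and addr != 0:
            self._regs[addr] = data & self._mask


rf = RegisterFile()
rf.write(0, 5, reg_write=True)       # ignored: R0 is hardwired zero
rf.write(3, 0x1FF, reg_write=True)   # truncated to the 8-bit data width
rf.write(4, 7, reg_write=False)      # ignored: RegWrite is low
print(rf.read(0, 3), rf.read(4, 4))
```

In a cycle-accurate model the write would be committed on the clock edge instead of immediately; this sketch only captures the port behavior.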
## Datapath Components
| Component | Width | Ports / Signals |
|----------------|--------|------------------------------------|
| ALU | [N]-bit| OpA, OpB, ALUop -> Result, Flags |
| Register File | [N]-bit| RdAddrA, RdAddrB, WrAddr, WrData, RegWrite -> RdDataA, RdDataB |
| PC | [M]-bit| PCnext, PCwrite -> PCout |
| Instruction Mem| [W]-bit| Addr -> Instruction |
| Data Memory | [N]-bit| Addr, WrData, MemRead, MemWrite -> RdData |
## Datapath Multiplexers
| Mux Name | Inputs | Select Signal | Output |
|-------------|----------------------|---------------|-------------|
| ALU_B_mux | RegDataB, Immediate | ALUsrc | ALU OpB |
| WrData_mux | ALU Result, MemData | MemToReg | Reg WrData |
| PC_mux | PC+1, BranchTarget | PCsrc | PC next |
Got: Complete datapath (component + mux table). Every ISA instr has viable path src → dst through ALU, regfile, mem.
If err: Inst can't exec on datapath (e.g. no path mem→reg for LOAD) → add missing mux/path. Walk each instr RTL, trace signal flow.
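The ALU row of the component table (OpA, OpB, ALUop -> Result, Flags) can be sketched as one function. This is an illustrative sketch: the op names, the `PASS_B` op, and the 8-bit default width are assumptions, and the carry convention for SUB (C = 1 means no borrow) is one of several valid choices, so fix whichever one the ISA uses.

```python
def alu(op: str, a: int, b: int, width: int = 8):
    """Return (result, flags) for one ALU op on two width-bit operands."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    carry = False
    overflow = False
    if op == "ADD":
        full = a + b
        carry = full > mask
        result = full & mask
        # Signed overflow: operands share a sign and the result's sign differs.
        overflow = bool(~(a ^ b) & (a ^ result) & sign)
    elif op == "SUB":
        full = a + (~b & mask) + 1   # two's-complement subtract
        carry = full > mask          # assumed convention: C = 1 means no borrow
        result = full & mask
        # Signed overflow: operand signs differ and the result's sign differs from a.
        overflow = bool((a ^ b) & (a ^ result) & sign)
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    elif op == "XOR":
        result = a ^ b
    elif op == "PASS_B":             # used for moves and immediate loads
        result = b
    else:
        raise ValueError(f"unsupported ALU op: {op}")
    flags = {"Z": result == 0, "C": carry, "N": bool(result & sign), "V": overflow}
    return result, flags


print(alu("ADD", 0x7F, 0x01))  # signed overflow case
print(alu("SUB", 5, 5))        # zero result
```

NOT, SHL, and SHR from the op list are left out to keep the sketch short; they follow the same pattern.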
Step 3: Design Control Unit
Logic orchestrating datapath per instr.
- ID control signals: every mux select, RegWrite, MemRead/Write, ALU op select.
- Single-cycle (simplest): combinational. All signals from opcode in 1 clock.
- Multi-cycle (recommended for learning): FSM (build-sequential-circuit) phases:
  - Fetch: read inst from mem at PC; → IR; PC++.
  - Decode: read regfile (IR fields); sign-extend imm.
  - Execute: ALU op | compute mem addr.
  - Mem access (load/store only): read | write data mem.
  - Write-back: result → dst reg.
- Control signal table: per instr, per phase, each signal value.
- Hardwired vs microprogrammed:
  - Hardwired: gates + flip-flops. Faster, less flexible.
  - Microprogrammed: ROM stores signals per state. Each microinst = signals + next-state. Slower, easy to modify.
## Control Signals
| Signal | Width | Function |
|-----------|-------|---------------------------------------|
| ALUop | [k] | Selects ALU operation |
| ALUsrc | 1 | 0=register, 1=immediate for ALU B |
| RegWrite | 1 | Enable register file write |
| MemRead | 1 | Enable data memory read |
| MemWrite | 1 | Enable data memory write |
| MemToReg | 1 | 0=ALU result, 1=memory data to register |
| PCsrc | 1 | 0=PC+1, 1=branch target |
| IRwrite | 1 | Enable instruction register load |
## Multi-Cycle Control FSM
| State      | Active Signals                         | Next State                        |
|------------|----------------------------------------|-----------------------------------|
| FETCH      | MemRead, IRwrite, PC <- PC+1           | DECODE                            |
| DECODE     | Read registers, sign-extend immediate  | EXECUTE (HALT: stop)              |
| EXECUTE    | ALUop=[per instruction], ALUsrc=[...]  | MEM_ACCESS, WB, or FETCH (branch) |
| MEM_ACCESS | MemRead or MemWrite                    | WB (load) or FETCH (store)        |
| WB         | RegWrite, MemToReg=[...]               | FETCH                             |
Got: Control unit (combinational | FSM) generates correct signals per instr per phase. No conflicts (e.g. MemRead+MemWrite simultaneous on same mem).
If err: Signal conflict → phases not separated. Load + store access mem in diff phases | mem supports separate r/w ports. Too many states → check shared phases, merge.
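The multi-cycle FSM can be prototyped as a next-state function before committing to gates or a microcode ROM. A minimal sketch, assuming five hypothetical instruction classes (`alu`, `load`, `store`, `branch`, `halt`) and a terminal `HALTED` state; the phase sequences it produces are meant to agree with the Cycle Execution Summary table in Step 4.

```python
# Hypothetical instruction classes; a real decoder derives the class from the opcode.
AFTER_EXECUTE = {
    "alu": "WB",
    "load": "MEM_ACCESS",
    "store": "MEM_ACCESS",
    "branch": "FETCH",
}


def next_state(state: str, kind: str) -> str:
    """Next FSM state given the current state and the instruction class."""
    if state == "FETCH":
        return "DECODE"
    if state == "DECODE":
        return "HALTED" if kind == "halt" else "EXECUTE"
    if state == "EXECUTE":
        return AFTER_EXECUTE[kind]
    if state == "MEM_ACCESS":
        return "WB" if kind == "load" else "FETCH"
    if state == "WB":
        return "FETCH"
    if state == "HALTED":
        return "HALTED"
    raise ValueError(f"unknown state: {state}")


def phases(kind: str) -> list:
    """States visited by one instruction, from FETCH until the FSM leaves it."""
    seq, state = [], "FETCH"
    while True:
        seq.append(state)
        state = next_state(state, kind)
        if state in ("FETCH", "HALTED"):
            return seq


for kind in ("alu", "load", "store", "branch", "halt"):
    print(kind, phases(kind), len(phases(kind)))
```

Because each state asserts either MemRead or MemWrite but never both, walking `phases()` for every class is also a quick way to look for signal conflicts.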
Step 4: Implement Fetch-Decode-Execute Cycle
Connect datapath + control → working CPU.
- Clock: → all flip-flops (PC, IR, regfile, FSM state). All updates same edge.
- Phase sequencing: FSM outs → datapath signals. FSM advances 1 state/clock → Fetch → Decode → Exec → Mem → WB.
- Inst fetch: FETCH → PC drives instMem addr. Fetched → IR. PC += 1 instr width.
- Inst decode: DECODE → opcode field of IR → control unit → instr type. Reg addrs from IR → regfile read ports.
- Exec + beyond: per instr type:
  - ALU: Exec (ALU computes), WB (→ reg).
  - Load: Exec (addr), Mem (read), WB (→ reg).
  - Store: Exec (addr), Mem (write).
  - Branch: Exec (cmp | flags), conditional PC update.
  - Halt: FSM → terminal state, stops.
- Interrupts/exceptions (optional): save PC + jump to handler. Needs extra states + cause reg.
## Cycle Execution Summary
| Instruction Type | Phases | Cycles |
|-----------------|--------------------------------|--------|
| ALU (reg-reg) | Fetch, Decode, Execute, WB | 4 |
| Load | Fetch, Decode, Execute, Mem, WB| 5 |
| Store | Fetch, Decode, Execute, Mem | 4 |
| Branch (taken) | Fetch, Decode, Execute | 3 |
| Branch (not taken)| Fetch, Decode, Execute | 3 |
| Halt | Fetch, Decode | 2 |
Got: Fully connected CPU. FSM drives datapath through correct sequence. State transitions sync on clock edge.
If err: Hangs (no HALT) | wrong results → likely control signal err in 1 specific phase. Use Step 5 trace → isolate failing cycle. PC not incrementing → check FETCH wiring. Wrong reg written → check addr field extraction from IR.
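"All updates same edge" is easy to get wrong in a software sim, where assignments happen one after another. One way to model it is to stage every next value first and commit them together on a tick. The sketch below assumes a hypothetical `Register` class and placeholder instruction words; it is not tied to any HDL.

```python
class Register:
    """Edge-triggered register: q is the visible output, the staged value waits for the edge."""

    def __init__(self, value: int = 0):
        self.q = value
        self._d = value
        self._enable = False

    def set_next(self, value: int) -> None:
        """Drive the D input and assert the write enable for the coming edge."""
        self._d = value
        self._enable = True

    def clock_edge(self) -> None:
        """Commit the staged value if enabled, then drop the enable."""
        if self._enable:
            self.q = self._d
        self._enable = False


def tick(registers) -> None:
    """One clock edge: every register commits its staged value."""
    for reg in registers:
        reg.clock_edge()


# FETCH phase: IR <- MEM[PC] and PC <- PC + 1 on the same edge.
imem = [0x1101, 0x1201]          # placeholder instruction words
pc, ir = Register(0), Register(0)
ir.set_next(imem[pc.q])          # uses the old PC value
pc.set_next(pc.q + 1)
tick([pc, ir])
print(pc.q, hex(ir.q))

# Same-edge semantics: two registers can swap without a temporary.
a, b = Register(1), Register(2)
a.set_next(b.q)
b.set_next(a.q)
tick([a, b])
print(a.q, b.q)
```

Staging reads from `q` before any commit is what keeps the result independent of the order in which the sim evaluates components.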
Step 5: Simulate Small Program + Verify
Exec concrete prog, verify each cycle vs expected.
- Test prog: small enough to trace fully (5-15 instr), complex enough to exercise multiple types. Fibonacci ideal: load-imm, add, branch, halt.
- Init: regs = 0. Prog → instMem at addr 0. PC = 0. FSM = FETCH.
- Cycle trace: per cycle record:
  - FSM state + phase
  - PC + inst fetched/exec'd
  - ALU ins, op, result
  - Reg reads + writes
  - Mem reads + writes
  - Flag values
  - All control signal values
- Verify vs hand computation: independently compute expected reg+mem state after each instr (not each cycle; one instr spans multiple cycles). Compare.
- Edge cases:
  - Branch not taken (PC++)
  - Branch taken (PC = target)
  - Load → immediate use (WB completes before next decode reads?)
  - Write to R0 if hardwired (no effect)
  - HALT (clean stop)
## Test Program: Fibonacci (first 8 terms)
| Addr | Instruction | Mnemonic | Comment |
|------|---------------|------------------|----------------------|
| 0x00 | [encoding] | LOAD R1, #1 | R1 = 1 (F1) |
| 0x01 | [encoding] | LOAD R2, #1 | R2 = 1 (F2) |
| 0x02 | [encoding] | LOAD R3, #6 | R3 = 6 (loop count) |
| 0x03 | [encoding] | ADD R4, R1, R2 | R4 = R1 + R2 |
| 0x04 | [encoding] | MOV R1, R2 | R1 = R2 |
| 0x05 | [encoding] | MOV R2, R4 | R2 = R4 |
| 0x06 | [encoding] | SUB R3, R3, #1 | R3 = R3 - 1 |
| 0x07 | [encoding] | BNZ 0x03 | Branch if R3 != 0 |
| 0x08 | [encoding] | HALT | Stop |
## Cycle-by-Cycle Trace (excerpt)
| Cycle | Phase | PC | IR | ALU Op | Result | RegWrite | Flags |
|-------|---------|-----|----------|----------|--------|----------|-------|
| 1 | FETCH | 0x00| LOAD R1,1| - | - | No | - |
| 2 | DECODE | 0x01| LOAD R1,1| - | - | No | - |
| 3 | EXECUTE | 0x01| LOAD R1,1| PASS #1 | 1 | No | - |
| 4 | WB | 0x01| LOAD R1,1| - | - | R1 <- 1 | - |
| ... | ... | ... | ... | ... | ... | ... | ... |
## Expected Final State
| Register | Value | Description |
|----------|-------|---------------------|
| R1 | [val] | Second-to-last Fib |
| R2 | [val] | Last computed Fib |
| R3 | 0 | Loop counter done |
| R4 | [val] | Same as R2 |
| PC | 0x09 | One past HALT |
Got: Trace matches expected final state. Every instr → correct reg+mem updates. Prog terminates at HALT w/ correct Fib values.
If err: Compare first divergence expected vs actual. Common: (1) ALU op select wrong for instr type → check control table. (2) Branch offset off-by-one → verify PC-relative from current | next instr. (3) WB writes wrong reg → check reg addr extraction. (4) Flags not updated → trace ALU flag logic for operands causing mismatch.
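An instruction-level reference model gives the "hand computation" something to be compared against. The sketch below runs the Fibonacci test program from the table above. Assumptions, all hypothetical: `LOADI` and `SUBI` stand in for `LOAD Rd, #imm` and `SUB Rd, Rs, #imm`, BNZ tests the Z flag set by the last ADD/SUB, MOV and LOADI do not touch flags, data is 8-bit, and cycle costs follow the Cycle Execution Summary with MOV and LOADI counted as 4-cycle ALU-style instr (as in the trace excerpt).

```python
# Assumed cycle cost per instruction (hypothetical mnemonics, see lead-in).
CYCLES = {"LOADI": 4, "MOV": 4, "ADD": 4, "SUBI": 4, "BNZ": 3, "HALT": 2}

PROGRAM = [
    ("LOADI", 1, 1),     # 0x00: R1 = 1
    ("LOADI", 2, 1),     # 0x01: R2 = 1
    ("LOADI", 3, 6),     # 0x02: R3 = 6 (loop count)
    ("ADD", 4, 1, 2),    # 0x03: R4 = R1 + R2
    ("MOV", 1, 2),       # 0x04: R1 = R2
    ("MOV", 2, 4),       # 0x05: R2 = R4
    ("SUBI", 3, 3, 1),   # 0x06: R3 = R3 - 1
    ("BNZ", 3),          # 0x07: branch to 0x03 if Z flag clear
    ("HALT",),           # 0x08: stop
]


def run(program, width=8, max_instructions=1000):
    """Execute until HALT; return (regs, pc, cycles, trace)."""
    mask = (1 << width) - 1
    regs = [0] * 8
    pc, zero, cycles, trace = 0, False, 0, []
    for _ in range(max_instructions):
        op, *args = program[pc]
        pc += 1                      # FETCH increments PC before execute
        cycles += CYCLES[op]
        if op == "LOADI":
            regs[args[0]] = args[1] & mask
        elif op == "MOV":
            regs[args[0]] = regs[args[1]]
        elif op == "ADD":
            regs[args[0]] = (regs[args[1]] + regs[args[2]]) & mask
            zero = regs[args[0]] == 0
        elif op == "SUBI":
            regs[args[0]] = (regs[args[1]] - args[2]) & mask
            zero = regs[args[0]] == 0
        elif op == "BNZ":
            if not zero:
                pc = args[0]
        elif op != "HALT":
            raise ValueError(f"unknown op: {op}")
        regs[0] = 0                  # R0 hardwired zero
        trace.append((op, pc, tuple(regs[1:5])))
        if op == "HALT":
            return regs, pc, cycles, trace
    raise RuntimeError("no HALT reached")


regs, pc, cycles, trace = run(PROGRAM)
print(regs[1:5], hex(pc), cycles)
```

Under these assumptions the run ends with R1 = 13, R2 = 21, R3 = 0, R4 = 21, PC = 0x09 after 34 instructions and 128 cycles. Diffing `trace` against the cycle-accurate sim after each instruction points at the first divergence.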
Check
- ISA has load, store, add, sub, AND, OR, branch, halt min
- Each instr unique opcode + unambiguous encoding
- Datapath valid signal path for every instr RTL
- ALU supports all req ops + correct flag gen
- Regfile sufficient r/w ports for inst format
- Control unit correct signals per instr per phase
- No signal conflicts (simultaneous r/w same mem port)
- Fetch-decode-exec fully connected + clocked
- Test prog runs to completion w/ correct final state
- Cycle trace verified vs hand computation
- Branch taken + not-taken both verified
- HALT stops cleanly
Traps
- Branch offset off-by-one: Offset may be relative to the branch's own addr | the next instr (PC+1). Define convention in ISA + impl consistent. Very common CPU design bug.
- WB/decode hazard in multi-cycle: Inst I writes reg in WB while I+1 reads in decode → may get old value. Multi-cycle (one at a time) = fine. Pipelined → forwarding | stalling.
- Forget PC++ in fetch: PC not incremented in FETCH → executes same inst forever. Common wiring err.
- ALU flags latching: Update only on ALU instr, not loads/stores/branches. Unconditional → load between cmp + branch corrupts comparison.
- Unsigned vs signed: Decide at ISA time → 2's comp signed | unsigned. Flag semantics differ: C = unsigned overflow, V = signed overflow.
- Mem alignment: Data + inst widths differ | multi-byte instr → align rules. 16-bit instr in byte-addressable → 2 addrs; PC += 2 not 1.
- Overcomplicate first design: Start simplest (8-bit, 4 regs, 8 instr, single | multi-cycle, no pipeline). Working simple > broken complex.
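The branch offset trap is easiest to see with numbers. A small sketch, assuming word-addressed instr (size 1) and hypothetical helper names; it computes the offset an assembler would emit and the target the hardware would compute under each convention, using the BNZ at 0x07 targeting 0x03 from the test program.

```python
def branch_offset(branch_addr: int, target: int, relative_to_next: bool, instr_size: int = 1) -> int:
    """Offset the assembler must encode for the chosen convention."""
    base = branch_addr + instr_size if relative_to_next else branch_addr
    return target - base


def branch_target(branch_addr: int, offset: int, relative_to_next: bool, instr_size: int = 1) -> int:
    """Target the hardware computes for the chosen convention."""
    base = branch_addr + instr_size if relative_to_next else branch_addr
    return base + offset


# BNZ at 0x07 jumping back to 0x03.
off_current = branch_offset(0x07, 0x03, relative_to_next=False)
off_next = branch_offset(0x07, 0x03, relative_to_next=True)
print(off_current, off_next)

# Mismatch: assembler assumed "relative to the branch", hardware adds to PC+1.
wrong = branch_target(0x07, off_current, relative_to_next=True)
print(hex(wrong))  # lands one instr past the intended target
```

Since FETCH already incremented PC by the time the branch executes, "relative to next" is the convention that falls out of the datapath for free; either works as long as assembler and hardware agree.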
Related
- design-logic-circuit: ALU, muxes, decoders, combinational
- build-sequential-circuit: regfile, PC, control FSM, sequential
- evaluate-boolean-expression: simplify control logic for hardwired
- derive-theoretical-result: perf analysis (CPI, throughput, Amdahl)
