A reader (Barry Watson) supplied this updated version of the Vespa structural Verilog implementation. The key changes, as Barry explains them, are:

1. The original D-type flip flop used blocking assignment (=) instead of non-blocking assignment (<=). This causes a problem in stage 4. Unlike the IR flip flops in the other stages, which feed into multiplexers, IR4 feeds IR5 directly, so on each clock tick IR5 is updated to the same value as IR4 instead of receiving IR4's previous contents. In the other stages, the 6 time unit propagation delay through the multiplexers means blocking assignment happens to work.

2. Decoding reg_write in stage 5 causes problems. Take ex3.asm as an example:

       ldi r2,#-1
       sub r2,r2,r2
       hlt

   This correctly writes 0 into r2, but stage 5 does the following:

       and #(gate_delay) gen_jmpl(jmpl, jmp5, IR5[16]);
       or #(gate_delay) gen_write_or(reg_write, add5, sub5, and5, or5, not5, xor5, jmpl, ld5, ldi5, ldx5);
       assign a3 = IR5[26:22];

   a3 is a continuous assignment with no delay, while reg_write is produced by gates with a gate_delay. So when hlt reaches stage 5, a3 changes (to 5'b00000 for the hlt instruction) before reg_write falls from 1'b1 to 1'b0, and r0 has 0 written to it as well.

   I fixed this by decoding reg_write and halt in stage 2 and letting them flow through the pipeline along with everything else decoded in stage 2.
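The first fix can be illustrated with a minimal behavioural sketch. It assumes a simple D flip flop module; the names dff, ir_chain, q, d and ir_in are hypothetical and not taken from Barry's code. With non-blocking assignment, a flip flop fed directly by another flip flop captures the old value at the clock edge, which is what IR5 needs from IR4.

```verilog
// Hypothetical sketch: module and port names are illustrative only.
// Behavioural D flip flop using non-blocking assignment.
module dff #(parameter N = 32) (
    output reg [N-1:0] q,
    input      [N-1:0] d,
    input              clk
);
    // q <= d samples d at the edge and updates q only after every
    // always block triggered by this edge has run, so a flip flop
    // whose d is another flip flop's q sees that q's old value.
    always @(posedge clk)
        q <= d;
endmodule

// Two flip flops chained with no multiplexer between them,
// in the way IR4 feeds IR5 in stage 4.
module ir_chain (
    output [31:0] ir5,
    output [31:0] ir4,
    input  [31:0] ir_in,
    input         clk
);
    dff #(32) ir4_ff (.q(ir4), .d(ir_in), .clk(clk));
    dff #(32) ir5_ff (.q(ir5), .d(ir4),   .clk(clk));
endmodule
```

With `q = d` in place of `q <= d`, the two always blocks triggered by the same edge race. A simulator may run the IR4 block first, which leaves IR5 equal to the new IR4, the behaviour described above.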
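The second fix can be sketched in the same style. This is an illustration under stated assumptions, not Barry's implementation: module and port names are hypothetical, the stage 2 decode signals (add2, sub2, jmpl2, hlt2 and so on) are assumed to exist, the gate_delay value is arbitrary, and the pipeline registers are written behaviourally rather than as structural flip flops. The point is that the write enable, the halt flag and the destination field (IR[26:22]) advance together on each clock edge, so stage 5 never pairs hlt's a3 of 5'b00000 with sub's reg_write of 1'b1.

```verilog
// Hypothetical sketch: names are illustrative, not from Barry's code.
// reg_write is decoded in stage 2 (same OR gate as gen_write_or in
// stage 5) and carried, with halt and the destination field, through
// pipeline registers for stages 3, 4 and 5.
module reg_write_pipe #(parameter gate_delay = 1) ( // value arbitrary
    output reg       reg_write5, // write enable seen in stage 5
    output reg       halt5,      // halt flag seen in stage 5
    output reg [4:0] a3,         // destination register seen in stage 5
    input            add2, sub2, and2, or2, not2, xor2, jmpl2,
    input            ld2, ldi2, ldx2, hlt2, // assumed stage 2 decodes
    input      [4:0] rdst2,      // IR2[26:22]
    input            clk
);
    wire      reg_write2;
    reg       reg_write3, reg_write4, halt3, halt4;
    reg [4:0] rdst3, rdst4;

    // Decoded early, so it settles well before the edge that moves it
    // into stage 3 (assuming gate_delay is shorter than the clock period)
    or #(gate_delay) gen_write_or(reg_write2, add2, sub2, and2, or2,
                                  not2, xor2, jmpl2, ld2, ldi2, ldx2);

    // Enable, halt and destination all advance on the same edge
    always @(posedge clk) begin
        reg_write3 <= reg_write2; halt3 <= hlt2;  rdst3 <= rdst2;
        reg_write4 <= reg_write3; halt4 <= halt3; rdst4 <= rdst3;
        reg_write5 <= reg_write4; halt5 <= halt4; a3    <= rdst4;
    end
endmodule
```

Because a3 and reg_write5 are both outputs of pipeline registers here, they change on the same clock edge, and there is no window in which a new a3 meets a stale write enable.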