Pipelining is a general-purpose efficiency technique
– It is not specific to processors
Pipelining is used in:
– Assembly lines
– Fast food restaurants
Pipelining gives the best of both worlds and is used in just about every modern processor.
1998 Morgan Kaufmann Publishers 6
Instruction execution review
  Executing a MIPS instruction can take up to five steps.
| Step | Name | Description |
| Instruction Fetch | IF | Read an instruction from memory. |
| Instruction Decode | ID | Read source registers and generate control signals. |
| Execute | EX | Compute an R-type result or a branch outcome. |
| Memory | MEM | Read or write the data memory. |
| Writeback | WB | Store a result in the destination register. |
However, as we saw, not all instructions need all five steps.
| Instruction | Steps required |
| beq | IF ID EX |
| R-type | IF ID EX WB |
| sw | IF ID EX MEM |
| lw | IF ID EX MEM WB |
Single-cycle datapath diagram
[Figure: single-cycle datapath with the PC, instruction memory, register file, ALU, data memory, sign-extend unit and control signals (PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg). Annotated latencies: instruction memory 2ns, register file 1ns, ALU 2ns, data memory 2ns.]
 How long does it take to execute each instruction?
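One way to answer this is to add up the latencies of the steps each instruction actually uses. The sketch below assumes the latencies annotated on the datapath diagram (2ns for each memory and for the ALU, 1ns for a register file read or write) and the step lists from the "Steps required" table; the names `STEP_LATENCY_NS` and `instruction_latency` are illustrative only.

```python
# Assumed step latencies in nanoseconds, taken from the diagram annotations.
STEP_LATENCY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Steps each instruction type needs (from the "Steps required" table).
STEPS_REQUIRED = {
    "beq": ["IF", "ID", "EX"],
    "R-type": ["IF", "ID", "EX", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
}


def instruction_latency(kind):
    """Sum the latencies of the steps this instruction type uses."""
    return sum(STEP_LATENCY_NS[step] for step in STEPS_REQUIRED[kind])


for kind in STEPS_REQUIRED:
    print(f"{kind}: {instruction_latency(kind)} ns")

# A single-cycle clock has to be long enough for the slowest instruction.
single_cycle_period = max(instruction_latency(k) for k in STEPS_REQUIRED)
print(f"single-cycle clock period: {single_cycle_period} ns")
```

Under these assumptions beq needs 5ns, R-type 6ns, sw 7ns and lw 8ns, so the slowest instruction (lw) sets an 8ns clock period for every instruction.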
Review: Instruction Fetch (IF)
 Let’s quickly review how lw is executed in the single-cycle datapath.  We’ll ignore PC incrementing and branching for now.
  In the Instruction Fetch (IF) step, we read the instruction memory.
[Figure: single-cycle datapath during the Instruction Fetch (IF) step.]
Instruction Decode (ID)
The Instruction Decode (ID) step reads the source register from the register file.
[Figure: single-cycle datapath during the Instruction Decode (ID) step.]
Execute (EX)
The third step, Execute (EX), computes the effective memory address from the source register and the instruction’s constant field.
[Figure: single-cycle datapath during the Execute (EX) step.]
Memory (MEM)
The Memory (MEM) step involves reading the data memory, from the address computed by the ALU.
[Figure: single-cycle datapath during the Memory (MEM) step.]
Writeback (WB)
Finally, in the Writeback (WB) step, the memory value is stored into the destination register.
[Figure: single-cycle datapath during the Writeback (WB) step.]
A bunch of lazy functional units
  Notice that each execution step uses a different functional unit.
In other words, the main units are idle for most of the 8ns cycle!
– The instruction RAM is used for just 2ns at the start of the cycle.
– Registers are read once in ID (1ns), and written once in WB (1ns).
– The ALU is used for 2ns near the middle of the cycle.
– Reading the data memory only takes 2ns as well.
  That’s a lot of hardware sitting around doing nothing.
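The idle time can be put into numbers. This is a minimal sketch that assumes the 8ns cycle and the per-unit busy times listed in the bullets above for a single lw; `BUSY_NS` and `utilization` are illustrative names.

```python
CYCLE_NS = 8  # single-cycle clock period quoted in the text

# Time each unit is busy while one lw executes (values from the bullets above).
BUSY_NS = {
    "instruction memory": 2,
    "register file": 1 + 1,  # one read in ID plus one write in WB
    "ALU": 2,
    "data memory": 2,
}


def utilization(busy_ns, cycle_ns=CYCLE_NS):
    """Fraction of the clock cycle a unit spends doing useful work."""
    return busy_ns / cycle_ns


for unit, busy in BUSY_NS.items():
    used = utilization(busy)
    print(f"{unit}: busy {used:.0%}, idle {1 - used:.0%}")
```

With these numbers every unit is busy for a quarter of the cycle and idle for the remaining three quarters.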
Putting those slackers to work
We shouldn’t have to wait for the entire instruction to complete before we can re-use the functional units.
For example, the instruction memory is free in the Instruction Decode step as shown below, so...
[Figure: single-cycle datapath during the Instruction Decode (ID) step, with the instruction memory marked idle.]
Decoding and fetching together
Why don’t we go ahead and fetch the next instruction while we’re decoding the first one?
[Figure: single-cycle datapath, with the instruction memory fetching the 2nd instruction while the register file decodes the 1st instruction.]
Executing, decoding and fetching
Similarly, once the first instruction enters its Execute stage, we can go ahead and decode the second instruction.
But now the instruction memory is free again, so we can fetch the third instruction!
[Figure: single-cycle datapath, with the 3rd instruction being fetched, the 2nd decoded and the 1st executed at the same time.]
Making Pipelining Work
We’ll make our pipeline 5 stages long, to handle load instructions
– Stages are: IF, ID, EX, MEM, and WB
We want to support executing 5 instructions simultaneously: one in each stage.
Break datapath into 5 stages
Each stage has its own functional units.
 Each stage can execute in 2ns
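The 2ns figure follows from the slowest stage. The sketch below assumes the unit latencies from the single-cycle timing (register file accesses need only 1ns); `STAGE_WORK_NS` and `pipeline_clock_period` are illustrative names.

```python
# Assumed work per stage in nanoseconds (register file accesses take 1ns).
STAGE_WORK_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def pipeline_clock_period(stage_work_ns):
    """All stages advance on one clock, so the slowest stage sets the period."""
    return max(stage_work_ns.values())


period = pipeline_clock_period(STAGE_WORK_NS)

# Time each stage sits idle within a clock period.
slack = {stage: period - work for stage, work in STAGE_WORK_NS.items()}

print(f"clock period: {period} ns")
print(f"idle time per stage: {slack}")
```

Under these assumptions the ID and WB stages each finish 1ns early, but they still occupy a full 2ns clock period.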
[Figure: datapath divided into five stages: IF, ID, EXE, MEM and WB. Latency annotations in the figure: 2ns, 2(1)ns, 2ns, 2ns.]
Pipelining Loads
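| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| lw $t0, 4($sp) | IF | ID | EX | MEM | WB | | | | |
| lw $t1, 8($sp) | | IF | ID | EX | MEM | WB | | | |
| lw $t2, 12($sp) | | | IF | ID | EX | MEM | WB | | |
| lw $t3, 16($sp) | | | | IF | ID | EX | MEM | WB | |
| lw $t4, 20($sp) | | | | | IF | ID | EX | MEM | WB |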
A pipeline diagram shows the execution of a series of instructions.
– The instruction sequence is shown vertically, from top to bottom.
– Clock cycles are shown horizontally, from left to right.
– Each instruction is divided into its component stages. (We show five stages for every instruction, which will make the control unit easier.)
This clearly indicates the overlapping of instructions. For example, there are three instructions active in the third cycle above.
– The “lw $t0” instruction is in its Execute stage.
– Simultaneously, the “lw $t1” is in its Instruction Decode stage.
– Also, the “lw $t2” instruction is just being fetched.
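The diagram can also be generated programmatically. This is a minimal sketch that assumes one instruction enters the pipeline per cycle and nothing stalls; `stage_in_cycle` and `active_in_cycle` are illustrative helpers, not part of any tool.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_in_cycle(instr_index, cycle):
    """Stage occupied by instruction `instr_index` (0-based) in `cycle` (1-based).

    Returns None when the instruction is not in the pipeline in that cycle.
    """
    offset = cycle - 1 - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def active_in_cycle(program, cycle):
    """Map each in-flight instruction to the stage it occupies in `cycle`."""
    active = {}
    for i, instr in enumerate(program):
        stage = stage_in_cycle(i, cycle)
        if stage is not None:
            active[instr] = stage
    return active


program = [
    "lw $t0, 4($sp)",
    "lw $t1, 8($sp)",
    "lw $t2, 12($sp)",
    "lw $t3, 16($sp)",
    "lw $t4, 20($sp)",
]

# Which instructions are active in the third cycle, and in which stage?
for instr, stage in active_in_cycle(program, 3).items():
    print(f"{instr}: {stage}")
```

For cycle 3 this reports EX for the first load, ID for the second and IF for the third, matching the three bullets above.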
Pipelining terminology
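| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| lw $t0, 4($sp) | IF | ID | EX | MEM | WB | | | | |
| lw $t1, 8($sp) | | IF | ID | EX | MEM | WB | | | |
| lw $t2, 12($sp) | | | IF | ID | EX | MEM | WB | | |
| lw $t3, 16($sp) | | | | IF | ID | EX | MEM | WB | |
| lw $t4, 20($sp) | | | | | IF | ID | EX | MEM | WB |
| Pipeline state | filling | filling | filling | filling | full | emptying | emptying | emptying | emptying |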
The pipeline depth is the number of stages—in this case, five.
In the first four cycles here, the pipeline is filling, since there are unused functional units.
In cycle 5, the pipeline is full. Five instructions are being executed simultaneously, so all hardware units are in use.
In cycles 6-9, the pipeline is emptying.
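These three phases can be derived by counting how many instructions are in flight in each cycle. The sketch below assumes a stall-free pipeline that fetches one instruction per cycle; `instructions_in_flight` and `phase` are illustrative names.

```python
DEPTH = 5  # pipeline depth (number of stages)


def instructions_in_flight(num_instructions, cycle, depth=DEPTH):
    """Count instructions in the pipeline in a 1-based cycle, assuming no stalls."""
    first = max(0, cycle - depth)        # index of the oldest instruction still in flight
    last = min(num_instructions, cycle)  # one past the newest instruction fetched so far
    return max(0, last - first)


def phase(num_instructions, cycle, depth=DEPTH):
    """Label a cycle as filling, full or emptying (None if nothing is in flight)."""
    in_flight = instructions_in_flight(num_instructions, cycle, depth)
    if in_flight == 0:
        return None
    if in_flight == depth:
        return "full"
    return "filling" if cycle < depth else "emptying"


for cycle in range(1, 10):
    print(cycle, instructions_in_flight(5, cycle), phase(5, cycle))
```

For the five loads above this labels cycles 1 to 4 as filling, cycle 5 as full and cycles 6 to 9 as emptying.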
Pipelining Performance
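| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| lw $t0, 4($sp) | IF | ID | EX | MEM | WB | | | | |
| lw $t1, 8($sp) | | IF | ID | EX | MEM | WB | | | |
| lw $t2, 12($sp) | | | IF | ID | EX | MEM | WB | | |
| lw $t3, 16($sp) | | | | IF | ID | EX | MEM | WB | |
| lw $t4, 20($sp) | | | | | IF | ID | EX | MEM | WB |
Cycles 1-4: filling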
Execution time on ideal pipeline:
– time to fill the pipeline + one cycle per instruction
– How long for N instructions?
Compared to single-cycle design, how much faster is pipelining for N=1000 ?
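The two questions can be worked through numerically. This sketch assumes the figures used earlier in the slides (a 5-stage pipeline with a 2ns clock, and an 8ns single-cycle clock) and an ideal pipeline with no stalls; the function names are illustrative.

```python
DEPTH = 5               # pipeline stages
PIPELINED_CYCLE_NS = 2  # pipelined clock period
SINGLE_CYCLE_NS = 8     # single-cycle clock period


def pipelined_time_ns(n):
    """(DEPTH - 1) cycles to fill the pipeline, then one instruction completes per cycle."""
    return (DEPTH - 1 + n) * PIPELINED_CYCLE_NS


def single_cycle_time_ns(n):
    """Every instruction takes one full single-cycle clock period."""
    return n * SINGLE_CYCLE_NS


def speedup(n):
    """How many times faster the pipelined design runs n instructions."""
    return single_cycle_time_ns(n) / pipelined_time_ns(n)


n = 1000
print(f"pipelined:    {pipelined_time_ns(n)} ns")
print(f"single-cycle: {single_cycle_time_ns(n)} ns")
print(f"speedup:      {speedup(n):.2f}")
```

For N = 1000 this gives 2008ns against 8000ns, a speedup of about 3.98. With these clock periods the speedup approaches 8/2 = 4 rather than the pipeline depth of 5, because the stages are not perfectly balanced.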
Pipeline Datapath: Resource Requirements
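| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| lw $t0, 4($sp) | IF | ID | EX | MEM | WB | | | | |
| lw $t1, 8($sp) | | IF | ID | EX | MEM | WB | | | |
| lw $t2, 12($sp) | | | IF | ID | EX | MEM | WB | | |
| lw $t3, 16($sp) | | | | IF | ID | EX | MEM | WB | |
| lw $t4, 20($sp) | | | | | IF | ID | EX | MEM | WB |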
We need to perform several operations in the same cycle.
– Increment the PC and add registers at the same time.
– Fetch one instruction while another one reads or writes data.
What does that mean for our hardware?
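One way to explore the question is to list the units each stage needs and see which units are demanded by more than one stage when the pipeline is full. Both stage-to-unit mappings below are assumptions made for illustration: `SHARED_UNITS` is a hypothetical design with one memory and one adder, and `SEPARATE_UNITS` gives each stage its own hardware.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Hypothetical design that shares units between stages.
SHARED_UNITS = {
    "IF": {"memory", "adder"},  # fetch the instruction and increment the PC
    "ID": {"register read"},
    "EX": {"adder"},            # ALU operation on the same adder
    "MEM": {"memory"},          # data access in the same memory
    "WB": {"register write"},
}

# Hypothetical design in which every stage has its own units.
SEPARATE_UNITS = {
    "IF": {"instruction memory", "PC adder"},
    "ID": {"register read"},
    "EX": {"ALU"},
    "MEM": {"data memory"},
    "WB": {"register write"},
}


def conflicts_in_full_pipeline(units_by_stage):
    """Units demanded by more than one stage when every stage is busy."""
    demand = {}
    for stage in STAGES:
        for unit in units_by_stage[stage]:
            demand.setdefault(unit, []).append(stage)
    return {unit: stages for unit, stages in demand.items() if len(stages) > 1}


print("shared:  ", conflicts_in_full_pipeline(SHARED_UNITS))
print("separate:", conflicts_in_full_pipeline(SEPARATE_UNITS))
```

Under these assumptions the shared design reports conflicts on the memory (IF and MEM) and on the adder (IF and EX), while the design with separate instruction and data memories and a dedicated PC adder reports none.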
Pipelining other instruction types
  R-type instructions only require 4 stages: IF, ID, EX, and WB
– We don’t need the MEM stage
What happens if we try to pipeline loads with R-type instructions?
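| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| add $sp, $sp, -4 | IF | ID | EX | WB | | | | | |
| sub $v0, $a0, $a1 | | IF | ID | EX | WB | | | | |
| lw $t0, 4($sp) | | | IF | ID | EX | MEM | WB | | |
| or $s0, $s1, $s2 | | | | IF | ID | EX | WB | | |
| lw $t1, 8($sp) | | | | | IF | ID | EX | MEM | WB |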
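– Load uses the Register File’s Write Port during its 5th stage (cycle 7 for lw $t0)
– R-type uses the Register File’s Write Port during its 4th stage (cycle 7 for or)
– Both instructions try to write the register file in cycle 7, so they conflict.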
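The clash can be found mechanically by computing the cycle in which each instruction reaches WB. This sketch assumes one instruction is fetched per cycle starting in cycle 1, with R-type instructions skipping MEM; `writeback_conflicts` is an illustrative helper.

```python
from collections import defaultdict

# Stage lists when R-type instructions skip MEM (the non-uniform case).
STAGES_BY_KIND = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
}

program = [
    ("add $sp, $sp, -4", "R-type"),
    ("sub $v0, $a0, $a1", "R-type"),
    ("lw $t0, 4($sp)", "lw"),
    ("or $s0, $s1, $s2", "R-type"),
    ("lw $t1, 8($sp)", "lw"),
]


def writeback_conflicts(program):
    """Return {cycle: [instructions]} for cycles with more than one instruction in WB."""
    wb_users = defaultdict(list)
    for i, (text, kind) in enumerate(program):
        fetch_cycle = i + 1
        wb_cycle = fetch_cycle + STAGES_BY_KIND[kind].index("WB")
        wb_users[wb_cycle].append(text)
    return {cycle: users for cycle, users in wb_users.items() if len(users) > 1}


print(writeback_conflicts(program))
```

For this program the only conflict reported is in cycle 7, between `lw $t0, 4($sp)` and `or $s0, $s1, $s2`.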
A solution: Insert NOP stages
Enforce uniformity
– Make all instructions take 5 cycles.
– Make them have the same stages, in the same order
• Some stages will do nothing for some instructions
| R-type | IF | ID | EX | NOP | WB |
| Clock cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| add $sp, $sp, -4 | IF | ID | EX | NOP | WB | | | | |
| sub $v0, $a0, $a1 | | IF | ID | EX | NOP | WB | | | |
| lw $t0, 4($sp) | | | IF | ID | EX | MEM | WB | | |
| or $s0, $s1, $s2 | | | | IF | ID | EX | NOP | WB | |
| lw $t1, 8($sp) | | | | | IF | ID | EX | MEM | WB |
• Stores and Branches have NOP stages, too...
| store | IF | ID | EX | MEM | NOP |
| branch | IF | ID | EX | NOP | NOP |
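The padding rule can be written down directly. This is a minimal sketch in which each instruction type keeps the five-slot sequence and replaces the stages where it has no work with NOP; `USEFUL_STAGES`, `padded_stages` and `writeback_cycle` are illustrative names.

```python
UNIFORM = ["IF", "ID", "EX", "MEM", "WB"]

# Stages in which each instruction type does real work; the rest become NOPs.
USEFUL_STAGES = {
    "R-type": {"IF", "ID", "EX", "WB"},
    "lw": {"IF", "ID", "EX", "MEM", "WB"},
    "sw": {"IF", "ID", "EX", "MEM"},
    "beq": {"IF", "ID", "EX"},
}


def padded_stages(kind):
    """Five-slot stage sequence with NOP wherever the instruction has no work."""
    return [s if s in USEFUL_STAGES[kind] else "NOP" for s in UNIFORM]


def writeback_cycle(fetch_cycle):
    """With uniform stages, the fifth slot always falls 4 cycles after the fetch."""
    return fetch_cycle + len(UNIFORM) - 1


for kind in USEFUL_STAGES:
    print(f"{kind}: {' '.join(padded_stages(kind))}")

# Instructions fetched in cycles 1..5 now reach their fifth slot in distinct cycles.
print([writeback_cycle(c) for c in range(1, 6)])
```

Because every instruction now reaches its fifth slot exactly four cycles after it is fetched, two instructions fetched in different cycles can no longer need the register file's write port in the same cycle.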
What we have so far
Pipelining attempts to maximize instruction throughput by overlapping the execution of multiple instructions.
Pipelining offers a large speedup.
– In the best case, one instruction finishes on every cycle, and the speedup is equal to the pipeline depth.
The pipeline datapath is much like the single-cycle one, but with added pipeline registers
– Each stage needs its own functional units
Next we’ll see the datapath and control, and walk through an example execution.