Below is the syntax and the encoding. The instruction below computes (t1+t2) and stores the result in register t0 and also in memory at address zero. The ‘addz’ instruction always uses address zero in memory.

c) Draw the changes on the datapath diagram.

d) Give the values of the control signals.

[Figure: single-cycle MIPS datapath with the control unit and its signals (RegDst, MemRead, MemtoReg, MemWrite, ALUSrc, RegWrite), the instruction memory, register file, sign extend unit, ALU, and data memory.]

Pipelining is a general-purpose efficiency technique

– It is not specific to processors

Pipelining is used in:
– Assembly lines
– Fast food restaurants

Pipelining gives the best of both worlds and is used in just about every modern processor.

1998 Morgan Kaufmann Publishers 6

Instruction execution review
Executing a MIPS instruction can take up to five steps.

Step	Name	Description
Instruction Fetch	IF	Read an instruction from memory.
Instruction Decode	ID	Read source registers and generate control signals.
Execute	EX	Compute an R-type result or a branch outcome.
Memory	MEM	Read or write the data memory.
Writeback	WB	Store a result in the destination register.

However, as we saw, not all instructions need all five steps.

Instruction	Steps required
beq	IF ID EX
R-type	IF ID EX WB
sw	IF ID EX MEM
lw	IF ID EX MEM WB

Single-cycle datapath diagram

[Figure: single-cycle datapath annotated with functional-unit latencies: instruction memory 2ns, register file 1ns, ALU 2ns, data memory 2ns.]

How long does it take to execute each instruction?
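One way to work this out is to add up the latencies of the units each instruction actually uses. The sketch below does that under some assumptions: the unit latencies are the ones annotated on the datapath (with a register read or write costing 1ns), the store and load rows of the "Steps required" table are taken to be sw and lw, and delays in multiplexers, control logic, and the PC adder are ignored. The names `UNIT_NS`, `STEPS`, and `latency_ns` are illustrative only.

```python
# Functional-unit latencies in nanoseconds, as annotated on the datapath slide.
# Assumption: a register read (ID) and a register write (WB) each cost 1 ns.
UNIT_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}

# Steps each instruction class needs (from the "Steps required" table).
STEPS = {
    "beq":    ["IF", "ID", "EX"],
    "R-type": ["IF", "ID", "EX", "WB"],
    "sw":     ["IF", "ID", "EX", "MEM"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
}

def latency_ns(instr):
    """Sum the latencies of the steps an instruction actually uses."""
    return sum(UNIT_NS[step] for step in STEPS[instr])

latencies = {name: latency_ns(name) for name in STEPS}

# A single-cycle clock must be long enough for the slowest instruction.
cycle_time_ns = max(latencies.values())

for name, ns in latencies.items():
    print(f"{name:7s} {ns} ns")
print("single-cycle clock period:", cycle_time_ns, "ns")
```

Under these assumptions lw is the slowest instruction at 8ns, so every instruction pays for an 8ns cycle even though beq only needs 5ns of work.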

Review: Instruction Fetch (IF)

Let’s quickly review how lw is executed in the single-cycle datapath. We’ll ignore PC incrementing and branching for now.
In the Instruction Fetch (IF) step, we read the instruction memory.

[Figure: single-cycle datapath for the Instruction Fetch (IF) step.]

Instruction Decode (ID)

The Instruction Decode (ID) step reads the source register from the register file.

[Figure: single-cycle datapath for the Instruction Decode (ID) step.]

Execute (EX)

The third step, Execute (EX), computes the effective memory address from the source register and the instruction’s constant field.

[Figure: single-cycle datapath for the Execute (EX) step.]

Memory (MEM)

The Memory (MEM) step involves reading the data memory, from the address computed by the ALU.

[Figure: single-cycle datapath for the Memory (MEM) step.]

Writeback (WB)

Finally, in the Writeback (WB) step, the memory value is stored into the destination register.

[Figure: single-cycle datapath for the Writeback (WB) step.]

A bunch of lazy functional units
Notice that each execution step uses a different functional unit.

In other words, the main units are idle for most of the 8ns cycle!

– The instruction RAM is used for just 2ns at the start of the cycle.

– Registers are read once in ID (1ns), and written once in WB (1ns).

– The ALU is used for 2ns near the middle of the cycle.

– Reading the data memory only takes 2ns as well.
That’s a lot of hardware sitting around doing nothing.
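The idle time can be put into numbers. This is a rough sketch that assumes one lw per 8ns cycle and uses only the busy times quoted in the bullets above; `BUSY_NS` and `utilization` are names made up for the illustration.

```python
CYCLE_NS = 8  # single-cycle clock period quoted on the slide

# Busy time per unit while one lw executes, in ns (figures from the bullets above).
BUSY_NS = {
    "instruction memory": 2,
    "register file": 1 + 1,  # one read in ID plus one write in WB
    "ALU": 2,
    "data memory": 2,
}

# Fraction of the cycle during which each unit is doing useful work.
utilization = {unit: busy / CYCLE_NS for unit, busy in BUSY_NS.items()}

for unit, used in utilization.items():
    print(f"{unit:18s} busy {used:.0%}, idle {1 - used:.0%}")
```

With these figures each unit is busy for a quarter of the cycle and idle for the rest.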


Putting those slackers to work

We shouldn’t have to wait for the entire instruction to complete before we can re-use the functional units.

For example, the instruction memory is free in the Instruction Decode step as shown below, so...

[Figure: datapath during the Instruction Decode (ID) step, with the instruction memory marked idle.]

Decoding and fetching together

Why don’t we go ahead and fetch the next instruction while we’re decoding the first one?

[Figure: datapath with two overlapped instructions: fetch the 2nd instruction while decoding the 1st.]

Executing, decoding and fetching

Similarly, once the first instruction enters its Execute stage, we can go ahead and decode the second instruction.

But now the instruction memory is free again, so we can fetch the third instruction!

[Figure: datapath with three overlapped instructions: fetch the 3rd, decode the 2nd, execute the 1st.]

Making Pipelining Work

We’ll make our pipeline 5 stages long, to handle load instructions.

– Stages are: IF, ID, EX, MEM, and WB

We want to support executing 5 instructions simultaneously: one in each stage.


Break datapath into 5 stages

Each stage has its own functional units.

Each stage can execute in 2ns.

[Figure: datapath divided into the stages IF, ID, EXE, MEM, and WB, with latency annotations 2ns, 2(1)ns, 2ns, and 2ns.]

Pipelining Loads

Clock cycle       1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)    IF   ID   EX   MEM  WB
lw $t1, 8($sp)         IF   ID   EX   MEM  WB
lw $t2, 12($sp)             IF   ID   EX   MEM  WB
lw $t3, 16($sp)                  IF   ID   EX   MEM  WB
lw $t4, 20($sp)                       IF   ID   EX   MEM  WB

A pipeline diagram shows the execution of a series of instructions.

– The instruction sequence is shown vertically, from top to bottom.

– Clock cycles are shown horizontally, from left to right.

– Each instruction is divided into its component stages. (We show five stages for every instruction, which will make the control unit easier.)

This clearly indicates the overlapping of instructions. For example, there are three instructions active in the third cycle above.

– The “lw $t0” instruction is in its Execute stage.

– Simultaneously, the “lw $t1” is in its Instruction Decode stage.

– Also, the “lw $t2” instruction is just being fetched.
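The diagram follows a simple rule: on an ideal pipeline, instruction i (counting from 0) enters IF in cycle i + 1 and moves one stage forward each cycle. The sketch below rebuilds the diagram from that rule, assuming no stalls or hazards; `stage_in_cycle` and `diagram` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $t0, 4($sp)",
    "lw $t1, 8($sp)",
    "lw $t2, 12($sp)",
    "lw $t3, 16($sp)",
    "lw $t4, 20($sp)",
]

def stage_in_cycle(index, cycle):
    """Stage occupied in 1-based `cycle` by the instruction at 0-based `index`, or None."""
    offset = cycle - 1 - index  # instruction `index` enters IF in cycle index + 1
    return STAGES[offset] if 0 <= offset < len(STAGES) else None

def diagram(program):
    """One text row per instruction, one 4-character column per clock cycle."""
    total_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i, text in enumerate(program):
        cells = [stage_in_cycle(i, c) or "" for c in range(1, total_cycles + 1)]
        rows.append(f"{text:16s}" + "".join(f"{cell:4s}" for cell in cells))
    return rows

for row in diagram(PROGRAM):
    print(row.rstrip())

# Which instructions are active in the third cycle, and in which stage?
active_in_cycle_3 = {}
for i, text in enumerate(PROGRAM):
    stage = stage_in_cycle(i, 3)
    if stage:
        active_in_cycle_3[text] = stage
print(active_in_cycle_3)
```

The last dictionary reproduces the three active instructions listed above for cycle 3.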


Pipelining terminology

[Figure: pipeline diagram of the five lw instructions (lw $t0, 4($sp) through lw $t4, 20($sp)) over clock cycles 1-9, with cycles 1-4 marked filling, cycle 5 marked full, and cycles 6-9 marked emptying.]

The pipeline depth is the number of stages—in this case, five.

In the first four cycles here, the pipeline is filling, since there are unused functional units.

In cycle 5, the pipeline is full. Five instructions are being executed simultaneously, so all hardware units are in use.

In cycles 6-9, the pipeline is emptying.


Pipelining Performance

[Figure: pipeline diagram of the five lw instructions (lw $t0, 4($sp) through lw $t4, 20($sp)) over clock cycles 1-9, with the filling phase marked.]

Execution time on ideal pipeline:

– time to fill the pipeline + one cycle per instruction

– How long for N instructions?

Compared to the single-cycle design, how much faster is pipelining for N=1000?
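A quick calculation helps with both questions. The sketch below assumes the figures used on these slides: a 5-stage pipeline with a 2ns clock, an 8ns single-cycle clock, and an ideal pipeline with no stalls. Filling takes depth − 1 = 4 cycles, after which one instruction completes per cycle. The function names are illustrative.

```python
DEPTH = 5            # pipeline stages
STAGE_NS = 2         # pipelined clock period
SINGLE_CYCLE_NS = 8  # single-cycle clock period

def pipelined_ns(n):
    """(DEPTH - 1) cycles to fill the pipeline, then one cycle per instruction."""
    return (DEPTH - 1 + n) * STAGE_NS

def single_cycle_ns(n):
    """Every instruction takes one full single-cycle clock period."""
    return n * SINGLE_CYCLE_NS

n = 1000
speedup = single_cycle_ns(n) / pipelined_ns(n)
print(pipelined_ns(n), "ns pipelined")
print(single_cycle_ns(n), "ns single-cycle")
print(f"speedup: {speedup:.2f}x")
```

For N = 5 this gives 9 cycles (18ns), matching the nine-cycle diagram. For N = 1000 it gives 2008ns against 8000ns, a speedup just under 4. With these numbers the speedup approaches 8ns / 2ns = 4 rather than 5 as N grows, because the five stages do not split the 8ns of work evenly.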


Pipeline Datapath: Resource Requirements

[Figure: pipeline diagram of the five lw instructions (lw $t0, 4($sp) through lw $t4, 20($sp)) over clock cycles 1-9.]

We need to perform several operations in the same cycle.

– Increment the PC and add registers at the same time.

– Fetch one instruction while another one reads or writes data.

What does that mean for our hardware?


Pipelining other instruction types
R-type instructions only require 4 stages: IF, ID, EX, and WB.

– We don’t need the MEM stage.

What happens if we try to pipeline loads with R-type instructions?

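Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   WB
sub $v0, $a0, $a1        IF   ID   EX   WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

– Load uses Register File’s Write Port during its 5th stage (cycle 7).
– R-type uses Register File’s Write Port during its 4th stage (cycle 7).

Both writes fall in cycle 7, so the lw $t0 and the or compete for the register file’s one write port.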
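The clash can be found mechanically by computing the cycle in which each instruction reaches WB. This is a minimal sketch, assuming one instruction issues per cycle and R-type instructions skip MEM entirely; `writeback_cycles` is a made-up helper name, and the instruction text is copied from the slide.

```python
from collections import defaultdict

# Stage sequences when R-type instructions skip the MEM stage.
STAGES = {
    "R-type": ["IF", "ID", "EX", "WB"],
    "lw":     ["IF", "ID", "EX", "MEM", "WB"],
}

PROGRAM = [
    ("add $sp, $sp, -4",  "R-type"),
    ("sub $v0, $a0, $a1", "R-type"),
    ("lw $t0, 4($sp)",    "lw"),
    ("or $s0, $s1, $s2",  "R-type"),
    ("lw $t1, 8($sp)",    "lw"),
]

def writeback_cycles(program):
    """Map each clock cycle to the instructions that are in WB during it."""
    users = defaultdict(list)
    for i, (text, kind) in enumerate(program):
        issue_cycle = i + 1  # one instruction enters IF per cycle
        wb_cycle = issue_cycle + STAGES[kind].index("WB")
        users[wb_cycle].append(text)
    return users

# Any cycle with more than one writer is a conflict on the single write port.
conflicts = {cycle: texts
             for cycle, texts in writeback_cycles(PROGRAM).items()
             if len(texts) > 1}
print(conflicts)
```

The only conflict reported is in cycle 7, between the first lw and the or.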


A solution: Insert NOP stages

Enforce uniformity
– Make all instructions take 5 cycles.
– Make them have the same stages, in the same order.
• Some stages will do nothing for some instructions.

R-type              IF   ID   EX   NOP  WB

Clock cycle         1    2    3    4    5    6    7    8    9
add $sp, $sp, -4    IF   ID   EX   NOP  WB
sub $v0, $a0, $a1        IF   ID   EX   NOP  WB
lw $t0, 4($sp)                IF   ID   EX   MEM  WB
or $s0, $s1, $s2                   IF   ID   EX   NOP  WB
lw $t1, 8($sp)                          IF   ID   EX   MEM  WB

• Stores and branches have NOP stages, too...

store               IF   ID   EX   MEM  NOP
branch              IF   ID   EX   NOP  NOP
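The padding rule can be sketched in a few lines: keep all five slots in a fixed order and put a NOP in every slot an instruction does not need. The `NEEDED` sets below come from the "Steps required" table (taking its store and load rows to be sw and lw); `padded` and `wb_cycle` are hypothetical names.

```python
ALL_STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# Stages each instruction class really uses.
NEEDED = {
    "R-type": {"IF", "ID", "EX", "WB"},
    "lw":     {"IF", "ID", "EX", "MEM", "WB"},
    "sw":     {"IF", "ID", "EX", "MEM"},
    "beq":    {"IF", "ID", "EX"},
}

def padded(kind):
    """Five-slot schedule: the real stage where it is needed, NOP elsewhere."""
    return [stage if stage in NEEDED[kind] else "NOP" for stage in ALL_STAGES]

def wb_cycle(issue_cycle):
    """With uniform five-stage instructions, WB is always the instruction's 5th cycle."""
    return issue_cycle + len(ALL_STAGES) - 1

for kind in NEEDED:
    print(f"{kind:7s}", " ".join(padded(kind)))

# add, sub, lw, or, lw issued in cycles 1..5
kinds = ["R-type", "R-type", "lw", "R-type", "lw"]
wb_cycles = [wb_cycle(i + 1) for i in range(len(kinds))]
print(wb_cycles)
```

Because every instruction now reaches the WB slot exactly four cycles after it is fetched, no two instructions issued in different cycles can use the write port at the same time.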

What we have so far

Pipelining attempts to maximize instruction throughput by overlapping the execution of multiple instructions.

Pipelining offers amazing speedup.
– In the best case, one instruction finishes on every cycle, and the speedup is equal to the pipeline depth.

The pipeline datapath is much like the single-cycle one, but with added pipeline registers.
– Each stage needs its own functional units.

Next we’ll see the datapath and control, and walk through an example execution.

