Assume for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency. Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 x p (where p is the number of processors) but the number of branch instructions per processor remains the same.
1) Find the total execution time for this program on 1, 2, 4, and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.
2) To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?
Please show all the steps. Thank you.
Solution
1. Total execution time = CPU clock cycles / Clock rate
CPU clock cycles = Instructions for a program × Average clock cycles per instruction
CPU clock cycles = Number of arithmetic instructions × CPI of arithmetic instructions + Number of load/store instructions × CPI of load/store instructions + Number of branch instructions × CPI of branch instructions.
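As a quick check of the formulas above, here is a minimal Python sketch that sums the instruction count times CPI for each class and divides by the clock rate. The names (clock_cycles, execution_time, counts, cpis) are made up for illustration; the numbers come from the problem statement.

```python
# Hypothetical helper names; values are taken from the problem statement.
CLOCK_RATE_HZ = 2e9  # 2 GHz

def clock_cycles(counts, cpis):
    """CPU clock cycles = sum over classes of (instruction count x CPI)."""
    return sum(counts[kind] * cpis[kind] for kind in counts)

def execution_time(counts, cpis, clock_rate_hz=CLOCK_RATE_HZ):
    """Execution time in seconds = CPU clock cycles / clock rate."""
    return clock_cycles(counts, cpis) / clock_rate_hz

counts = {"arithmetic": 2.56e9, "load_store": 1.28e9, "branch": 256e6}
cpis = {"arithmetic": 1, "load_store": 12, "branch": 5}

print(f"Single-processor time: {execution_time(counts, cpis):.2f} s")
```

Keeping the counts and CPIs in dictionaries keyed by instruction class makes it easy to change one CPI (for example, the load/store CPI in part 2) and recompute.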
1 processor
Execution time = (2.56E9 × 1 + 1.28E9 × 12 + 256 × 1000000 × 5) / (2 × 1000000000) = 9.6 s
2 processors
CPU clock cycles per processor = (2.56E9 × 1) / (0.7 × 2) + (1.28E9 × 12) / (0.7 × 2) + 256 × 1000000 × 5 = 14080000000
Execution time = 14080000000 / (2 × 1000000000) = 7.04 s
Speedup = Execution time on 1 processor / Execution time on 2 processors = 9.6 / 7.04 = 1.36
4 processors
CPU clock cycles per processor = (2.56E9 × 1) / (0.7 × 4) + (1.28E9 × 12) / (0.7 × 4) + 256 × 1000000 × 5 = 7680000000
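The same calculation can be repeated for every processor count with a short Python sketch. It assumes, as the worked steps do, that the single-processor case uses the unscaled instruction counts and that the 0.7 × p divisor applies only when p > 1. The function names are hypothetical.

```python
# Sketch under stated assumptions: p == 1 uses the original counts;
# for p > 1 the arithmetic and load/store work is divided by 0.7 * p,
# while the branch work per processor stays the same.
CLOCK_RATE_HZ = 2e9
ARITH, LOAD_STORE, BRANCH = 2.56e9, 1.28e9, 256e6
CPI_ARITH, CPI_LOAD_STORE, CPI_BRANCH = 1, 12, 5

def cycles_per_processor(p):
    """Clock cycles executed by each processor when running on p processors."""
    scale = 1.0 if p == 1 else 0.7 * p
    scaled = (ARITH * CPI_ARITH + LOAD_STORE * CPI_LOAD_STORE) / scale
    return scaled + BRANCH * CPI_BRANCH

def exec_time(p):
    """Execution time in seconds on p processors."""
    return cycles_per_processor(p) / CLOCK_RATE_HZ

def speedup(p):
    """Speedup relative to the single-processor execution time."""
    return exec_time(1) / exec_time(p)

for p in (1, 2, 4, 8):
    print(f"p={p}: time={exec_time(p):.2f} s, speedup={speedup(p):.2f}")
```

Because the branch term is not divided by the number of processors, it caps the achievable speedup as p grows.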
