On a machine M1 clocked at 100 MHz it was observed that 20% of the computation time of integer benchmarks is spent in the subroutine multiply(A, B, C), which multiplies integers A and B and returns the result in C. Furthermore, each invocation of multiply takes 800 cycles to execute. To speed up the program, it is proposed to introduce a new instruction MULT to improve the performance of the machine on integer benchmarks. Please answer the following questions if you have enough data. If there are not enough data, simply answer “not enough data.”
* (a) How many times is the multiply routine executed in the set of programs?
* (b) An implementation of the MULT instruction is proposed for a new machine M2; MULT executes the multiplication in 40 cycles (an improvement over the 800 cycles needed in M1). All other instructions, i.e. those that were not part of the multiply routine in M1, have the same CPI on M1 and M2. Because of the added complexity, however, the clock rate of M2 is only 80 MHz. How much faster (or slower) is M2 than M1?
* (c) A faster hardware implementation of the MULT instruction is designed and simulated for a proposed machine M3, also clocked at 80 MHz. A speedup of 10% over M1 is observed. Is this possible, or is there a bug in the simulator? If it is possible, how many cycles does the MULT instruction take in this new machine? If it is not possible, why is this so?
Solution
a. Not enough data. We know that multiply accounts for 20% of the execution time and that each invocation takes 800 cycles, but the total execution time (or total cycle count) of the benchmark programs is not given. If the total run time T were known, the number of invocations would be 0.2 × T × 100 MHz / 800.
b. On M1, the multiply routine accounts for 20% of the cycles. Replacing the 800-cycle routine with a 40-cycle MULT shrinks that share to 1% of the original cycle count (40/800 = 1/20 of 20%), so M2 executes only 81% as many cycles as M1. At M2's 80 MHz clock, however, the execution time is 0.81 × (100 MHz / 80 MHz) = 1.0125 times that of M1, so M2 is about 1.25% slower. Equivalently, M2 would need a clock of at least 81 MHz just to match M1.
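This can be checked numerically. A minimal sketch of the part (b) arithmetic, assuming an arbitrary 1-second baseline run on M1 (only the M2/M1 time ratio matters):

```python
# Numerical check of part (b). The 1.0 s baseline time on M1 is an assumed
# value for illustration; only the M2/M1 time ratio matters.
f_m1, f_m2 = 100e6, 80e6                    # clock rates in Hz
t_m1 = 1.0                                  # assumed total run time on M1 (s)

mult_cycles_m1 = 0.2 * t_m1 * f_m1          # cycles spent in multiply on M1
other_cycles = 0.8 * t_m1 * f_m1            # remaining cycles (same CPI on M2)
mult_cycles_m2 = mult_cycles_m1 * 40 / 800  # 800-cycle routine -> 40-cycle MULT

t_m2 = (other_cycles + mult_cycles_m2) / f_m2
print(t_m2 / t_m1)                          # 1.0125 -> M2 is ~1.25% slower
```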
c. It is not possible; there must be a bug in the simulator. Even if MULT took zero cycles on M3, eliminating all of the 20% of cycles spent in multiply, the remaining 80% of the cycles would run on a clock that is 20% slower (80 MHz versus 100 MHz), so the best M3 could achieve is the same execution time as M1 (0.8 × 100/80 = 1). Since MULT must take at least one cycle, M3 is necessarily slower than M1, and a 10% speedup over M1 cannot be genuine.
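The same style of check shows the bound for part (c); a sketch assuming the best case of a hypothetical 0-cycle MULT on M3 and the same 1-second baseline:

```python
# Numerical check of part (c): even a hypothetical 0-cycle MULT on M3 only
# ties M1, so a reported 10% speedup must be a simulator bug.
f_m1, f_m3 = 100e6, 80e6                    # clock rates in Hz
t_m1 = 1.0                                  # assumed total run time on M1 (s)

other_cycles = 0.8 * t_m1 * f_m1            # cycles outside the multiply routine
best_t_m3 = other_cycles / f_m3             # multiply assumed to cost 0 cycles
print(best_t_m3 / t_m1)                     # 1.0 -> M3 can at best match M1
```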
