Consider the parallel algorithm given in Algorithm for multi
Consider the parallel algorithm given in Algorithm for multiplying two n x n matrices A and B to obtain the product matrix C. Assume that it takes time t_iocal for a memory read write operation on a matrix element and time t_c to add and multiply two numbers. Determine the parallel run time for this algorithm on an n^2 processor CREW PRAM. Is this parallel algorithm cost-optimal? Algorithm 8.7 An algorithm for multiplying two n x n matrices A and E on a CREW PRAM, yielding matrix C = A x B. procedure MAT_MULT_CREW_PRAM (A, B, C. n) begin organize the n^2 processes into a logical mesh of n x n: for each process P_i, j do begin c[i, j]: = 0: for k: = 0 to n - 1 do C[i, j]: = c[i, j] + A[i, K]|: endfor; end MAT MULT CREW PRAM
Solution
With n2 parallel processors, the run time would be O(n)
Complexity issues:
Time complexity = O(log n)
Total number of steps = n * log n = O(n log n)
Local phase: compute maximum over log n values : Simple sequential algorithm
Time for local phase = O(log n)
Global phase: take (n/log n) local maximums and compute global maximum using the tournament algorithm
Time for global phase = O(log (n/log n)) = O(log n)
Example:
n = 16
Number of processors, p = n/log n = 4
Divide 16 elements into four groups of four each
Local phase: each processor computes the maximum of its four local elements
Global phase: performed amongst the maximums computed by the four processors
