A MapReduce program, referred to as a job, consists of code for mappers and reducers. Write pseudocode for the wordcount MapReduce program. What does the shuffle and sort procedure do when a MapReduce job is run? List several methods/ideas that can reduce the number of intermediate key-value pairs for a MapReduce job.

Solution

WordCount(InputTextFile)

   Map {
       text = read the text from InputTextFile
       w = split text into words delimited by whitespace
       for each word in w
           write(word, 1)
       end for
   }

   k = set of all distinct words written by the mappers

   for each key in k
       Reduce {
           v = list of all values written for key
           sum = 0
           for each value in v
               sum = sum + value
           end for
           write(key, sum)
       }
   end for

End
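
For readers who want to run the idea, here is a minimal single-process Python sketch of the wordcount pseudocode. It is not Hadoop code: the names map_words, reduce_counts and word_count are hypothetical, words are assumed to be separated by whitespace, and the grouping of values by key is simulated in memory with a dictionary.

```python
from collections import defaultdict


def map_words(text):
    """Map step: emit a (word, 1) pair for every whitespace-delimited word."""
    for word in text.split():
        yield (word, 1)


def reduce_counts(key, values):
    """Reduce step: sum all the counts emitted for one key."""
    return (key, sum(values))


def word_count(documents):
    """Run map over every document, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for text in documents:
        for word, one in map_words(text):
            groups[word].append(one)
    return dict(reduce_counts(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat"]))
```

Each string in the input list stands in for the input split that one mapper would read.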

The shuffle step transfers the intermediate <key,value> pairs produced by the mappers to the reducers, so that all pairs with the same key arrive at the same reducer. The sort step orders each reducer's input by key, which places all values for a key next to each other. A reduce task receives a list of key-value pairs, but the reduce() method it calls takes a key and a list of values, so the task must group values by key. Because the input is sorted, it can do this in a single pass: it starts a new reduce() call whenever the next key in the sorted input differs from the previous one. This avoids buffering or searching the whole input to collect the values for a key.
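
The shuffle, sort and group-by-key steps can be sketched in plain Python. This is only an in-memory illustration under stated assumptions, not how a real framework moves data: partition, shuffle_and_sort and grouped are hypothetical names, and the toy partition function (sum of the key's bytes modulo the number of reducers) is chosen only so the example is repeatable.

```python
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    """Pick a reducer for a key (toy hash, assumed for illustration only)."""
    return sum(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(mapper_outputs, num_reducers):
    """Route every (key, value) pair to a reducer, then sort each reducer's input by key."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    for pairs in reducer_inputs:
        pairs.sort(key=itemgetter(0))
    return reducer_inputs


def grouped(sorted_pairs):
    """Turn sorted (key, value) pairs into (key, [values]) in one linear pass.

    A new group starts exactly when the next key differs from the previous one.
    """
    for key, run in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, [value for _, value in run]


if __name__ == "__main__":
    outputs = [[("b", 1), ("a", 1)], [("a", 1), ("c", 1)]]
    for reducer_id, pairs in enumerate(shuffle_and_sort(outputs, 2)):
        print(reducer_id, list(grouped(pairs)))
```

Note that groupby only merges adjacent equal keys, which is why the sort has to come first.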

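One method to reduce the number of intermediate key-value pairs is to keep a hash map inside each mapper, with one entry per word: the word is the key, and the value is incremented each time the word is seen again. When the mapper finishes, the entries of the hash map become its intermediate key-value pairs, so each mapper emits one key-value pair per distinct word rather than one per occurrence. A related method is to supply a combiner, which applies the reduce logic to each mapper's output locally before the shuffle.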
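
The effect of the hash map approach can be seen in a small Python comparison. This is a sketch with hypothetical function names (map_naive, map_with_local_hashmap), assuming whitespace-separated words; it only counts how many pairs a single mapper would emit.

```python
from collections import Counter


def map_naive(text):
    """Emit one (word, 1) pair per word occurrence."""
    return [(word, 1) for word in text.split()]


def map_with_local_hashmap(text):
    """Count words in a local hash map, then emit one pair per distinct word."""
    counts = Counter(text.split())
    return list(counts.items())


if __name__ == "__main__":
    text = "to be or not to be"
    print(len(map_naive(text)), "pairs without the hash map")
    print(len(map_with_local_hashmap(text)), "pairs with the hash map")
```

For the input "to be or not to be", the naive mapper emits 6 pairs and the hash map version emits 4, one per distinct word. The saving grows with how often words repeat within a mapper's input.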

Tutor: Dr Jack