A MapReduce program, referred to as a job, consists of code fo
Solution
WordCount(InputTextFile)
Map {
  text = Read the text from InputTextFile
  w = split text into words delimited by ,
  for each word in w
    write(word, 1)
  end for
}
k = set of all distinct words (keys) emitted by the mappers
for each key in k
  Reduce {
    v = list of all values emitted for key
    sum = 0
    for each value in v
      sum = sum + value
    end for
    write(key, sum)
  }
end for
End
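The pseudocode above can be simulated in a single process to check the logic. This is a minimal Python sketch, not Hadoop code. It assumes the input is an in-memory string instead of InputTextFile, and that whitespace around each comma-delimited word should be stripped. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical.

```python
from collections import defaultdict


def map_phase(text):
    """Emit (word, 1) for every comma-delimited word, as in the Map block."""
    pairs = []
    for word in text.split(","):
        word = word.strip()  # assumption: surrounding whitespace is not part of a word
        if word:  # skip empty tokens, e.g. from a trailing comma
            pairs.append((word, 1))
    return pairs


def shuffle(pairs):
    """Group intermediate values by key (the framework does this in a real job)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the list of values for each key, as in the Reduce block."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(text):
    return reduce_phase(shuffle(map_phase(text)))


if __name__ == "__main__":
    print(word_count("apple,banana,apple,cherry,banana,apple"))
    # prints {'apple': 3, 'banana': 2, 'cherry': 1}
```

In a real cluster the map, shuffle and reduce steps run on different machines; here they are plain function calls so the data flow is easy to follow.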
The shuffling process transfers the intermediate <key,value> pairs produced by the mappers to the reducers. The sorting process orders these pairs by key, so all values for the same key arrive next to each other. The reducer then starts a new call to reduce() whenever the next key in the sorted input differs from the previous one. A reduce task receives a stream of key-value pairs, but the reduce() method takes a key together with the list of its values, key-list(value), so the pairs have to be grouped by key first. Because the input is already sorted, this grouping needs only one sequential pass over the data, comparing each key with the previous one, instead of a lookup table holding every key in memory.
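A minimal single-process sketch of the sort-then-group step, assuming the intermediate pairs fit in memory. Python's `sorted` stands in for the framework's sort phase, and `itertools.groupby` starts a new group exactly when the key changes from the previous one, which mirrors how the reducer detects key boundaries. The name `reduce_sorted` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def reduce_sorted(pairs):
    """Sort (key, value) pairs by key, then call the reduce step once per key.

    After sorting, equal keys are adjacent, so groupby only has to compare
    each key with the previous one to know where a group ends.
    """
    results = []
    sorted_pairs = sorted(pairs, key=itemgetter(0))  # stands in for the sort phase
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        values = [value for _, value in group]  # key -> list(value)
        results.append((key, sum(values)))       # one reduce() call for this key
    return results


if __name__ == "__main__":
    mapped = [("cat", 1), ("ant", 1), ("cat", 1), ("bee", 1), ("ant", 1)]
    print(reduce_sorted(mapped))
    # prints [('ant', 2), ('bee', 1), ('cat', 2)]
```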
Another way to reduce the number of intermediate key-value pairs (often called in-mapper combining) is for each mapper to build a hashmap with one entry per word: the word is the key, and the entry's count is incremented every time that word is seen again. When the mapper finishes, the hashmap entries become its intermediate key-value pairs. Each mapper therefore emits one key-value pair per distinct word instead of one pair per occurrence.
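The hashmap approach can be sketched as a map function that aggregates locally before emitting. This is an illustration under the same assumptions as before (comma-delimited in-memory text, whitespace stripped); `map_with_combining` is a hypothetical name, and a plain `dict` plays the role of the hashmap.

```python
def map_with_combining(text):
    """Map step that counts words locally and emits one pair per distinct word."""
    counts = {}
    for word in text.split(","):
        word = word.strip()
        if not word:
            continue
        counts[word] = counts.get(word, 0) + 1  # increment when the key is matched
    return list(counts.items())  # the mapper's intermediate key-value pairs


if __name__ == "__main__":
    print(map_with_combining("apple,banana,apple,apple"))
    # prints [('apple', 3), ('banana', 1)]: 2 pairs instead of 4
```

The reduce step does not change: it still sums the values for each key, which now are partial counts from different mappers rather than 1s. The trade-off is that the hashmap must fit in the mapper's memory, which depends on how many distinct words a single mapper sees.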
