How to sort intermediate output based on values in MapReduce?


Updated 27-Jul-18 2:42am

1 solution

"By default, MapReduce sorts the intermediate data (between the map and reduce phases) by key only. If we want the data sorted by value, we need a secondary sort. There are two ways to do this.

1. The reducer receives all the values for a particular key, buffers them, and sorts them itself (an in-reducer sort). This is not a good approach in general: because the reducer must hold every value for the key in memory at once, it can run out of memory when a key has many values. It works well for smaller data.

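2. Create a composite key made of two parts, the natural key and the natural value. The natural key is used for partitioning and grouping, and the value is used for sorting. This is the better approach because the framework sorts the composite keys during the shuffle, so the reducer never has to buffer all the values for a key and does not risk running out of memory. We write a custom partitioner that looks only at the natural key, so that all records with the same natural key go to the same reducer; the composite key's sort order compares the natural key first and then the value; and a grouping comparator that compares only the natural key makes the data arriving at the reducer grouped by the natural key.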
"
 
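Both approaches can be illustrated with a small single-process Java simulation. It is a sketch under stated assumptions, not Hadoop code: `SecondarySortDemo`, `CompositeKey`, `inReducerSort` and `compositeKeySort` are hypothetical names, and the shuffle (partition, sort, group) is imitated with ordinary collections. In a real Hadoop job the partitioner, sort comparator and grouping comparator of approach 2 would be separate classes registered through `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Single-process simulation of the MapReduce shuffle (hypothetical, not Hadoop API code).
class SecondarySortDemo {

    // Hypothetical composite key: natural key + natural value.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Approach 1: copy every value for one key into memory, then sort.
    // Memory use grows with the number of values per key.
    static List<Integer> inReducerSort(Iterable<Integer> values) {
        List<Integer> buffer = new ArrayList<>();
        for (int v : values) {
            buffer.add(v);
        }
        buffer.sort(Comparator.naturalOrder());
        return buffer;
    }

    // Partitioner: hashes the natural key only, so every record for a key
    // lands on the same reducer (same formula as Hadoop's HashPartitioner).
    static int partition(CompositeKey key, int numReducers) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Sort order: natural key first, then value.
    static final Comparator<CompositeKey> SORT =
            Comparator.comparing((CompositeKey k) -> k.naturalKey)
                      .thenComparingInt(k -> k.value);

    // Grouping comparator: natural key only, so keys that differ only in
    // value fall into the same reduce group.
    static final Comparator<CompositeKey> GROUP =
            Comparator.comparing((CompositeKey k) -> k.naturalKey);

    // Approach 2: partition -> sort -> group, as seen by one reducer.
    // Values are collected into lists here only so the result can be inspected;
    // a real reducer would consume them one at a time, already in order.
    static Map<String, List<Integer>> compositeKeySort(
            List<CompositeKey> mapOutput, int numReducers, int reducerId) {
        List<CompositeKey> mine = new ArrayList<>();
        for (CompositeKey k : mapOutput) {
            if (partition(k, numReducers) == reducerId) {
                mine.add(k);
            }
        }
        mine.sort(SORT); // done by the framework's shuffle in a real job

        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey previous = null;
        for (CompositeKey k : mine) {
            if (previous == null || GROUP.compare(previous, k) != 0) {
                groups.put(k.naturalKey, new ArrayList<>()); // start of a new reduce group
            }
            groups.get(k.naturalKey).add(k.value);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(inReducerSort(List.of(5, 1, 3))); // [1, 3, 5]

        List<CompositeKey> mapOutput = List.of(
                new CompositeKey("a", 5), new CompositeKey("b", 2),
                new CompositeKey("a", 1), new CompositeKey("b", 9),
                new CompositeKey("a", 3));
        // With one reducer, every record goes to partition 0.
        System.out.println(compositeKeySort(mapOutput, 1, 0)); // {a=[1, 3, 5], b=[2, 9]}
    }
}
```

The two methods produce the same ordering; the difference is where the sorting happens. `inReducerSort` needs all of a key's values in a list at once, while in approach 2 the values reach each group already ordered by the shuffle, so a real reducer can stream through them without buffering.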

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


