Abstract:
MapReduce is widely applied in high-performance
computing for large-scale data processing. However, as
clusters grow, handling the huge amounts of
intermediate data passed through the shuffle and reduce
phases (the middle step of MapReduce) heavily impacts
performance. With local aggregation
(either combiners or in-mapper combining), the amount
of data shuffled can be reduced, which alleviates the
reduce straggler problem. The proposed modified B+
tree-based indexing algorithm is applied to reduce the
amount of intermediate data, enabling fast output
retrieval as well as scalable data storage.
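
To make the effect of local aggregation concrete, the following is a minimal sketch of in-mapper combining on a word-count job, written in plain Python rather than against any MapReduce framework. The function names (map_plain, map_with_in_mapper_combining, reduce_counts) and the sample document are hypothetical and chosen only for illustration; the sketch does not represent the proposed B+ tree-based indexing algorithm.

```python
from collections import Counter


def map_plain(document):
    """Emit one (word, 1) pair per token, with no local aggregation."""
    return [(word, 1) for word in document.split()]


def map_with_in_mapper_combining(document):
    """Aggregate counts inside the mapper and emit one pair per distinct word."""
    counts = Counter()
    for word in document.split():
        counts[word] += 1
    # Only the aggregated pairs leave the mapper and enter the shuffle.
    return sorted(counts.items())


def reduce_counts(pairs):
    """Sum the values for each key, as a reducer would after the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical input split handled by a single mapper.
doc = "a b a c a b"
plain = map_plain(doc)
combined = map_with_in_mapper_combining(doc)

print(len(plain), "pairs shuffled without local aggregation")
print(len(combined), "pairs shuffled with in-mapper combining")
print(reduce_counts(plain) == reduce_counts(combined))
```

Both mappers lead to the same reduced result, but the combining mapper sends one pair per distinct word instead of one pair per token, which is the reduction in shuffle volume that the abstract refers to.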