Posts

Showing posts from December, 2018

MapReduce

What is MapReduce?

MapReduce is a programming model for processing and generating large data sets. The term "MapReduce" refers to two separate and distinct tasks that a program performs. The first is the "map" job, which takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key/value pairs). The second is the "reduce" job, which takes the output of the map as its input and combines those tuples into a smaller set of tuples.

Example

You have a file with two columns: a city and a temperature recorded in that city on one of several measurement days.

Toronto, 18
Whitby, 27
New York, 32
Rome, 37
Toronto, 32
Whitby, 20
New York, 33
Rome, 38
Toronto, 22
Whitby, 19
New York, 20
Rome, 31
Toronto, 31
Whitby, 22
New York, 19
Rome, 30

The task is to produce an output that finds the maximum...
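The map and reduce steps described above can be sketched in a few lines of plain Python. This is only an illustration that runs in a single process, not a distributed implementation. Because the task description is cut off, the sketch assumes the goal is the maximum temperature per city. The names `map_phase`, `shuffle`, and `reduce_phase` are hypothetical and chosen for this example.

```python
from collections import defaultdict

# The sample file from the example, one "city, temperature" record per line.
RECORDS = """\
Toronto, 18
Whitby, 27
New York, 32
Rome, 37
Toronto, 32
Whitby, 20
New York, 33
Rome, 38
Toronto, 22
Whitby, 19
New York, 20
Rome, 31
Toronto, 31
Whitby, 22
New York, 19
Rome, 30
""".splitlines()


def map_phase(lines):
    """Map: turn each raw line into a (key, value) tuple."""
    for line in lines:
        city, temperature = line.rsplit(",", 1)
        yield city.strip(), int(temperature)


def shuffle(pairs):
    """Group all values that share a key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer=max):
    """Reduce: combine each key's values into one value.

    Assumption: the reducer is max, i.e. the maximum temperature per city.
    """
    return {key: reducer(values) for key, values in groups.items()}


result = reduce_phase(shuffle(map_phase(RECORDS)))
print(result)
# {'Toronto': 32, 'Whitby': 27, 'New York': 33, 'Rome': 38}
```

The grouping step sits between map and reduce so that each reducer call sees every value for one key. Swapping `max` for another function such as `min` changes the aggregate without touching the map step.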