Application Case 1: Find the total salary of each department

Problem analysis: MapReduce joins fall into three categories: Reduce Side Join, Map Side Join, and Semi Join.

  • Reduce Side Join: all join records are transferred during the Shuffle phase, so the large volume of network I/O makes it inefficient.
  • Map Side Join: useful when a large table is joined with one or more small tables.

Map Side Join application scenario: two tables are to be joined; one is very large and the other is small enough to fit entirely in memory. The small table is loaded into an in-memory hash table. For each key/value record of the large table, the hash table is probed for records with the same key; matching records are joined and output.
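The probing step described above can be sketched in plain Java, independent of Hadoop. The class name, field layout (`empName,deptId,salary`), and sample data below are illustrative assumptions, not part of the original code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {
    // Probe the in-memory hash table of the small table for each large-table record.
    // Records are assumed to be CSV lines of the form "empName,deptId,salary".
    public static List<String> join(Map<String, String> dept, List<String> emp) {
        List<String> out = new ArrayList<>();
        for (String record : emp) {
            String[] f = record.split(",");
            String deptName = dept.get(f[1]);   // hash lookup on the join key
            if (deptName != null) {             // join succeeds only when the key is present
                out.add(f[0] + "\t" + deptName + "\t" + f[2]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small table (department id -> department name), held entirely in memory
        Map<String, String> dept = new HashMap<>();
        dept.put("d1", "Sales");
        dept.put("d2", "Research");
        // Large table, streamed record by record
        List<String> emp = Arrays.asList("Alice,d1,3000", "Bob,d2,2500", "Dave,d9,1000");
        join(dept, emp).forEach(System.out::println);  // "Dave" is dropped: d9 has no match
    }
}
```

In a real job the hash table would be built once per task from the cached small-table file, not hard-coded.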

To support this file replication, Hadoop provides the DistributedCache class, which is used as follows:

  • Users call the static method DistributedCache.addCacheFile() to specify which file to replicate; its parameter is the file's URI (for a file on HDFS, e.g. hdfs://jobtracker:50030/home/XXX/file). The JobTracker retrieves this list of URIs before the job starts and copies the corresponding files to each TaskTracker's local disk.
  • Users call DistributedCache.getLocalCacheFiles() to obtain the local file paths, then read the files with the standard file I/O API.
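The two calls above can be sketched as follows. This is a non-runnable fragment using the classic `org.apache.hadoop.filecache.DistributedCache` API (deprecated in newer Hadoop releases in favor of `Job.addCacheFile`); the HDFS path is the illustrative one from the text, not a real location:

```java
// Driver side: register the small file before submitting the job
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("hdfs://jobtracker:50030/home/XXX/file"), conf);

// Mapper side (typically in setup()): locate the local copy and read it normally
Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
```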

In the following code, the smaller table (the department table Dept) is cached in memory. In the Mapper phase, each employee's department number is mapped to the department name, which is emitted as the key. In the Reduce phase, the total salary of each department is computed by summing the salaries grouped under that key.
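The map and reduce steps just described can be simulated in plain Java (no Hadoop dependencies), which makes the data flow explicit. The class name, record format, and sample data are assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeptSalarySketch {
    // Map phase: translate deptId -> deptName via the cached small table,
    // emitting (deptName, salary) pairs. Records are "empName,deptId,salary".
    public static List<Map.Entry<String, Integer>> map(Map<String, String> deptCache,
                                                       List<String> empRecords) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String rec : empRecords) {
            String[] f = rec.split(",");
            String deptName = deptCache.get(f[1]);
            if (deptName != null) {
                pairs.add(Map.entry(deptName, Integer.parseInt(f[2])));
            }
        }
        return pairs;
    }

    // Reduce phase: sum salaries grouped by department name (the key).
    public static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            totals.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, String> dept = Map.of("d1", "Sales", "d2", "Research");
        List<String> emp = List.of("Alice,d1,3000", "Bob,d2,2500", "Carol,d1,4000");
        System.out.println(reduce(map(dept, emp)));  // prints {Research=2500, Sales=7000}
    }
}
```

In the real job, the grouping done here by `reduce` is performed by Hadoop's shuffle, and each reducer simply sums the values it receives for one key.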

Processing flow chart:

The test code: