This is day 19 of my participation in the August Writing Challenge.

Getting to know MapReduce

What is MapReduce?

Hadoop MapReduce, an open-source clone of Google's MapReduce, is derived from a Google paper. It borrows the divide-and-conquer idea and splits data processing into two main phases: Map and Reduce. This way, even users who do not understand the internal workings of a distributed computing framework can describe the problem they want to solve clearly in terms of Map and Reduce. By writing a map function and a reduce function, you can distribute a computation and run it on Hadoop with little effort.
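To make the two phases concrete, here is a toy, single-process sketch in plain Java. It is not Hadoop code and not the author's code; `MiniMapReduce`, `map`, `reduce` and `run` are made-up names. `map` turns each line into (word, 1) pairs, a `TreeMap` groups the pairs by key the way the shuffle does, and `reduce` sums each group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of Map -> shuffle -> Reduce on an in-memory list of lines.
// It only mimics the data flow; no Hadoop classes are involved.
class MiniMapReduce {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: combine all values that share one key (the key is unused here,
    // but a real reduce function receives it).
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group by key (the "shuffle"), then reduce each group.
    // TreeMap keeps keys sorted, similar to Hadoop sorting map output by key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For example, `MiniMapReduce.run(List.of("a b a", "b c"))` yields `{a=2, b=2, c=1}`.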

MapReduce features!

Simple to develop

Thanks to the MapReduce programming model, users do not need to deal with inter-process communication, socket programming, or other sophisticated techniques. They only need to implement some simple logic; the MapReduce framework does the rest, which greatly reduces the difficulty of writing distributed programs.

Strong scalability

As with HDFS, when cluster resources cannot meet the computing demand, you can scale the cluster out linearly by adding nodes.

Fault tolerance

If a task fails because a node goes down, the MapReduce framework automatically reschedules it on a healthy node, up to a configurable number of attempts, so the job can still complete. All of this is transparent to users.
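The rescheduling idea can be modelled in a few lines. This is a toy sketch, not Hadoop's actual scheduler: `RetryScheduler`, `runWithRetries` and the `nodeIsHealthy` predicate are hypothetical names, and each entry in the node list simply stands for one attempt of the same task on a different node.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model of task re-execution (hypothetical, not Hadoop's scheduler):
// the task is tried on one node after another until an attempt succeeds
// or maxAttempts attempts have been used up.
class RetryScheduler {

    // Returns the node that completed the task, or null if every attempt failed.
    // nodeIsHealthy stands in for "the task succeeded on this node".
    static String runWithRetries(List<String> nodes, Predicate<String> nodeIsHealthy, int maxAttempts) {
        int attempts = Math.min(maxAttempts, nodes.size());
        for (int i = 0; i < attempts; i++) {
            String node = nodes.get(i);
            if (nodeIsHealthy.test(node)) {
                return node; // task completed on this node
            }
            // attempt failed: loop around and reschedule on the next node
        }
        return null; // out of attempts: the job would be marked as failed
    }
}
```

With nodes `node1`, `node2`, `node3` where only `node1` is down, the task ends up on `node2`; with `maxAttempts` set to 1, the same scenario gives up after the first failure.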

Let's look at a diagram!!

MapReduce project deployment: sorting user traffic

What is a user behavior log?

A user behavior log (also called a user behavior trail or traffic log) records all the behavior data (visits, browsing, searches, clicks, etc.) generated each time a user visits a site.

Why analyze user behavior?

The eyes of the website

Where do visitors come from? What are they looking for? Which pages are most popular? Through which page do they enter the site?

The nerves of the website

How are the pages structured? How should links be designed to make the site easier to use? How does a change in the directory design affect the user experience?

The brain of the website

Set objectives, for example an appropriate advertising budget based on a product's share of sales in a particular city.

Without further ado, here's the code!!

**Preparation**

1. A pseudo-distributed virtual machine with Hadoop and the JDK fully configured

Process

1. Use MapReduce to clean the data and upload it to HDFS

The directory structure

FlowBean class
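The original FlowBean code is not reproduced here, so the following is only a sketch under assumptions: the fields `upFlow`, `downFlow` and `sumFlow` are hypothetical, and the total is assumed to be `upFlow + downFlow`. A real bean would implement Hadoop's `org.apache.hadoop.io.WritableComparable<FlowBean>`; to keep the example compilable without Hadoop on the classpath, the same `write`/`readFields`/`compareTo` methods are written against plain `java.io.DataOutput`/`DataInput`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of a FlowBean with Writable-style serialization (hypothetical fields).
class FlowBean implements Comparable<FlowBean> {
    private long upFlow;   // assumed: upstream traffic
    private long downFlow; // assumed: downstream traffic
    private long sumFlow;  // assumed: total traffic = upFlow + downFlow

    // Hadoop needs a no-arg constructor to instantiate the bean before readFields().
    FlowBean() {}

    FlowBean(long upFlow, long downFlow) {
        set(upFlow, downFlow);
    }

    void set(long upFlow, long downFlow) {
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    long getUpFlow() { return upFlow; }
    long getDownFlow() { return downFlow; }
    long getSumFlow() { return sumFlow; }

    // Serialize the fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialize in exactly the same order as write().
    void readFields(DataInput in) throws IOException {
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    // Descending by total traffic, so the heaviest users sort first.
    @Override
    public int compareTo(FlowBean other) {
        return Long.compare(other.sumFlow, this.sumFlow);
    }

    // Tab-separated; TextOutputFormat writes non-Text values using toString().
    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    // Writes the bean to bytes and reads it back, to check write/readFields agree.
    static FlowBean roundTrip(FlowBean bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(bytes));
            FlowBean copy = new FlowBean();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
    }
}
```

The key design rule is that `readFields` must read the fields in exactly the order `write` wrote them, which `roundTrip` checks. The descending `compareTo` only matters if the bean is used as a map output key, one common way to get output sorted by traffic.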

The Mapper class
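Since the Mapper code is not shown, here is a hedged plain-Java sketch of the cleaning logic a `map()` method could apply to each line. The input layout (`userId`, `upFlow`, `downFlow`, tab-separated) is an assumption, and `CleaningMapLogic`, `Flow` and `KeyedFlow` are hypothetical stand-ins for the Mapper, the FlowBean and its output pair. In a real `Mapper<LongWritable, Text, Text, FlowBean>`, each kept record would be emitted with `context.write(...)`, and dropped lines would simply produce no output.

```java
import java.util.Optional;

// Per-line cleaning logic (hypothetical). ASSUMED input: userId \t upFlow \t downFlow
class CleaningMapLogic {

    // Minimal value holder standing in for FlowBean.
    static final class Flow {
        final long up;
        final long down;
        Flow(long up, long down) { this.up = up; this.down = down; }
        long sum() { return up + down; }
    }

    // Minimal (key, value) pair standing in for the Mapper's output.
    static final class KeyedFlow {
        final String userId;
        final Flow flow;
        KeyedFlow(String userId, Flow flow) { this.userId = userId; this.flow = flow; }
    }

    // Returns the cleaned record, or Optional.empty() for a line we drop.
    static Optional<KeyedFlow> clean(String line) {
        if (line == null || line.isBlank()) {
            return Optional.empty(); // empty line: drop
        }
        String[] fields = line.split("\t");
        if (fields.length < 3 || fields[0].isBlank()) {
            return Optional.empty(); // missing fields: drop
        }
        try {
            long up = Long.parseLong(fields[1].trim());
            long down = Long.parseLong(fields[2].trim());
            if (up < 0 || down < 0) {
                return Optional.empty(); // negative traffic is invalid: drop
            }
            return Optional.of(new KeyedFlow(fields[0].trim(), new Flow(up, down)));
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric traffic value: drop
        }
    }
}
```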

Reducer class
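Likewise, a plain-Java sketch of the reduce step, assuming the mapper emits one (up, down) pair per log line for each user. `FlowReduceLogic` is a hypothetical name, and `long[]` replaces FlowBean only to keep the block self-contained. In Hadoop the method would be `reduce(Text key, Iterable<FlowBean> values, Context context)`.

```java
// Reduce logic for one key (hypothetical): sum every upstream/downstream
// value emitted for the same user.
class FlowReduceLogic {

    // Each element is {up, down}. Returns {totalUp, totalDown, totalSum}.
    static long[] reduce(Iterable<long[]> upDownPairs) {
        long totalUp = 0;
        long totalDown = 0;
        for (long[] pair : upDownPairs) {
            // copy the numbers out instead of keeping a reference to the element
            totalUp += pair[0];
            totalDown += pair[1];
        }
        return new long[] {totalUp, totalDown, totalUp + totalDown};
    }
}
```

One Hadoop-specific caveat: the framework reuses the same value object while iterating over `values`, so a real reducer should copy the numbers out (as this loop does) rather than store references to the beans.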

Submit (driver) class
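In Hadoop, the submit (driver) class typically builds a `Job` with `Job.getInstance(conf)`, sets the jar, mapper, reducer and output key/value classes, registers paths with `FileInputFormat.addInputPath` and `FileOutputFormat.setOutputPath`, and calls `job.waitForCompletion(true)`. That code needs a Hadoop installation to run, so the sketch below instead simulates the whole data flow in a single process under the same assumed input layout (`userId \t upFlow \t downFlow`). `LocalFlowJob` is a hypothetical name, not the author's class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local stand-in for the whole job (hypothetical), so the data flow can be tested
// without a cluster. ASSUMED input: userId \t upFlow \t downFlow
class LocalFlowJob {

    static List<String> run(List<String> inputLines) {
        // Grouping and summing are folded into one step here; in the real job the
        // grouping is the shuffle and the summing happens in reduce().
        Map<String, long[]> totals = new TreeMap<>();
        for (String line : inputLines) {
            String[] f = line.split("\t");
            if (f.length < 3) {
                continue; // drop malformed line (data cleaning)
            }
            long up;
            long down;
            try {
                up = Long.parseLong(f[1].trim());
                down = Long.parseLong(f[2].trim());
            } catch (NumberFormatException e) {
                continue; // drop non-numeric traffic values
            }
            long[] t = totals.computeIfAbsent(f[0].trim(), k -> new long[2]);
            t[0] += up;
            t[1] += down;
        }

        // Sort users by total traffic, largest first (stable, so ties keep key order).
        List<Map.Entry<String, long[]>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Comparator.comparingLong(
                (Map.Entry<String, long[]> e) -> e.getValue()[0] + e.getValue()[1]).reversed());

        // Tab-separated output lines: userId, up, down, total.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, long[]> e : entries) {
            long up = e.getValue()[0];
            long down = e.getValue()[1];
            out.add(e.getKey() + "\t" + up + "\t" + down + "\t" + (up + down));
        }
        return out;
    }
}
```

For input lines `u1 10 20`, `u2 100 1` and `u1 5 5` plus two malformed lines, `run` returns `u2 100 1 101` followed by `u1 15 25 40`. Note that a real MapReduce job sorts its output by key (the user ID here), so sorting by total traffic is usually done in a second job whose map output key is the FlowBean.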

Submitting the job to Hadoop

If you use Maven, you can package the jar with Maven; if not, you can use Eclipse's built-in JAR export.


Upload the jar to the Linux machine

Start the Hadoop services and run the jar

Pass the input and output paths after the jar name and main class. Note that the output directory must not already exist, or the job will fail.

View the output in HDFS. **My data has no total-traffic values yet, so the total traffic shows as 0.**

Conclusion

  • A simple MapReduce job keeps only the data we want
  • Tomorrow: write the cleaned data into Hive