We all know that Hadoop today has three major components: HDFS for storage, MapReduce for computation, and YARN for resource management. Today we're going to look at YARN.
Hadoop before 2.0
In Hadoop before 2.0, there was no YARN for unified management. Each framework mostly fought alone: HBase managed its own resources, Spark managed its own, and so on. This wasted resources, since no single framework could make full use of the cluster. In particular, the 1.x versions were prone to single points of failure and were not easily scalable.
In that architecture, all client requests are dispatched through a single JobTracker; if the JobTracker fails, the whole cluster can't work properly.
The JobTracker also handles task scheduling on top of resource management, so under heavy load it easily runs out of memory and CPU, which increases the risk of task failure.
Because of these problems, Hadoop needed a new generation of management engine to help us manage the cluster: the YARN engine.
YARN in 2.0
In 2.0, YARN replaces the original model. Instead of each framework wasting resources in its own silo, all of them are brought together under one unified resource manager.
The new YARN architecture is as follows.
- From the figure, we can see that the original JobTracker's duties are split in two: cluster-wide resource management, and per-application task scheduling and monitoring.
- Let's walk through the components of this architecture.
ResourceManager: Manages and schedules cluster resources in a unified manner. It receives requests from clients, and it continuously receives heartbeat information from the NodeManagers, which it uses to manage the cluster.
NodeManager:
A cluster contains many of these, one per worker node; each one manages and uses the resources on its own node.
It periodically reports its resource usage to the ResourceManager, and receives various commands from the ResourceManager in return.
It also launches the ApplicationMaster we see in the figure.
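To make the reporting role concrete, here is a minimal Python sketch of a NodeManager-style heartbeat payload. The class and field names are assumptions for illustration only, not the Hadoop API:

```python
# Illustrative sketch: a NodeManager-style resource report sent to the
# ResourceManager on each heartbeat. Names/fields are assumptions, not
# the real Hadoop API.

class NodeReporter:
    def __init__(self, node_id, total_vcores, total_memory_mb):
        self.node_id = node_id
        self.total_vcores = total_vcores
        self.total_memory_mb = total_memory_mb
        self.used_vcores = 0
        self.used_memory_mb = 0

    def report(self):
        # The heartbeat payload: what is in use and what remains free,
        # so the ResourceManager can schedule new containers here.
        return {
            "node": self.node_id,
            "used_vcores": self.used_vcores,
            "free_vcores": self.total_vcores - self.used_vcores,
            "used_memory_mb": self.used_memory_mb,
            "free_memory_mb": self.total_memory_mb - self.used_memory_mb,
        }

reporter = NodeReporter("node-1", total_vcores=8, total_memory_mb=16384)
reporter.used_vcores, reporter.used_memory_mb = 2, 4096
print(reporter.report())
```

The ResourceManager aggregates these reports from every node to build its global view of the cluster.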
ApplicationMaster:
Each ApplicationMaster corresponds to one submitted application, which can come from Spark, HBase, MapReduce, and so on. It applies for resources from the ResourceManager and supplies them to the application.
It also assigns tasks to the containers it has been granted, including starting and stopping those tasks.
Container
An abstraction that encapsulates resources such as CPU and memory; every task in YARN runs inside a container.
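A container can be thought of as a simple resource bundle. The sketch below is purely conceptual; the fields mirror the CPU and memory resources YARN tracks, but the class itself is illustrative, not the Hadoop API:

```python
from dataclasses import dataclass

# Conceptual sketch of a container as a bundle of granted resources.
# Field names are illustrative, not the Hadoop API.

@dataclass(frozen=True)
class Container:
    container_id: str
    node: str          # the NodeManager host this container runs on
    vcores: int        # CPU cores granted
    memory_mb: int     # memory granted, in MB

c = Container("container_01", node="node-1", vcores=2, memory_mb=4096)
print(c.vcores, c.memory_mb)
```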
Client
Applications are submitted through the client, which can also start and kill tasks and query their execution progress.
With these names in mind, let’s take a look at how the task execution process works.
1. The client sends a request to the ResourceManager. The request includes the ApplicationMaster, the command to launch it, and the user's application program.
2. The ResourceManager allocates a container for the ApplicationMaster on a NodeManager.
3. That NodeManager launches the ApplicationMaster based on the configuration information.
4. The ApplicationMaster registers with the ResourceManager and applies for the resources the application needs.
5. Once resources are granted, the ApplicationMaster asks the corresponding NodeManagers to launch tasks in them.
6. The NodeManagers start the corresponding containers. While tasks run, they report progress via heartbeat, and the ApplicationMaster manages the tasks based on those reports until the job completes.
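The execution steps above can be sketched as a simple event log. Everything here is an illustrative simulation of the message flow, not the Hadoop client API:

```python
# Conceptual walkthrough of the YARN submission flow as an event log.
# All names (submit_application, "wordcount") are illustrative.

def submit_application(app_name, num_containers):
    log = []
    # Step 1: client submits the app plus the AM launch command.
    log.append(f"Client -> ResourceManager: submit {app_name} (with AM launch command)")
    # Steps 2-3: RM picks a node; that NodeManager starts the AM.
    log.append("ResourceManager -> NodeManager: allocate a container for the ApplicationMaster")
    log.append(f"NodeManager: start ApplicationMaster for {app_name}")
    # Step 4: the AM registers and negotiates resources.
    log.append("ApplicationMaster -> ResourceManager: register and request containers")
    # Steps 5-6: NodeManagers launch task containers; tasks heartbeat progress.
    for i in range(num_containers):
        log.append(f"NodeManager: launch Container {i} (task reports progress via heartbeat)")
    log.append("ApplicationMaster -> ResourceManager: unregister (application finished)")
    return log

for line in submit_application("wordcount", 2):
    print(line)
```

Running this prints the sequence of interactions in order, one line per message, which mirrors the numbered steps above.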
Conclusion
That is roughly the whole YARN process and the new structure. The new model fixes the original single point of failure and improves availability and scalability, and a single cluster environment can now be shared by multiple applications. With YARN handling resource management, programmers can focus on business development.
This article is from the cloud community partner “LuckQI”. For related information, you can follow “LuckQI”.