Author: Unreal good
Source: Hang Seng LIGHT Cloud Community
Basic overview
Apache YARN (Yet Another Resource Negotiator) is the resource management and job scheduling system in Hadoop, introduced in Hadoop 2.x.
Users can deploy various service frameworks on YARN for unified management and resource allocation.
In Hadoop 1.x, MapReduce itself was responsible for allocating resources, so if MapReduce failed during a computation, resource scheduling stopped as well. YARN was introduced to solve this problem.
Core architecture
The YARN architecture consists of four components: ResourceManager, NodeManager, ApplicationMaster, and Container.
ResourceManager
The ResourceManager is usually deployed on its own machine and runs as a standalone process. There is a single active ResourceManager in the cluster, and it is responsible for resource management and allocation across the entire system.
The ResourceManager consists mainly of two components: the Scheduler and the Applications Manager (ASM). It makes allocation decisions based on application priority, queue capacity, and data locality, and schedules cluster resources in a secure, shared, multi-tenant manner.
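To make the idea of deciding by queue capacity and priority concrete, here is a minimal Python sketch. It is a toy model, not YARN's actual scheduler: the names `App`, `Queue`, and `pick_next`, the "higher number means higher priority" rule, and the memory-only accounting are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    queue: str
    priority: int   # assumption of this sketch: higher value runs first
    memory_mb: int  # memory the application asks for


@dataclass
class Queue:
    name: str
    capacity_mb: int  # memory share assigned to this queue
    used_mb: int = 0  # memory already handed out from this queue


def pick_next(apps, queues):
    """Return the highest-priority app whose queue still has room, or None."""
    runnable = [
        a for a in apps
        if queues[a.queue].used_mb + a.memory_mb <= queues[a.queue].capacity_mb
    ]
    if not runnable:
        return None
    return max(runnable, key=lambda a: a.priority)


queues = {
    "prod": Queue("prod", capacity_mb=4096, used_mb=3072),
    "dev": Queue("dev", capacity_mb=2048),
}
apps = [
    App("etl", "prod", priority=10, memory_mb=2048),   # does not fit in "prod"
    App("adhoc", "dev", priority=1, memory_mb=1024),   # fits in "dev"
]
chosen = pick_next(apps, queues)
print(chosen.name)  # adhoc
```

Although `etl` has the higher priority, its queue has no room left, so the lower-priority `adhoc` application is chosen. This is the kind of trade-off between priority and queue capacity that the text describes.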
NodeManager
The NodeManager is the manager of each node in the YARN cluster. It is responsible for managing the life cycle of all containers on the node, monitoring resources, and tracking node health. It mainly handles commands from the ResourceManager and the ApplicationMaster.
When a node starts, it registers with the ResourceManager and reports its available resources. While running, the NodeManager and ResourceManager keep exchanging this information so that it stays up to date and the cluster remains in an optimal state.
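The register-then-report pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `NodeReport` and `ToyResourceManager` are hypothetical names, and the real protocol between NodeManager and ResourceManager carries far more information than the three fields modeled here.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    node_id: str
    free_memory_mb: int
    free_vcores: int
    healthy: bool = True


class ToyResourceManager:
    """Hypothetical stand-in that only remembers the latest report per node."""

    def __init__(self):
        self.nodes = {}

    def register(self, report):
        # Called once when a node starts.
        self.nodes[report.node_id] = report

    def heartbeat(self, report):
        # Called repeatedly while the node runs; unknown nodes are rejected.
        if report.node_id not in self.nodes:
            raise KeyError(f"node {report.node_id} is not registered")
        self.nodes[report.node_id] = report

    def usable_nodes(self):
        # Only healthy nodes are candidates for new containers.
        return sorted(n for n, r in self.nodes.items() if r.healthy)


rm = ToyResourceManager()
rm.register(NodeReport("node-1", free_memory_mb=8192, free_vcores=4))
rm.register(NodeReport("node-2", free_memory_mb=8192, free_vcores=4))
rm.heartbeat(NodeReport("node-1", free_memory_mb=6144, free_vcores=3))
rm.heartbeat(NodeReport("node-2", free_memory_mb=8192, free_vcores=4, healthy=False))
print(rm.usable_nodes())  # ['node-1']
```

Because each heartbeat replaces the previous report, the toy ResourceManager always sees the latest free resources and health of every node.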
ApplicationMaster
When a user submits an application, YARN starts a lightweight process called the ApplicationMaster. The ApplicationMaster negotiates resources from the ResourceManager, monitors resource usage inside containers through the NodeManager, and is responsible for task monitoring and fault tolerance.
The ApplicationMaster can split the input data, adjust its resource requests dynamically based on application status, monitor and track task status and progress, and report application progress.
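As a rough illustration of these duties, the Python sketch below splits an input into ranges, tracks the state of each split, re-queues a failed one, and reports progress. `split_input` and `ToyApplicationMaster` are hypothetical names, and the state strings (`PENDING`, `DONE`, `FAILED`) are assumptions of this sketch rather than YARN's own task states.

```python
def split_input(total_records, split_size):
    """Split [0, total_records) into half-open ranges of at most split_size."""
    return [
        (start, min(start + split_size, total_records))
        for start in range(0, total_records, split_size)
    ]


class ToyApplicationMaster:
    """Hypothetical model: one task per split, one container per pending task."""

    def __init__(self, total_records, split_size):
        self.status = {s: "PENDING" for s in split_input(total_records, split_size)}

    def containers_needed(self):
        # Resource demand follows the current application status.
        return sum(1 for state in self.status.values() if state == "PENDING")

    def mark(self, split, state):
        self.status[split] = state

    def progress(self):
        if not self.status:
            return 1.0  # nothing to do counts as finished
        done = sum(1 for state in self.status.values() if state == "DONE")
        return done / len(self.status)


am = ToyApplicationMaster(total_records=250, split_size=100)
am.mark((0, 100), "DONE")
am.mark((100, 200), "FAILED")
am.mark((100, 200), "PENDING")  # fault tolerance: re-queue the failed split
print(am.containers_needed())
print(am.progress())
```

After one split finishes and one failed split is re-queued, two containers are still needed and one third of the work is done.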
Container
A Container is the resource abstraction in YARN. It encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network.
- When the ApplicationMaster applies to the ResourceManager for resources, the resources that the ResourceManager returns to the ApplicationMaster are represented as Containers.
- YARN allocates one Container to each task, and the task can use only the resources described in that Container. The ApplicationMaster can run any type of task inside a Container.
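A Container can be pictured as a small record that bundles a resource grant with the node it lives on. The Python sketch below is a simplified model under stated assumptions: `Resource`, `Container`, and `allocate` are hypothetical names, and only memory and CPU are modeled, leaving out the disk and network dimensions mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    memory_mb: int
    vcores: int

    def fits_in(self, other):
        # A request fits only if every dimension fits.
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass(frozen=True)
class Container:
    container_id: str
    node_id: str
    resource: Resource  # the only resources the task may use


def allocate(container_id, node_id, free, request):
    """Return (container, remaining free); (None, free) if the request does not fit."""
    if not request.fits_in(free):
        return None, free
    remaining = Resource(free.memory_mb - request.memory_mb,
                         free.vcores - request.vcores)
    return Container(container_id, node_id, request), remaining


free = Resource(memory_mb=4096, vcores=4)
c1, free = allocate("c1", "node-1", free, Resource(3072, 2))
c2, free = allocate("c2", "node-1", free, Resource(2048, 1))
print(c1)
print(c2)    # None: only 1024 MB is left on the node
print(free)
```

The second request is refused because it exceeds the memory left on the node, even though enough vcores remain. This is what "multi-dimensional" means in practice.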
The working process
The whole workflow of YARN application submission:
- First, the client submits the application to the ResourceManager and requests an ApplicationMaster instance.
- The ResourceManager selects a NodeManager that can run it and launches the ApplicationMaster instance in a Container on that node.
- Once started, the ApplicationMaster registers itself with the ResourceManager and then keeps exchanging heartbeats with it.
- The ApplicationMaster sends requests to the ResourceManager to obtain the Container resources it needs.
- The ApplicationMaster uses the Containers it has obtained to perform the distributed computation.
- After the application finishes, the ApplicationMaster unregisters itself from the ResourceManager and allows the Containers that belong to it to be reclaimed.
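The steps above can be traced with a small Python simulation. It is a sketch under stated assumptions, not the YARN client API: `ToyRM` and `ToyAM` are hypothetical classes, containers are reduced to a simple counter, heartbeats are omitted, and "distributed computing" is replaced by upper-casing strings.

```python
class ToyRM:
    """Hypothetical stand-in for the ResourceManager; counts free containers."""

    def __init__(self, free_containers):
        self.free = free_containers
        self.log = []

    def submit(self, app_id):
        # Steps 1-2: accept the application and start its AM in one container.
        if self.free < 1:
            raise RuntimeError("no container available for the ApplicationMaster")
        self.free -= 1
        self.log.append("submit")
        return ToyAM(app_id, self)

    def register(self, app_id):
        self.log.append("register")

    def allocate(self, app_id, wanted):
        granted = min(wanted, self.free)
        self.free -= granted
        self.log.append(f"allocate:{granted}")
        return granted

    def unregister(self, app_id, held):
        self.free += held  # containers are reclaimed
        self.log.append("unregister")


class ToyAM:
    """Hypothetical stand-in for the ApplicationMaster."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run(self, tasks):
        self.rm.register(self.app_id)                         # step 3
        granted = self.rm.allocate(self.app_id, len(tasks))   # step 4
        results = [t.upper() for t in tasks[:granted]]        # step 5 (toy work)
        self.rm.log.append("compute")
        self.rm.unregister(self.app_id, granted + 1)          # step 6, +1 for the AM
        return results


rm = ToyRM(free_containers=4)
am = rm.submit("app-1")
results = am.run(["a", "b"])
print(rm.log)
print(results, rm.free)
```

After the run, the log lists the steps in order and the free-container count is back to its starting value, which mirrors the reclaim step at the end of the workflow.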
Conclusion
YARN schedules and allocates resources for the services running in the Hadoop system in order to maximize the utilization of machine resources.