
HDFS HA solves the NameNode single point of failure, and HDFS Federation solves the NameNode memory bottleneck.

NameNode single point of failure

The NameNode occupies a critical position in HDFS, and its failure cannot be tolerated.

Since the metadata information of the entire HDFS file system is managed by NameNode, the availability of NameNode directly determines the availability of the entire HDFS.

In the Hadoop 1.x era, the NameNode was a single point of failure: if the NameNode process stopped working, the whole cluster was affected and HDFS as a whole became unavailable.

Because Hive and HBase store their data on HDFS, those frameworks become unavailable as well, which can take down many of the frameworks running on a production cluster.

Data recovery by restarting the NameNode is also time-consuming.

HDFS NameNode HA architecture

In Hadoop 2.x, the HDFS NameNode single point of failure has been resolved. Figure 4-1 shows the HDFS NameNode high-availability architecture.

From the figure above, we can see that the high availability architecture of NameNode is divided into the following parts:

In a typical HA cluster, two separate machines act as NameNodes.

At any time, only one NameNode is in the Active state and the other is in the Standby state. The Active NameNode takes care of all client operations, while the Standby NameNode simply acts as a Slave, maintaining state information so that it can switch quickly if needed.
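To a client, the active/standby pair appears as a single logical nameservice. Below is a minimal sketch of the client-side settings using Hadoop's Java Configuration API; the nameservice ID mycluster and the host names are assumptions for illustration:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // One logical nameservice backed by two NameNodes (IDs assumed).
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1-host:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2-host:8020");
        // Client-side proxy that fails over to whichever NameNode is Active.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // Clients address the nameservice, never a concrete NameNode host.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println(fs.exists(new Path("/")));
    }
}
```

Because the client only knows the logical name, an active/standby switchover is transparent to running applications.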

Active/standby switchover controller ZKFailoverController

ZKFailoverController runs as an independent process and controls the NameNode active/standby switchover.

ZKFailoverController promptly detects the health status of the NameNode and, when the active NameNode fails, uses Zookeeper to carry out the active/standby election and switchover. Of course, NameNode also supports a manual active/standby switchover that does not depend on Zookeeper.

The Zookeeper cluster provides the coordination on which the active/standby switchover relies.
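Automatic failover itself is switched on with two settings. A hedged sketch follows; the Zookeeper host names are assumptions:

```java
import org.apache.hadoop.conf.Configuration;

public class AutoFailoverSketch {
    // Returns a Configuration with ZKFC-driven automatic failover enabled.
    static Configuration withAutoFailover() {
        Configuration conf = new Configuration();
        // Let the ZKFC, rather than an operator, trigger the switchover.
        conf.setBoolean("dfs.ha.automatic-failover.enabled", true);
        // The Zookeeper ensemble used for the election (hosts assumed).
        conf.set("ha.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181");
        return conf;
    }
}
```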

Main responsibilities of the ZKFailoverController

Health monitoring

Periodically sends a health probe command to the NameNode it monitors to determine whether that NameNode is healthy. If the machine is down or the heartbeat fails, the ZKFailoverController marks it as unhealthy.

Session management

If NameNode is healthy, ZKFailoverController keeps an open session in Zookeeper.

If the NameNode is Active, the ZKFC also holds an ephemeral znode in Zookeeper. When that NameNode fails, the znode is deleted, the standby NameNode acquires the lock, and it is promoted to primary with its state marked Active.

When the failed NameNode is started again, it re-registers with Zookeeper. On finding that the znode is already locked, it automatically enters the Standby state. This cycle repeats, ensuring high reliability.

Hadoop 2.x supports a maximum of two NameNodes, while Hadoop 3.x supports more than two.

Master election

As mentioned above, a preemptive locking mechanism is implemented by maintaining an ephemeral znode in Zookeeper, and this determines which NameNode is Active.
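The lock itself is just an ephemeral znode. The following is a simplified sketch of the idea using the plain Zookeeper Java client, not the actual ZKFC code; the ensemble hosts are assumptions, and the parent znodes are assumed to exist already:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NaiveElectorSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the Zookeeper ensemble (hosts are assumptions).
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> {});
        String lock = "/hadoop-ha/mycluster/ActiveStandbyElectorLock";
        try {
            // EPHEMERAL: the znode vanishes when this session dies,
            // which is what lets the other NameNode take over.
            zk.create(lock, "nn1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("Won the lock -> become Active");
        } catch (KeeperException.NodeExistsException e) {
            // Another NameNode already holds the lock: stay Standby
            // and set a watch so we are notified when it is deleted.
            zk.exists(lock, true);
            System.out.println("Lock held elsewhere -> stay Standby");
        }
    }
}
```

Whoever creates the znode first wins; everyone else watches it and re-runs the race when it disappears.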

Shared storage system Quorum Journal Node

To keep the Standby node in sync with the Active node, both of them communicate with a separate group of daemons called “JournalNodes” (JNs).

Any namespace change made by the Active node is written as an edit log entry to a majority of the JNs.

The Standby node reads the edits from the JNs and constantly watches them for changes to the edit log.

Once the Standby Node gets edits, it applies them to its own namespace.

During a failover, the Standby ensures that all edits have been read from JNs before it promotes itself to the Active state, which ensures that the namespace state is fully synchronized before the failover occurs.
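Both NameNodes are pointed at the same JournalNode ensemble through a single setting. A minimal sketch follows; the JournalNode host names are assumptions, and 8485 is the default JournalNode RPC port:

```java
import org.apache.hadoop.conf.Configuration;

public class SharedEditsSketch {
    // Points a NameNode at a quorum of three JournalNodes (hosts assumed).
    static Configuration withSharedEdits(Configuration conf) {
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster");
        return conf;
    }
}
```

With three JournalNodes, a write succeeds once two of them acknowledge it, so the loss of any single JournalNode does not stop the edit log.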

DataNodes

In addition to sharing HDFS metadata information through the shared storage system, the active NameNode and standby NameNode also need to share the mapping relationship between data blocks and Datanodes in the HDFS.

The DataNode reports the data block location information to both the active NameNode and standby NameNode.

Federation

Background of Federation

From the previous articles in this series, we know that HDFS cluster metadata is stored in NameNode memory.

When the cluster size is large, NameNode metadata stored in the memory may be very large. Since all HDFS operations interact with NameNode, NameNode becomes a bottleneck in a large cluster.

Before Hadoop 2.x, HDFS had only one namespace, and files in HDFS could not be isolated from one another.

For this reason, Federation was introduced in Hadoop 2.x to address the following needs.

  1. HDFS cluster scalability. Multiple NameNodes each manage a subset of directories, so the cluster can scale to more nodes instead of being limited in the number of files it can store by the memory of a single NameNode, as in Hadoop 1.x.
  2. More efficient performance. Multiple NameNodes manage different data and serve requests at the same time, providing higher read/write throughput for users.
  3. Good isolation. Data for different services can be placed on different NameNodes as required, so that services have little impact on one another.

HDFS Data management architecture

Data storage adopts a hierarchical structure: all information about and management of the stored data sits on the NameNode side, while the actual data is stored on the DataNodes.

The data managed by the same NameNode belongs to the same namespace, and a namespace corresponds to a block pool.

A Block Pool is the collection of blocks belonging to the same namespace; a single namespace with a single block pool is, of course, the most common case.

When one NameNode manages all the metadata in the cluster, what happens if the NameNode's memory usage grows too high?

The metadata keeps growing, and simply enlarging the NameNode's JVM heap is not a permanent solution. This is where HDFS Federation comes in.

Federation architecture

When NameNode memory usage is too high, large file directories can be moved to another NameNode for management. More importantly, these NameNodes share all the DataNodes in the cluster.

They’re still in the same cluster. Figure 4-1 shows the HDFS Federation structure.

HDFS Federation is a horizontal scale-out solution to the NameNode single point problem, resulting in multiple independent NameNodes and namespaces.

This enables the HDFS naming service to scale horizontally. NameNodes in HDFS Federation manage their own namespaces independently of one another.

In this case, a DataNode stores blocks belonging not to a single block pool but to multiple block pools, one per namespace.

With HDFS Federation, only metadata management and storage are split up; the actual data storage is still shared.
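A minimal sketch of a federated setup with two namespaces, again using the Java Configuration API; the nameservice IDs, host names, and mount points are assumptions:

```java
import org.apache.hadoop.conf.Configuration;

public class FederationSketch {
    // A federated cluster with two independent namespaces (IDs assumed).
    static Configuration twoNamespaces() {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "ns1,ns2");
        conf.set("dfs.namenode.rpc-address.ns1", "nn1-host:8020");
        conf.set("dfs.namenode.rpc-address.ns2", "nn2-host:8020");
        // Optional ViewFS mount table: clients see one namespace while
        // different directories are routed to different NameNodes.
        conf.set("fs.defaultFS", "viewfs://clusterX");
        conf.set("fs.viewfs.mounttable.clusterX.link./user", "hdfs://ns1/user");
        conf.set("fs.viewfs.mounttable.clusterX.link./logs", "hdfs://ns2/logs");
        return conf;
    }
}
```

Each NameNode owns its own block pool, while every DataNode stores blocks for all of them.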
