This is the 11th day of my participation in the More Text Challenge. For more details, see: More Text Challenge

  • The official documentation

HA HA.

Concepts

I. Limitations of Hadoop 1.0

  1. NameNode problems
    • Single point of failure: there is only one NameNode
    • Single point bottleneck: a single NameNode may not have enough memory to manage the metadata for all the DataNodes

II. High Availability

  • It is used to resolve single points of failure
  • Hadoop 2.0 supports HA with only two NameNodes; Hadoop 3.0 supports one active NameNode with multiple standbys
  • If the active NameNode fails, a standby takes over
  1. HA architecture

  • The high availability (HA) of HDFS mainly means that ZooKeeper is used to coordinate active and standby NameNodes, resolving the NameNode single point of failure.
  • ZooKeeper stores the HA status files and active/standby information. At least three ZooKeeper nodes are recommended, and the count should be odd.


  • NameNode active/standby pair: the active node provides the service, and the standby node synchronizes metadata from the active node and acts as its hot standby.
  • ZooKeeper Failover Controller (ZKFC) is used to monitor the active/standby status of the NameNode nodes.
  • JN (JournalNode) stores the edit logs generated by the active NameNode. The standby NameNode loads the edit logs from the JNs to synchronize metadata.
  • ZKFC controls the NameNode active/standby arbitration
    • ZKFC, as a lightweight arbitration agent, uses ZooKeeper's distributed lock to implement active/standby arbitration and then controls the active/standby state of the NameNodes through command channels. A ZKFC is deployed alongside each NameNode, one per NameNode.
  • Metadata synchronization
  1. Data synchronization between the two NameNodes: the two NameNodes do not serve at the same time; only one is active at any moment
    1. The two NameNodes must keep their data in sync
      • Block information, which is managed by the DataNodes and reported to the NameNode (dynamic information)
      • Offset, size, and ID, which are handled by the NameNode itself (static information)
    2. Synchronization of dynamic information
      • DN reports to NN
      • Instead of reporting to a single NameNode, each DataNode now reports to both NameNodes
    3. Synchronization of static information
      • Since the static information is handled by the NameNode itself, and there are two NameNodes, how is this information synchronized between them?
        • Use a JournalNode cluster
      • Multiple JournalNode nodes are deployed on different servers, each receiving the same information; having multiple nodes avoids a single point of failure
      • The primary NN writes data to the JournalNode and the backup NN reads data from the JournalNode
      • Majority ("more than half") mechanism, a form of weak consistency: fewer than half of the JournalNodes are allowed to fail
        • The active NameNode does not need all JournalNodes to confirm a write; a minority may fail
        • An odd number of JournalNodes is usually configured
        • For example, 3 JournalNodes tolerate 1 failure, and 5 tolerate 2
  2. Metadata persistence
    • The active NameNode serves client requests. The edit log it generates is written both locally and to the JNs, while the metadata in the active NameNode's memory is updated.
    • When the standby NameNode detects changes to the edit log on the JNs, it loads the edit log into memory and produces the same new metadata as the active NameNode, completing metadata synchronization.
    • The FSImages of the active and standby NameNodes are still stored on their own disks and do not interact. An FSImage is a copy of the metadata periodically written to the local disk, also called a metadata mirror.

  • EditLog: Records user operation logs to generate new file system mirrors based on FSImage.
  • FSImage: periodically saves file images.
  • CKPT: merging the FSImage and EditLog in memory to create a new FSImage and writing it to disk is called a checkpoint. After loading the FSImage and EditLog files, the standby NameNode writes the merged result to its local disk and to NFS at the same time. The disk then contains the original fsimage file and the newly generated checkpoint file fsimage.ckpt; fsimage.ckpt is renamed to fsimage, overwriting the original.
  • EditLog.new: the merge is triggered every hour or when the EditLog reaches 64 MB. While the merge is running, new operations cannot be folded in, so the NameNode records them in a new log file, EditLog.new. After the standby NameNode finishes merging into an fsimage, the result is sent back to the active NameNode to replace the original fsimage, and EditLog.new is renamed to EditLog.
  1. Zookeeper cluster
    • When the active NameNode fails, ZooKeeper is used to switch NameNodes automatically
    • The ZKFC process is deployed on each NameNode; it performs election and health checks and notifies the standby NameNode when the active one fails.
    • Once the active NameNode goes down, the standby NameNode is automatically switched to active

III. Federation

  • Resolve single point bottlenecks
  1. Architecture

  • Cause: the single active NameNode architecture limits the scalability and performance of an HDFS cluster. When the cluster is large enough, the memory used by the NameNode process can reach hundreds of gigabytes, and the NameNode becomes a performance bottleneck.
  • Application scenario: large-scale file storage, for example Internet companies storing user behavior data, or telecoms storing historical and voice data. At that scale a single NameNode's memory cannot support the cluster.
  • A common rule of thumb is about 1 GB of NameNode heap per 1 million blocks; with the default 128 MB block size, 1 million blocks correspond to roughly 128 TB of data (a quick calculation follows this list). This is a generous estimate: even if every file had only one block, the actual metadata per block would not reach 1 KB.
  • Federation: each NameNode is responsible for its own directories, similar to mounting disks to directories in Linux. Each NameNode covers only part of the HDFS namespace; if NameNode1 is responsible for the /database directory, it holds the metadata of the files under /database. Metadata is not shared among NameNodes, and each NameNode has its own standby.
  • Block pool: a group of file blocks belonging to a namespace (NS).
  • In a federated environment, each Namenode maintains a namespace volume, which contains metadata of the namespace and block pools for all data blocks of files in the namespace.
  • Namenodes are independent and do not communicate with each other. Failure of one namenode does not affect other Namenodes.
  • Datanodes register with all Namenodes in the cluster and store data for all block pools in the cluster.
  • NameSpace (NS): the namespace. The HDFS namespace contains directories, files, and blocks; it can be understood as the logical directory tree that a NameNode is responsible for.
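  • A quick back-of-the-envelope version of the rule of thumb above (the figures are the rough estimates from the text, not measurements):
  # ~1 GB of NameNode heap per 1,000,000 blocks, each 128 MB (the default block size)
  echo $(( 1000000 * 128 )) MB   # 128000000 MB, i.e. roughly 128 TB of raw data per 1 GB of heap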

High Availability (HA) Configuration

  • Official Setup document

I. Installation positions of each service node

  • ZooKeeper can run on any nodes, but it must be started first (it determines which NameNode becomes active)
  • ZKFC must be on NN
  • The JN (JournalNode) processes can run on any nodes

II. Preparation

  1. Set up passwordless SSH login between node1 and node2
    • Since node1 and node2 both run a NameNode and must take over from each other on failure, passwordless SSH is configured between the two nodes (a sketch follows)
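    • A minimal sketch, assuming the root user and default RSA key paths (the fencing configuration later in this article points at /root/.ssh/id_dsa, so adjust the key type and path to your setup):
    # on node1; repeat in the other direction on node2
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a key pair if none exists yet
    ssh-copy-id root@node2                      # append the public key to node2's authorized_keys
    ssh root@node2 hostname                     # verify that no password prompt appears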

III. Configure hdfs-site.xml

  1. dfs.nameservices – the logical name for this new nameservice
    • The nameservice ID (mycluster)
    • Provides a single logical name that refers to the two NameNodes configured as active and standby
    • It acts as a single entry point
    • The name represents one active/standby pair (Hadoop 2.0 allows only two NameNodes per pair); for federation, multiple nameservice IDs can be listed, separated by commas
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
  1. dfs.ha.namenodes.[nameservice ID] – unique identifiers for each NameNode in the nameservice
    • Specifies which NameNodes belong to the nameservice ID above
    • The nn1 and nn2 below are also just logical names
    • They determine which actual NameNodes the nameservice configured above points to
    • The last segment of dfs.ha.namenodes.mycluster is the nameservice ID configured above
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
  1. dfs.namenode.rpc-address.[nameservice ID].[name node ID] – the fully-qualified RPC (remote procedure call) address for each NameNode to listen on
    • This configuration maps the logical NameNode names to the real NameNode addresses
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <!-- the address of the actual server for this NameNode -->
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node2:8020</value>
</property>
  1. dfs.namenode.http-address.[nameservice ID].[name node ID] – the fully-qualified HTTP address for each NameNode to listen on
    • The HTTP address serves the web UI, so the Hadoop cluster can be accessed from a browser
    • Port is 50070
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node2:50070</value>
</property>
  1. dfs.namenode.shared.edits.dir – the URI which identifies the group of JNs where the NameNodes will write/read edits
    • Specifies which servers run a JournalNode and the address each one listens on
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node2:8485;node3:8485;node4:8485/mycluster</value>
</property>
  1. dfs.client.failover.proxy.provider.[nameservice ID] – the Java class that HDFS clients use to contact the Active NameNode
    • What Java proxy class is used for failover
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
  1. dfs.ha.fencing.methods – a list of scripts or Java classes which will be used to fence the Active NameNode during a failover
    • When the active NameNode fails, it must be fenced (isolated) immediately, otherwise a split brain could occur; only then does the other NameNode become active
<!-- use SSH to fence the failed NameNode -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_dsa</value>
</property>
  1. fs.defaultFS – the default path prefix used by the Hadoop FS client when none is given
    • The configuration is in the core-site.xml file
    • Configures the logical path that clients use to access the HA Hadoop cluster; the nameservice ID defined above is used as the HDFS path
<!-- be careful not to mistype the nameservice ID (mycluster, not mycluser) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<!-- in the earlier non-HA setup there was only one NameNode, so fs.defaultFS pointed at it directly;
     with HA the nameservice ID is used instead -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://node1:9000</value>
</property>

<!-- also change the directory where the NameNode and DataNodes store their data -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/ha</value>
</property>
  1. dfs.journalnode.edits.dir – the path where the JournalNode daemon will store its local state
    • The directory on each JournalNode's local disk where its edit log data is stored
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/var/hadoop/ha/journalnode</value>
</property>

IV. Configure automatic failover (ZooKeeper)

  1. The configuration of automatic failover requires the addition of two new parameters to your configuration. In your hdfs-site.xml file, add:
<property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>
  1. This specifies that the cluster should be set up for automatic failover. In your core-site.xml file, add:
<property>
   <name>ha.zookeeper.quorum</name>
   <value>node2:2181,node3:2181,node4:2181</value>
 </property>

V. Preparation before startup

  1. Distribute the modified core-site.xml and hdfs-site.xml to the other nodes (a sketch follows)
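    • A sketch, assuming the configuration files live under $HADOOP_HOME/etc/hadoop and node2 to node4 are the other nodes:
    for n in node2 node3 node4; do
      scp "$HADOOP_HOME"/etc/hadoop/core-site.xml "$HADOOP_HOME"/etc/hadoop/hdfs-site.xml \
          root@"$n":"$HADOOP_HOME"/etc/hadoop/
    done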

VI. Modify the ZooKeeper configuration file

  1. Go to the conf/ directory of the ZooKeeper installation, where zoo_sample.cfg is located
# 1. Rename the sample configuration
mv zoo_sample.cfg zoo.cfg

# 2. Edit zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored (do not use /tmp; it appears in the sample only as an example)
dataDir=/var/hadoop/zk
# the port at which the clients will connect
clientPort=2181
# (the commented-out maxClientCnxns and autopurge defaults from the sample file can stay as they are)

# 3. Add one line per ZooKeeper server in the quorum
server.1=node2:2888:3888
server.2=node3:2888:3888
server.3=node4:2888:3888
  1. Create the myid files
    • The dataDir configured above is /var/hadoop/zk
    • Create the file myid under this directory on each node on which the ZooKeeper service is to be configured
    # on node2
    echo 1 >> /var/hadoop/zk/myid
    # on node3
    echo 2 >> /var/hadoop/zk/myid
    # on node4
    echo 3 >> /var/hadoop/zk/myid
  2. Configure the ZooKeeper environment variables (a sketch follows)
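    • A minimal sketch, assuming ZooKeeper is installed under /opt/hadoop/zookeeper-3.4.6 (the path that appears in the zkServer.sh status output below); append to /etc/profile on each ZooKeeper node and re-source it:
    export ZOOKEEPER_HOME=/opt/hadoop/zookeeper-3.4.6
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    # then: source /etc/profile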

VII. Execution

  1. Start ZooKeeper
    • Start command: zkServer.sh start
    • Run on node2, node3, and node4
    • After startup, the status is as follows
    [root@node4 hadoop]# jps
    7590 QuorumPeerMain
    7607 Jps
    • You can run commands to view the status of each node
    [root@node4 conf]# zkServer.sh status
    JMX enabled by default
    Using config: /opt/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: leader
    • The node whose Mode is "leader" is the leader of the ZooKeeper cluster
  2. Start the JournalNodes (only needed the first time the cluster is started, not afterwards)
    • Start command: hadoop-daemon.sh start journalnode
    • Run it on every node that hosts a JournalNode
    • A new process has been created
    [root@node1 hadoop]# jps
    7912 Jps
    7866 JournalNode
  3. Format HDFS (only when the cluster is started for the first time; do not run this command again later)
    • Pick one of the NameNodes and run hdfs namenode -format
    • The format command only needs to be run once, on one node; running it several times produces inconsistent cluster IDs
    • Since there are two NameNodes, how is the second one brought in sync?
      • First start the NameNode that was just formatted: hadoop-daemon.sh start namenode
      // this NameNode now has a running process
      [root@node1 ~]# jps
      6673 Jps
      6603 NameNode
      6462 JournalNode
      • On the other NameNode, run hdfs namenode -bootstrapStandby to copy the metadata of the already-started NameNode to this node
  4. Register HDFS with ZooKeeper (only when the cluster is started for the first time)
    • The HDFS creates its own node on ZooKeeper
    • Run the command hdfs zkfc -formatZK
    • ZooKeeper can maintain information for multiple clusters at the same time, so this command registers (formats) this cluster's information in ZooKeeper
    • ZooKeeper creates a /hadoop-ha/mycluster node that holds all the information about this cluster
  5. Start the cluster
    • start-dfs.sh
    • Note that the ZKFC process does not need to be started manually; it starts on its own with the cluster
    //node1
    [root@node1 ~]# jps
    7185 Jps
    6603 NameNode
    7116 DFSZKFailoverController
    6462 JournalNode
    
    //node2
    [root@node2 ~]# jps
    6945 Jps
    6770 DataNode
    6899 DFSZKFailoverController
    6700 NameNode
    6445 QuorumPeerMain
    6494 JournalNode
    
    //node3
    [root@node3 ~]# jps
    6629 DataNode
    6492 JournalNode
    6718 Jps
    6447 QuorumPeerMain
    
    //node4
    [root@node4 ~]# jps
    6454 QuorumPeerMain
    6598 DataNode
    6667 Jps
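    • Once everything is up, the active/standby roles can be checked from either NameNode; a small hedged example (nn1 and nn2 are the NameNode IDs configured in hdfs-site.xml above):
    hdfs haadmin -getServiceState nn1    # prints "active" or "standby"
    hdfs haadmin -getServiceState nn2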