This is the 11th day of my participation in the More Text Challenge. For more details, see: More Text Challenge

  • The official documentation

HA HA.

Concepts

I. Limitations of Hadoop 1.0

  1. NameNode problems
    • Single point of failure: there is only one NameNode
    • Single point bottleneck: a single NameNode may not have enough memory to manage the metadata for all the DataNodes

II. High Availability

  • It is used to resolve single points of failure
  • Hadoop 2.0 supports HA with only two NameNodes; Hadoop 3.0 supports one active NameNode with multiple standbys
  • If the active NameNode fails, a standby takes over
  1. HA architecture

  • The high availability (HA) of HDFS mainly means that ZooKeeper is used to coordinate active and standby NameNodes, resolving the NameNode single point of failure.
  • ZooKeeper stores the HA status files and active/standby information. At least three ZooKeeper nodes are recommended, and the count should be odd.


  • NameNode active/standby pair: the active node provides the service, and the standby node synchronizes metadata from the active node and acts as its hot standby.
  • ZooKeeper Failover Controller (ZKFC) is used to monitor the active/standby status of the NameNode nodes.
  • JN (JournalNode) stores the edit logs generated by the active NameNode. The standby NameNode loads the edit logs from the JNs to synchronize metadata.
  • ZKFC controls the NameNode active/standby arbitration
    • ZKFC, as a lightweight arbitration agent, uses ZooKeeper's distributed lock to implement active/standby arbitration and then controls the active/standby state of the NameNodes through command channels. A ZKFC is deployed alongside each NameNode, one per NameNode.
  • Metadata synchronization
  1. Data synchronization between the two NameNodes: the two NameNodes do not serve at the same time; only one is active at any moment
    1. The two NameNodes must keep their data in sync
      • Block information, which is managed by the DataNodes and reported to the NameNode (dynamic information)
      • Offset, size, and ID, which are handled by the NameNode itself (static information)
    2. Synchronization of dynamic information
      • DN reports to NN
      • Instead of reporting to a single NameNode, each DataNode now reports to both NameNodes
    3. Synchronization of static information
      • Since the static information is handled by the NameNode itself, and there are two NameNodes, how is this information synchronized between them?
        • Use a JournalNode cluster
      • Multiple JournalNode nodes are deployed on different servers, each receiving the same information; having multiple nodes avoids a single point of failure
      • The primary NN writes data to the JournalNode and the backup NN reads data from the JournalNode
      • Majority ("more than half") mechanism, a form of weak consistency: fewer than half of the JournalNodes are allowed to fail
        • The active NameNode does not need all JournalNodes to confirm a write; a minority may fail
        • An odd number of JournalNodes is usually configured
        • For example, 3 JournalNodes tolerate 1 failure, and 5 tolerate 2
  2. Metadata persistence
    • The active NameNode serves client requests. The edit log it generates is written both locally and to the JNs, while the metadata in the active NameNode's memory is updated.
    • When the standby NameNode detects changes to the edit log on the JNs, it loads the edit log into memory and produces the same new metadata as the active NameNode, completing metadata synchronization.
    • The FSImages of the active and standby NameNodes are still stored on their own disks and do not interact. An FSImage is a copy of the metadata periodically written to the local disk, also called a metadata mirror.

  • EditLog: Records user operation logs to generate new file system mirrors based on FSImage.
  • FSImage: periodically saves file images.
  • CKPT: merging the FSImage and EditLog in memory to create a new FSImage and writing it to disk is called a checkpoint. After loading the FSImage and EditLog files, the standby NameNode writes the merged result to its local disk and to NFS at the same time. The disk then contains the original fsimage file and the newly generated checkpoint file fsimage.ckpt; fsimage.ckpt is renamed to fsimage, overwriting the original.
  • EditLog.new: the merge is triggered every hour or when the EditLog reaches 64 MB. While the merge is running, new operations cannot be folded in, so the NameNode records them in a new log file, EditLog.new. After the standby NameNode finishes merging into an fsimage, the result is sent back to the active NameNode to replace the original fsimage, and EditLog.new is renamed to EditLog.
  1. Zookeeper cluster
    • When the active NameNode fails, ZooKeeper is used to switch NameNodes automatically
    • The ZKFC process is deployed on each NameNode; it performs election and health checks and notifies the standby NameNode when the active one fails.
    • Once the active NameNode goes down, the standby NameNode is automatically switched to active

III. Federation

  • Resolve single point bottlenecks
  1. Architecture

  • Cause: the single active NameNode architecture limits the scalability and performance of an HDFS cluster. When the cluster is large enough, the memory used by the NameNode process can reach hundreds of gigabytes, and the NameNode becomes a performance bottleneck.
  • Application scenario: large-scale file storage, for example Internet companies storing user behavior data, or telecoms storing historical and voice data. At that scale a single NameNode's memory cannot support the cluster.
  • A common rule of thumb is about 1 GB of NameNode heap per 1 million blocks; with the default 128 MB block size, 1 million blocks correspond to roughly 128 TB of data (a quick calculation follows this list). This is a generous estimate: even if every file had only one block, the actual metadata per block would not reach 1 KB.
  • Federation: each NameNode is responsible for its own directories, similar to mounting disks to directories in Linux. Each NameNode covers only part of the HDFS namespace; if NameNode1 is responsible for the /database directory, it holds the metadata of the files under /database. Metadata is not shared among NameNodes, and each NameNode has its own standby.
  • Block pool: a group of file blocks belonging to a namespace (NS).
  • In a federated environment, each Namenode maintains a namespace volume, which contains metadata of the namespace and block pools for all data blocks of files in the namespace.
  • Namenodes are independent and do not communicate with each other. Failure of one namenode does not affect other Namenodes.
  • Datanodes register with all Namenodes in the cluster and store data for all block pools in the cluster.
  • NameSpace (NS): the namespace. The HDFS namespace contains directories, files, and blocks; it can be understood as the logical directory tree that a NameNode is responsible for.
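  • A quick back-of-the-envelope version of the rule of thumb above (the figures are the rough estimates from the text, not measurements):
  # ~1 GB of NameNode heap per 1,000,000 blocks, each 128 MB (the default block size)
  echo $(( 1000000 * 128 )) MB   # 128000000 MB, i.e. roughly 128 TB of raw data per 1 GB of heap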

High Availability (HA) Configuration

  • Official Setup document

I. Installation positions of each service node

  • ZooKeeper can run on any nodes, but it must be started first (it determines which NameNode becomes active)
  • ZKFC must be on NN
  • The JN (JournalNode) processes can run on any nodes

II. Preparation

  1. Set up passwordless SSH login between node1 and node2
    • Since node1 and node2 both run a NameNode and must take over from each other on failure, passwordless SSH is configured between the two nodes (a sketch follows)
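    • A minimal sketch, assuming the root user and default RSA key paths (the fencing configuration later in this article points at /root/.ssh/id_dsa, so adjust the key type and path to your setup):
    # on node1; repeat in the other direction on node2
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    # generate a key pair if none exists yet
    ssh-copy-id root@node2                      # append the public key to node2's authorized_keys
    ssh root@node2 hostname                     # verify that no password prompt appears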

III. Configure hdfs-site.xml

  1. dfs.nameservices – the logical name for this new nameservice
    • The nameservice ID (mycluster)
    • Provides a single logical name that refers to the two NameNodes configured as active and standby
    • It acts as a single entry point
    • The name represents one active/standby pair (Hadoop 2.0 allows only two NameNodes per pair); for federation, multiple nameservice IDs can be listed, separated by commas
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
  1. dfs.ha.namenodes.[nameservice ID] – unique identifiers for each NameNode in the nameservice
    • Specifies which NameNodes belong to the nameservice ID above
    • The nn1 and nn2 below are also just logical names
    • They determine which actual NameNodes the nameservice configured above points to
    • The last segment of dfs.ha.namenodes.mycluster is the nameservice ID configured above
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
  1. dfs.namenode.rpc-address.[nameservice ID].[name node ID] – the fully-qualified RPC (remote procedure call) address for each NameNode to listen on
    • This configuration maps the logical NameNode names to the real NameNode addresses
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <!-- the address of the actual server for this NameNode -->
  <value>node1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>node2:8020</value>
</property>
  1. dfs.namenode.http-address.[nameservice ID].[name node ID] – the fully-qualified HTTP address for each NameNode to listen on
    • The HTTP address serves the web UI, so the Hadoop cluster can be accessed from a browser
    • Port is 50070
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>node1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>node2:50070</value>
</property>
  1. dfs.namenode.shared.edits.dir – the URI which identifies the group of JNs where the NameNodes will write/read edits
    • Specifies which servers run a JournalNode and the address each one listens on
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://node2:8485;node3:8485;node4:8485/mycluster</value>
</property>
  1. dfs.client.failover.proxy.provider.[nameservice ID] – the Java class that HDFS clients use to contact the Active NameNode
    • What Java proxy class is used for failover
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
  1. dfs.ha.fencing.methods – a list of scripts or Java classes which will be used to fence the Active NameNode during a failover
    • When the active NameNode fails, it must be fenced (isolated) immediately, otherwise a split brain could occur; only then does the other NameNode become active
<!-- use SSH to fence the failed NameNode -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>

<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_dsa</value>
</property>
  1. fs.defaultFS – the default path prefix used by the Hadoop FS client when none is given
    • The configuration is in the core-site.xml file
    • Configures the logical path that clients use to access the HA Hadoop cluster; the nameservice ID defined above is used as the HDFS path
<!-- be careful not to mistype the nameservice ID (mycluster, not mycluser) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<!-- in the earlier non-HA setup there was only one NameNode, so fs.defaultFS pointed at it directly;
     with HA the nameservice ID is used instead -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://node1:9000</value>
</property>

<!-- also change the directory where the NameNode and DataNodes store their data -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/ha</value>
</property>
  1. dfs.journalnode.edits.dir – the path where the JournalNode daemon will store its local state
    • The directory on each JournalNode's local disk where its edit log data is stored
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/var/hadoop/ha/journalnode</value>
</property>

IV. Configure automatic failover (ZooKeeper)

  1. The configuration of automatic failover requires the addition of two new parameters to your configuration. In your hdfs-site.xml file, add:
<property>
   <name>dfs.ha.automatic-failover.enabled</name>
   <value>true</value>
 </property>
  1. This specifies that the cluster should be set up for automatic failover. In your core-site.xml file, add:
<property>
   <name>ha.zookeeper.quorum</name>
   <value>node2:2181,node3:2181,node4:2181</value>
 </property>

V. Preparation before startup

  1. Distribute the modified core-site.xml and hdfs-site.xml to the other nodes (a sketch follows)
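    • A sketch, assuming the configuration files live under $HADOOP_HOME/etc/hadoop and node2 to node4 are the other nodes:
    for n in node2 node3 node4; do
      scp "$HADOOP_HOME"/etc/hadoop/core-site.xml "$HADOOP_HOME"/etc/hadoop/hdfs-site.xml \
          root@"$n":"$HADOOP_HOME"/etc/hadoop/
    done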

VI. Modify the ZooKeeper configuration file

  1. Go to the conf/ directory of the ZooKeeper installation, where zoo_sample.cfg is located
# 1. Rename the sample configuration
mv zoo_sample.cfg zoo.cfg

# 2. Edit zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored (do not use /tmp; it appears in the sample only as an example)
dataDir=/var/hadoop/zk
# the port at which the clients will connect
clientPort=2181
# (the commented-out maxClientCnxns and autopurge defaults from the sample file can stay as they are)

# 3. Add one line per ZooKeeper server in the quorum
server.1=node2:2888:3888
server.2=node3:2888:3888
server.3=node4:2888:3888
  1. Create the myid files
    • The dataDir configured above is /var/hadoop/zk
    • Create the file myid under this directory on each node on which the ZooKeeper service is to be configured
    # on node2
    echo 1 >> /var/hadoop/zk/myid
    # on node3
    echo 2 >> /var/hadoop/zk/myid
    # on node4
    echo 3 >> /var/hadoop/zk/myid
  2. Configure the ZooKeeper environment variables (a sketch follows)
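    • A minimal sketch, assuming ZooKeeper is installed under /opt/hadoop/zookeeper-3.4.6 (the path that appears in the zkServer.sh status output below); append to /etc/profile on each ZooKeeper node and re-source it:
    export ZOOKEEPER_HOME=/opt/hadoop/zookeeper-3.4.6
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    # then: source /etc/profile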

VII. Execution

  1. Start ZooKeeper
    • Start command: zkServer.sh start
    • Run on node2, node3, and node4
    • After startup, the status is as follows
    [root@node4 hadoop]# jps
    7590 QuorumPeerMain
    7607 Jps
    • You can run commands to view the status of each node
    [root@node4 conf]# zkServer.sh status
    JMX enabled by default
    Using config: /opt/hadoop/zookeeper-3.4.6/bin/../conf/zoo.cfg
    Mode: leader
    • The node whose Mode is "leader" is the leader of the ZooKeeper cluster
  2. Start the JournalNodes (only needed the first time the cluster is started, not afterwards)
    • Start command: hadoop-daemon.sh start journalnode
    • Run it on every node that hosts a JournalNode
    • A new process has been created
    [root@node1 hadoop]# jps
    7912 Jps
    7866 JournalNode
  3. Format HDFS (only when the cluster is started for the first time; do not run this command again later)
    • Pick one of the NameNodes and run hdfs namenode -format
    • The format command only needs to be run once, on one node; running it several times produces inconsistent cluster IDs
    • Since there are two NameNodes, how is the second one brought in sync?
      • First start the NameNode that was just formatted: hadoop-daemon.sh start namenode
      // this NameNode now has a running process
      [root@node1 ~]# jps
      6673 Jps
      6603 NameNode
      6462 JournalNode
      • On the other NameNode, run hdfs namenode -bootstrapStandby to copy the metadata of the already-started NameNode to this node
  4. Register HDFS with ZooKeeper (only when the cluster is started for the first time)
    • The HDFS creates its own node on ZooKeeper
    • Run the command hdfs zkfc -formatZK
    • ZooKeeper can maintain information for multiple clusters at the same time, so this command registers (formats) this cluster's information in ZooKeeper
    • ZooKeeper creates a /hadoop-ha/mycluster node that holds all the information about this cluster
  5. Start the cluster
    • start-dfs.sh
    • Note that the ZKFC process does not need to be started manually; it starts on its own with the cluster
    //node1
    [root@node1 ~]# jps
    7185 Jps
    6603 NameNode
    7116 DFSZKFailoverController
    6462 JournalNode
    
    //node2
    [root@node2 ~]# jps
    6945 Jps
    6770 DataNode
    6899 DFSZKFailoverController
    6700 NameNode
    6445 QuorumPeerMain
    6494 JournalNode
    
    //node3
    [root@node3 ~]# jps
    6629 DataNode
    6492 JournalNode
    6718 Jps
    6447 QuorumPeerMain
    
    //node4
    [root@node4 ~]# jps
    6454 QuorumPeerMain
    6598 DataNode
    6667 Jps
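    • Once everything is up, the active/standby roles can be checked from either NameNode; a small hedged example (nn1 and nn2 are the NameNode IDs configured in hdfs-site.xml above):
    hdfs haadmin -getServiceState nn1    # prints "active" or "standby"
    hdfs haadmin -getServiceState nn2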