Preface: Lao Liu is currently working hard to prepare for next year's campus recruitment. This article is mainly an attempt to explain, in plain language, the big data knowledge points he has been reviewing, in his own words rather than by reciting a book.
01 ZooKeeper Knowledge points
Point 1: What is ZooKeeper?
In short, ZooKeeper is a distributed coordination framework. It can be used to implement high availability for Hadoop (high availability is covered later). Kafka 0.8 also uses it, for example to store Kafka offsets, but that approach has a problem, which will be discussed in more detail when we talk about Kafka.
Point 2: Why ZooKeeper?
Take Hadoop high availability as an example. Without ZooKeeper, Hadoop would need its own program to switch between the two NameNodes, which would distract it from its main job. Hadoop already handles storage and more; if it also had to implement the high availability logic itself, that would be too much. Can someone else share the load? That is why ZooKeeper is used: it takes this burden off Hadoop and lets Hadoop stay focused.
Point 3: Some command line operations for ZooKeeper
# Start the ZooKeeper cluster (run on each node)
zkServer.sh start
# Stop the ZooKeeper cluster (run on each node)
zkServer.sh stop
# Check the cluster status (run on each node)
zkServer.sh status
# Connect to the ZooKeeper servers
zkCli.sh -server node01:2181,node02:2181,node03:2181
# List the nodes under the root directory /
ls /
# Create a node
create /laoliu
# Get the node's data
get /laoliu
# Modify the node's data
set /laoliu 1
# Delete the node
delete /laoliu
Point 4: The Java API in IDEA. You must remember this, do not ignore it!
Lao Liu won't share the code here because there is too much of it. If you want it, contact Lao Liu directly and he will send it to you.
Point 5: Components of ZooKeeper
It has three parts: a file system-like directory node tree to store data, some basic commands to operate ZooKeeper, and its listener, Watcher.
What is a file system-like directory node tree?
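One way to picture the file-system-like node tree is as a set of nodes addressed by slash-separated paths, where every node can hold data as well as have children. The sketch below is a toy in-memory model, not ZooKeeper's implementation; the class `ZNodeTree` and its methods are names made up for illustration.

```python
class ZNodeTree:
    """Toy in-memory model of a znode tree (illustrative, not ZooKeeper's code)."""

    def __init__(self):
        # Map each full path to its data; the root "/" always exists.
        self.nodes = {"/": b""}

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get(self, path):
        return self.nodes[path]

    def ls(self, path):
        # Direct children only, similar to `ls` in zkCli.
        prefix = path if path.endswith("/") else path + "/"
        return sorted(
            p[len(prefix):] for p in self.nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app", b"config-root")
tree.create("/app/db", b"node01:3306")
tree.create("/app/cache")
print(tree.ls("/"))      # ['app']
print(tree.ls("/app"))   # ['cache', 'db']
```

Unlike a plain directory, `/app` here stores data and also has children, which is the main difference from an ordinary file system.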
What is a listener?
A listener has three parts: (1) Registration: the client registers the listener with the ZooKeeper cluster. (2) Event monitoring: the listener watches for certain events. (3) Callback: when the listener detects that an event has occurred, the callback function is invoked.
Here is an example that Lao Liu saw in a training course and found quite classic.
1. I went to check in at a hotel but was told that no rooms were available.
2. I asked the front desk to keep an eye out for an empty room and to notify me as soon as one came up (equivalent to registering a listener for an event).
3. Near 12 o'clock a guest checked out, so a room became free (the monitored event occurs).
4. The front desk noticed the vacancy (the event is detected) and notified me; after receiving the notice I went back to the hotel (equivalent to invoking the callback function).
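The hotel story maps onto a few lines of code. The sketch below is a toy model with made-up names (`Hotel`, `watch_vacancy`, `check_out`), not the ZooKeeper client API. It also assumes the watch fires only once and must be registered again afterwards, which is how ZooKeeper's standard watches behave.

```python
class Hotel:
    """Toy stand-in for the ZooKeeper server in the hotel example."""

    def __init__(self):
        self.watchers = []

    def watch_vacancy(self, callback):
        # (1) Register: the guest leaves a callback with the front desk.
        self.watchers.append(callback)

    def check_out(self, room):
        # (2) Event: a room becomes free.
        pending, self.watchers = self.watchers, []  # one-shot: clear before firing
        for callback in pending:
            # (3) Callback: notify everyone who registered.
            callback(room)


notified = []
hotel = Hotel()
hotel.watch_vacancy(lambda room: notified.append(f"room {room} is free"))
hotel.check_out(301)
hotel.check_out(302)   # nobody is watching any more, so no second notice
print(notified)        # ['room 301 is free']
```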
Point 6: Data node ZNode in ZooKeeper
Znodes are classified into four types: persistent nodes, ephemeral (temporary) nodes, sequential (ordered) nodes, and non-sequential (non-ordered) nodes.
I’m going to focus on ordered nodes
A persistent node is like a directory in a file system, while a temporary node is bound to a session. When a client connects to the ZooKeeper cluster, a session is established. When the connection is lost and the session expires, the temporary nodes created in that session become invalid and are deleted.
So why do we have ordered nodes?
They prevent conflicts when multiple clients create a ZNode with the same name in the same directory. Creating a node whose name already exists fails with an error, which is why ordered nodes exist. When an ordered node is created, ZooKeeper automatically appends a monotonically increasing integer to the end of the node name.
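To make the naming rule concrete, here is a small sketch of how an appended counter lets two clients use the same name without colliding. It is a toy model: the `SequentialParent` class is made up for illustration, and the 10-digit zero-padded suffix mirrors the format ZooKeeper uses for sequential znodes.

```python
class SequentialParent:
    """Toy parent znode that hands out sequential child names."""

    def __init__(self, path):
        self.path = path
        self.counter = 0          # per-parent counter, only ever increases
        self.children = []

    def create_sequential(self, prefix):
        name = f"{self.path}/{prefix}{self.counter:010d}"
        self.counter += 1
        self.children.append(name)
        return name


locks = SequentialParent("/locks")
a = locks.create_sequential("lock-")   # client A
b = locks.create_sequential("lock-")   # client B uses the same prefix
print(a)  # /locks/lock-0000000000
print(b)  # /locks/lock-0000000001
```

Both clients asked for the same name, yet each got a distinct node, and the suffix also records the order in which the nodes were created.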
Point 7: What is a session?
When a client establishes a long-lived TCP connection with a server in the ZooKeeper cluster, that connection is called a session. Within a session, requests are executed in FIFO (first in, first out) order. For example, if a client issues a create request and then a get request, the create is executed before the get. With two sessions, however, FIFO order is not guaranteed across them; it is guaranteed only within a single session. Lao Liu does not know the reason, he just wrote it down, ha ha ha!
Point 8: What is a transaction?
Lao Liu hasn't fully understood transactions in ZooKeeper yet, so he won't write about them here. You can look them up yourself; when Lao Liu figures it out, he will fill this part in.
Point 9: What is a Watcher?
After a client establishes a connection with ZooKeeper, how does it obtain the latest data in ZooKeeper, and how does it notice data changes? Through Watchers. Lao Liu gave an example of a Watcher in Point 5, which he thought was very classic. A Watcher is a listener that the client registers on the ZooKeeper server. It monitors data changes on a ZNode and notifies the client when they happen.
Point 10: ZooKeeper cluster architecture
A ZooKeeper cluster uses a master-slave architecture with three roles: leader, follower, and observer. To understand the cluster, you need to understand what each of the three does.
The leader provides read and write services for clients and maintains cluster status.
Followers accept read and write requests from clients, report their status to the leader, and take part in both the "more than half" vote that decides whether a write succeeds and the leader election.
An observer is a special follower: it accepts read and write requests from clients and reports its status to the leader, but it takes part in neither the "more than half" write vote nor the leader election.
How does a client read from the ZooKeeper cluster?
As the figure shows, a ZK read is very simple. The client first establishes a session with one server in the ZK cluster, then reads the data directly from that server, which returns the result to the client; finally the client closes the session. It's that simple!
How does the client write to the ZK cluster?
Does anyone think it will be very simple? Lao Liu thought so at the beginning, but after reading it, he found that he was too superficial. ZK’s write operation is much more complicated than ZK’s read operation.
First, here is a particularly good example that Lao Liu came across. In his opinion it illustrates the process of a ZK write very vividly.
1. A rich man comes to the bank and tells a teller: "I deposited money here yesterday and you credited me 10 million too little. Please add it now."
2. For such a large amount the teller certainly has no authority to act, so she reports to the manager. The manager cannot simply add the money either; before acting, he asks all his subordinates for their opinion.
3. If most of them agree, the manager decides to approve, and all subordinates are told to record it.
4. Finally, the teller who took the request tells the rich man that the operation succeeded and 10 million has been added.
Compare this example with a ZK write and you will see they are almost exactly the same.
① The client wants to write data to the ZK cluster, for example create /test. It establishes a session with one follower in the cluster (the leftmost one in the figure).
② The follower forwards the write request to the leader.
③ After receiving the request, the leader issues a proposal to create /test and tells every follower to record the proposal first.
④ The servers then vote on whether to allow create /test. If a quorum of the cluster (more than half, counting the leader itself; quorum is described below) agrees, the leader commits the proposal and creates the ZNode /test locally.
⑤ The leader then notifies all followers to commit the proposal as well, and each follower creates the ZNode /test locally.
⑥ After all of the above is done, the follower that received the request responds to the client.
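The propose, vote, and commit steps can be sketched in a few lines. This is a simplified in-memory model under stated assumptions: the `Server` class and `handle_write` function are made up for illustration, there is no networking, and a server that is down simply never acknowledges a proposal.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.pending = []   # proposals recorded but not yet committed
        self.data = set()   # committed znodes
        self.alive = True

    def propose(self, path):
        # Record the proposal and acknowledge it (only if the server is up).
        if self.alive:
            self.pending.append(path)
        return self.alive

    def commit(self, path):
        if self.alive and path in self.pending:
            self.pending.remove(path)
            self.data.add(path)


def handle_write(leader, followers, path):
    """Propose, count acknowledgements, and commit only on a quorum."""
    servers = [leader] + followers
    quorum = len(servers) // 2 + 1
    acks = sum(server.propose(path) for server in servers)  # leader counts too
    if acks < quorum:
        return False
    for server in servers:
        server.commit(path)
    return True


leader = Server("zk2")
f1, f3 = Server("zk1"), Server("zk3")
f3.alive = False                      # one follower is down
ok = handle_write(leader, [f1, f3], "/test")
print(ok, "/test" in f1.data, "/test" in f3.data)   # True True False
```

With one of three servers down, the write still succeeds because two acknowledgements form a quorum.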
So, isn't that very similar to the example above? Ha ha ha!
Point 11: Leader election in ZK cluster
There are two kinds of leader election: a fresh election, when the cluster starts for the first time, and a non-fresh election. Lao Liu will go through the fresh election carefully here; the non-fresh election is roughly the same, and you can look it up yourself.
A very important principle in leader election is that a leader can be elected only once more than half of the servers in the cluster have started. How is "more than half" calculated? It is the number of servers divided by 2 (integer division), plus 1. For a 3-machine ZK cluster that is 3/2+1=2.
In a fresh election, each server's initial vote has the form (sid, zxid): server1 votes (1, 0), server2 votes (2, 0), and server3 votes (3, 0).
How is the leader decided? Suppose server1 votes (sid1, zxid1) and server2 votes (sid2, zxid2). The zxids are compared first, and the larger zxid wins. If the zxids are equal, the sids are compared, and the larger sid wins.
That covers the basic knowledge points of leader election. Next, Lao Liu will walk through the steps in detail, again using a ZK cluster of 3 machines.
Assume ZK1, ZK2, and ZK3 are started in that order; more than half means 2 servers.
1. ZK1 starts and votes for itself with (1, 0). Fewer than half of the servers are running, so no leader can be elected yet.
2. ZK2 starts. ZK1 and ZK2 each vote for themselves and send their votes to the other server: ZK1's vote is (1, 0) and ZK2's vote is (2, 0).
3. Now 2 servers are running, so the election can proceed, starting with vote processing. ZK1 compares its own vote (1, 0) with the vote (2, 0) received from ZK2. By the comparison rule, both zxids are 0 and therefore equal, so the larger sid wins; 2 > 1, so ZK1 updates its vote to (2, 0). ZK2 applies the same logic and keeps its vote at (2, 0).
4. After the votes are processed, they are counted again. The votes on both ZK1 and ZK2 are now (2, 0), so ZK2 is elected leader. ZK2 changes its state to leader and ZK1 changes its state to follower.
5. Finally, when ZK3 starts, it finds that the cluster already has a leader, so it does not vote and becomes a follower directly.
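The walkthrough above can be replayed with a short simulation. It is a simplified sketch, not ZooKeeper's election code: votes are just (sid, zxid) pairs, the function names `better` and `elect` are made up, and details of the real protocol such as epochs and network messages are left out.

```python
def better(vote_a, vote_b):
    """Return the winning vote. A vote is a (sid, zxid) pair."""
    sid_a, zxid_a = vote_a
    sid_b, zxid_b = vote_b
    if zxid_a != zxid_b:
        return vote_a if zxid_a > zxid_b else vote_b   # larger zxid wins
    return vote_a if sid_a > sid_b else vote_b         # tie: larger sid wins


def elect(start_order, cluster_size):
    """Start servers one by one; return (leader_sid, roles)."""
    quorum = cluster_size // 2 + 1
    votes = {}      # sid -> current vote of each running server
    leader = None
    for sid in start_order:
        if leader is not None:
            continue                   # a leader exists: late servers just follow
        votes[sid] = (sid, 0)          # fresh election: vote for yourself, zxid 0
        if len(votes) < quorum:
            continue                   # not enough servers running yet
        best = votes[sid]
        for vote in votes.values():
            best = better(best, vote)
        votes = {s: best for s in votes}   # every server adopts the best vote
        leader = best[0]
    if leader is None:
        return None, {sid: "looking" for sid in start_order}
    roles = {sid: ("leader" if sid == leader else "follower") for sid in start_order}
    return leader, roles


leader, roles = elect([1, 2, 3], cluster_size=3)
print(leader, roles)   # 2 {1: 'follower', 2: 'leader', 3: 'follower'}
```

ZK3 has the largest sid but starts too late, so it never gets to compete, which matches step 5.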
Point 12: Summary of quorum (arbitration) knowledge points
One thing up front: Lao Liu will cover the ZAB algorithm later, because he hasn't figured it out yet.
What is a quorum?
When a proposal is made, it takes effect as soon as a majority agrees to it.
Why use a quorum?
The proposal can take effect without waiting for every server to respond, which improves the cluster's response speed and is a reasonable trade-off.
How is the quorum size chosen?
It is the number of servers in the cluster divided by 2 (integer division), plus 1. In a 3-machine ZK cluster that is 3/2+1=2.
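The formula is easy to check for other cluster sizes. In this small sketch the helper name `quorum_size` is made up; the third printed column is how many servers can fail while a quorum is still reachable.

```python
def quorum_size(n_servers):
    """Smallest number of servers that is more than half of the cluster."""
    return n_servers // 2 + 1


for n in (3, 4, 5):
    print(n, quorum_size(n), n - quorum_size(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

A 4-machine cluster tolerates no more failures than a 3-machine one, which is one reason ZK clusters are usually built with an odd number of servers.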
Point 13: How ZooKeeper works
Read and write operations were covered in Point 10, so let's start with ZooKeeper state synchronization. After the leader election, the servers in the ZK cluster synchronize their state with the leader.
So how exactly does state synchronization work? Lao Liu will first write down what he remembers, no rush!
1. The leader builds a NEWLEADER packet containing the leader’s maximum ZXID, which is then broadcast to other followers.
2. After receiving the packet, each follower compares its own maximum ZXID with the leader's. If its maximum ZXID is smaller than the leader's, its data is not up to date and it needs to synchronize with the leader. Otherwise no synchronization is needed.
3. If synchronization is required, the leader creates a LearnerHandler thread for each follower that needs it. This thread handles that follower's data synchronization requests.
4. The leader's main thread waits for the LearnerHandler threads to finish. Only when most of the followers have completed synchronization does the leader start to respond to write requests.
This is the overall state synchronization process, but the LearnerHandler thread was only mentioned briefly above. Next is the flow inside the LearnerHandler thread.
① It first receives a FOLLOWERINFO packet, which includes the follower's maximum ZXID.
② If the follower's maximum ZXID equals the leader's maximum ZXID, the follower is already up to date.
③ Otherwise it checks whether the leader has committed proposals that the follower is missing. If so, it sends a DIFF packet to synchronize the difference and sends COMMIT packets to the follower one by one so that the follower stores the data it lacks. If not, but the follower's maximum ZXID is larger than the leader's, it sends a TRUNC packet so that the follower truncates the extra data. If the follower's maximum ZXID is smaller than the leader's (and the gap is too large to send as a diff), it sends a SNAP packet and synchronizes the follower from a snapshot.
④ After the above messages have been sent, an UPTODATE packet is sent to tell the follower that its data is now the latest.
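The packet choice in the LearnerHandler flow can be summarized as a single decision function. This is a sketch under assumptions, not ZooKeeper's source: the function name is made up, and `leader_min_log_zxid` is a hypothetical parameter standing for the oldest ZXID the leader can still replay as a diff, which is what separates the DIFF case from the SNAP case here.

```python
def sync_packets(follower_zxid, leader_max_zxid, leader_min_log_zxid):
    """Return the packet sequence the leader sends to one follower."""
    if follower_zxid == leader_max_zxid:
        return ["UPTODATE"]                     # already the latest
    if follower_zxid > leader_max_zxid:
        return ["TRUNC", "UPTODATE"]            # follower must drop extra data
    if follower_zxid >= leader_min_log_zxid:
        return ["DIFF", "COMMIT", "UPTODATE"]   # replay only what is missing
    return ["SNAP", "UPTODATE"]                 # too far behind: full snapshot


# Leader has committed up to zxid 100 and can replay proposals from zxid 60 on.
for follower_zxid in (100, 105, 80, 10):
    print(follower_zxid, sync_packets(follower_zxid, 100, 60))
# 100 ['UPTODATE']
# 105 ['TRUNC', 'UPTODATE']
# 80 ['DIFF', 'COMMIT', 'UPTODATE']
# 10 ['SNAP', 'UPTODATE']
```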
Point 14: ZooKeeper example: HDFS HA
After so many ZooKeeper principles, let's talk about a ZooKeeper example, in case the interviewer asks for one and you are not prepared. This time Lao Liu will talk about HDFS HA. HDFS HA relies mainly on ZooKeeper to achieve high availability, and it consists of two parts: metadata synchronization and active/standby switchover.
Let’s talk about metadata synchronization, mainly in the red circle below.
The metadata synchronization process is as follows. Two NameNodes run in the same HDFS cluster: one is in the Active state and the other is in the Standby state. Only the Active NameNode provides read and write services. The Standby NameNode follows the state of the Active NameNode and switches to Active when the Active NameNode becomes abnormal.
However, during the active/standby switchover, the new Active NameNode can provide services only after its metadata has been synchronized with that of the original Active NameNode.
So how do you do metadata synchronization?
A JournalNode cluster is used as the shared storage system. When a client operates on HDFS, the operation is logged in the edits.log file on the Active NameNode and also written to the JournalNode cluster, which is responsible for storing the metadata newly generated by HDFS. When new data is written to the JournalNode cluster, the Standby NameNode notices it and synchronizes the new data. In this way the Active NameNode and the Standby NameNode keep their metadata in sync. In addition, all DataNodes send block reports to both the Active and the Standby NameNode.
Now for the active/standby switchover. First, the flow chart:
Follow this flow chart to get a feel for the process.
1. Each NameNode has a ZKFC process, which controls the active/standby switchover of that NameNode.
2. When ZKFC starts, it initializes two services, HealthMonitor and ActiveStandbyElector, and registers callback methods with both. HealthMonitor monitors the health of the NameNode. ActiveStandbyElector receives election requests from ZKFC and creates the temporary node ActiveStandbyElectorLock.
3. Next, the two ZKFCs each try to create the temporary node ActiveStandbyElectorLock in ZooKeeper through their own ActiveStandbyElector. Because of ZooKeeper's write consistency, only one ActiveStandbyElector succeeds in creating it.
4. The ActiveStandbyElector that created the node successfully calls back into its ZKFC, which switches the corresponding NameNode to Active. The ActiveStandbyElector that failed calls back into its ZKFC, which switches the corresponding NameNode to Standby.
5. But! Regardless of whether they won the election, all ActiveStandbyElectors register a Watcher on the temporary node ActiveStandbyElectorLock to listen for state changes on that node.
6. If the HealthMonitor of the Active NameNode detects that the NameNode is abnormal, it notifies the corresponding ZKFC. ZKFC calls its ActiveStandbyElector to delete the temporary node ActiveStandbyElectorLock it created in ZooKeeper. At this point, the Watcher registered by the Standby NameNode's ActiveStandbyElector receives the deletion event.
7. The Standby NameNode's ActiveStandbyElector then tries to create the temporary node ActiveStandbyElectorLock. If it succeeds, the Standby NameNode is elected as the new Active NameNode.
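The switchover steps boil down to "whoever creates the lock node is Active, and the loser watches for its deletion". The sketch below models that with an in-memory stand-in for ZooKeeper. All class and method names (`FakeZooKeeper`, `Zkfc`, `join_election`, `namenode_unhealthy`) are made up for illustration and are not the Hadoop API; for brevity only the loser's watch is modeled.

```python
class FakeZooKeeper:
    """In-memory stand-in: one temporary lock node plus one-shot delete watches."""

    def __init__(self):
        self.lock_owner = None
        self.watchers = []

    def create_lock(self, owner):
        if self.lock_owner is not None:
            return False          # the node already exists, so creation fails
        self.lock_owner = owner
        return True

    def watch_lock(self, callback):
        self.watchers.append(callback)

    def delete_lock(self):
        self.lock_owner = None
        pending, self.watchers = self.watchers, []
        for callback in pending:
            callback()            # deliver the deletion event


class Zkfc:
    def __init__(self, name, zk):
        self.name, self.zk, self.state = name, zk, "INIT"

    def join_election(self):
        won = self.zk.create_lock(self.name)
        self.state = "ACTIVE" if won else "STANDBY"
        if not won:
            # Try again as soon as the lock node disappears.
            self.zk.watch_lock(self.join_election)

    def namenode_unhealthy(self):
        # HealthMonitor reported a problem: step down and give up the lock.
        self.state = "STANDBY"
        self.zk.delete_lock()


zk = FakeZooKeeper()
nn1, nn2 = Zkfc("nn1", zk), Zkfc("nn2", zk)
nn1.join_election()
nn2.join_election()
print(nn1.state, nn2.state)   # ACTIVE STANDBY
nn1.namenode_unhealthy()
print(nn1.state, nn2.state)   # STANDBY ACTIVE
```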
Point 15: Split brain
What is split brain?
Having two leaders at the same time in a distributed system is called split brain. It can have many causes, such as network delay. This is a terrible situation and must be avoided through the fencing (isolation) mechanism.
So how does fencing work?
1. After an ActiveStandbyElector successfully creates the temporary node ActiveStandbyElectorLock, it also creates a persistent node ActiveBreadCrumb that holds the address of the Active NameNode.
2. When the Active NameNode closes its session normally, the temporary node ActiveStandbyElectorLock is deleted and the persistent node ActiveBreadCrumb is removed along with it.
3. However, if the ActiveStandbyElector's session is closed abnormally, the persistent node ActiveBreadCrumb remains.
4. When the other NameNode is about to change from Standby to Active, it finds the ActiveBreadCrumb node left behind by the previous Active NameNode, and ZKFC is called back to fence the old Active NameNode.
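The role of the two nodes in fencing can be sketched as follows. This is a toy model with made-up names (`ElectionNodes`, `become_active`, `close_session`), not Hadoop's implementation; `fence` is any callable that would isolate the old Active NameNode, and here it just records the name.

```python
class ElectionNodes:
    """Toy model of the two znodes used for fencing."""

    def __init__(self):
        self.lock = None         # temporary node ActiveStandbyElectorLock
        self.breadcrumb = None   # persistent node ActiveBreadCrumb

    def become_active(self, name, fence):
        """Take the lock; fence the previous Active if it left a breadcrumb."""
        if self.lock is not None:
            return False
        self.lock = name
        if self.breadcrumb is not None and self.breadcrumb != name:
            fence(self.breadcrumb)   # old Active may still think it is Active
        self.breadcrumb = name
        return True

    def close_session(self, graceful):
        # The temporary lock always disappears together with the session.
        self.lock = None
        if graceful:
            self.breadcrumb = None   # a clean shutdown also removes the breadcrumb


fenced = []
nodes = ElectionNodes()
nodes.become_active("nn1", fenced.append)
nodes.close_session(graceful=False)      # nn1's session ends abnormally
nodes.become_active("nn2", fenced.append)
print(fenced)                            # ['nn1']
```

If `close_session(graceful=True)` had been used instead, no breadcrumb would remain and nn2 would take over without fencing anyone.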
02 Summary
Well, that's finally it! Lao Liu has summarized 15 knowledge points about ZooKeeper for big data. In his opinion every one of them is important; he has memorized them and keeps them in mind. He hopes they help anyone who wants to learn big data, and he also welcomes criticism and advice from the experts.