Preface
This article assumes some RPC background. If you don't know RPC yet, you can first jump to the two RPC articles written earlier:
Theory: High concurrency from zero (7) – introduction, protocol and framework of RPC
Code: High concurrency from scratch (8) – simple implementation of RPC framework
You don't need to go very deep; a surface-level understanding is enough, because Hadoop has its own Hadoop RPC that builds on those basics.
For now, remember the following characteristics of RPC (not a complete list):
1. Method calls between different processes
2. RPC is divided into a server and a client. The client invokes the server's methods, and the methods execute on the server
3. A protocol is an interface, but the interface must have a versionID
4. There are abstract methods in the protocol, which are implemented by the server
Okay, let’s get started
NameNode startup process parsing
Our first task is to verify that NameNode is an RPC server
1.1 Step 1: Find the main method of NameNode
The source version used here is Hadoop 2.7.0; open NameNode.java and find the main method.
I have added some notes to the screenshot for you to follow along.
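Since the annotated screenshot does not come through here, below is a simplified sketch of what NameNode.main looks like in the 2.7.x source (abbreviated from memory, with comments added, so treat the details as approximate):

public static void main(String argv[]) throws Exception {
  if (DFSUtil.parseHelpArgument(argv, NameNode.USAGE, System.out, true)) {
    System.exit(0); // just printing usage/help
  }
  try {
    StringUtils.startupShutdownMessage(NameNode.class, argv, LOG);
    // parse the startup options and build the NameNode -- this is the method we follow next
    NameNode namenode = createNameNode(argv, null);
    if (namenode != null) {
      namenode.join(); // block until the NameNode is stopped
    }
  } catch (Throwable e) {
    LOG.error("Failed to start namenode.", e);
    terminate(1, e);
  }
}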
1.2 Step 2: Click on the createNameNode method that creates NameNode
The createNameNode method is divided into two parts
The first part: parsing the arguments passed in
The following commands are used to operate the HDFS cluster
hdfs namenode -format
hadoop-daemon.sh start namenode
The second part: the switch statement
The code is too long for a screenshot, so it is copied here:
switch (startOpt) {
  case FORMAT: {
    boolean aborted = format(conf, startOpt.getForceFormat(),
        startOpt.getInteractiveFormat());
    terminate(aborted ? 1 : 0);
    return null; // avoid javac warning
  }
  case GENCLUSTERID: {
    System.err.println("Generating new cluster id:");
    System.out.println(NNStorage.newClusterID());
    terminate(0);
    return null;
  }
  case FINALIZE: {
    System.err.println("Use of the argument '" + StartupOption.FINALIZE +
        "' is no longer supported. To finalize an upgrade, start the NN " +
        " and then run `hdfs dfsadmin -finalizeUpgrade'");
    terminate(1);
    return null; // avoid javac warning
  }
  case ROLLBACK: {
    boolean aborted = doRollback(conf, true);
    terminate(aborted ? 1 : 0);
    return null; // avoid warning
  }
  case BOOTSTRAPSTANDBY: {
    String toolArgs[] = Arrays.copyOfRange(argv, 1, argv.length);
    int rc = BootstrapStandby.run(toolArgs, conf);
    terminate(rc);
    return null; // avoid warning
  }
  case INITIALIZESHAREDEDITS: {
    boolean aborted = initializeSharedEdits(conf,
        startOpt.getForceFormat(), startOpt.getInteractiveFormat());
    terminate(aborted ? 1 : 0);
    return null; // avoid warning
  }
  case BACKUP:
  case CHECKPOINT: {
    NamenodeRole role = startOpt.toNodeRole();
    DefaultMetricsSystem.initialize(role.toString().replace(" ", ""));
    return new BackupNode(conf, role);
  }
  case RECOVER: {
    NameNode.doRecovery(startOpt, conf);
    return null;
  }
  case METADATAVERSION: {
    printMetadataVersion(conf);
    terminate(0);
    return null; // avoid javac warning
  }
  case UPGRADEONLY: {
    DefaultMetricsSystem.initialize("NameNode");
    new NameNode(conf);
    terminate(0);
    return null;
  }
  default: {
    DefaultMetricsSystem.initialize("NameNode");
    return new NameNode(conf);
  }
}
For example, the first command, hdfs namenode -format, is a format operation,
so it naturally takes the FORMAT branch.
The second command, hadoop-daemon.sh start namenode, falls through to the default branch, which creates a new NameNode(conf).
I’m gonna go ahead and click on this new NameNode(conf)
1.3 Initialization of NameNode
Click into this constructor; the method that actually does the work is the one below:
the initialize method. Let's continue into initialize.
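As a rough orientation before we go step by step, the body of initialize in 2.7.x looks roughly like the skeleton below (heavily abbreviated; only the calls we will follow in this article are kept, so treat it as a sketch rather than the full method):

protected void initialize(Configuration conf) throws IOException {
  // ... metrics, security login and other setup omitted ...
  if (NamenodeRole.NAMENODE == role) {
    startHttpServer(conf);             // 1. the HTTP server on port 50070, section 1.4.1
  }
  loadNamesystem(conf);                // 2. load metadata into FSNamesystem (nothing to load on first start)
  rpcServer = createRpcServer(conf);   // 3. the Hadoop RPC server(s), section 1.6
  // ... more setup omitted ...
  startCommonServices(conf);           // 4. resource and safe-mode checks, then start services, section 1.8
}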
1.4 Initialization procedure
1.4.1 HttpServer
The code before this checks a few conditions that we can ignore for now; let's jump straight to the part we care about.
Here we see the code that creates the HttpServer. If you have ever set up a big data cluster, you have surely visited the web page on port 50070, something like this:
This is not a critical part of the startup process, but while we are here, let's see where 50070 comes from.
Then we see the two parameters, DFS_NAMENODE_HTTP_ADDRESS_KEY and DFS_NAMENODE_HTTP_ADDRESS_DEFAULT
DFS_NAMENODE_HTTP_ADDRESS_KEY is the manually configured address; if it is not configured, Hadoop falls back to the default address DFS_NAMENODE_HTTP_ADDRESS_DEFAULT
Looking at this, you can see why we can visit that page with the machine's IP and port 50070.
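For reference, the relevant constants in DFSConfigKeys.java look roughly like this in 2.7.x (values quoted from memory, so verify against your own source tree):

// DFSConfigKeys.java (abbreviated)
public static final String DFS_NAMENODE_HTTP_ADDRESS_KEY = "dfs.namenode.http-address";
public static final int    DFS_NAMENODE_HTTP_PORT_DEFAULT = 50070;
public static final String DFS_NAMENODE_HTTP_ADDRESS_DEFAULT =
    "0.0.0.0:" + DFS_NAMENODE_HTTP_PORT_DEFAULT;

So if dfs.namenode.http-address is not set in hdfs-site.xml, the HTTP server binds to 0.0.0.0:50070, which is why any of the machine's IPs plus port 50070 reaches the page.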
1.4.2 Servlet enhancements to HttpServer2
Now let's go back to the first diagram; the second statement there is startHttpServer, and inside it is the start method. Click into start.
Moving on, as Java developers our attention naturally falls on the familiar servlets:
setupServlets(httpServer, conf);
This method binds a whole series of servlets; the more servlets bound, the more functionality the HttpServer has. Let's click in.
It registers a bunch of servlets. We won't read through them all here; if you want to know what a particular servlet does, open it and look at its doGet() method.
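To give a feel for it, setupServlets in NameNodeHttpServer looks roughly like this in 2.7.x (only a few of the registrations are shown, and the exact list of servlets varies by version, so treat it as illustrative):

private static void setupServlets(HttpServer2 httpServer, Configuration conf) {
  // each call binds a servlet class to a URL path on the NameNode web UI
  httpServer.addInternalServlet("startupProgress",
      StartupProgressServlet.PATH_SPEC, StartupProgressServlet.class);
  httpServer.addInternalServlet("fsck", "/fsck", FsckServlet.class, true);
  httpServer.addInternalServlet("listPaths", "/listPaths/*",
      ListPathsServlet.class, false);
  httpServer.addInternalServlet("fileChecksum", "/fileChecksum/*",
      FileChecksumServlets.RedirectServlet.class, false);
  // ... more servlets omitted ...
}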
So we’ve seen so far that we can draw our flow chart
1.5 The first step of the flowchart
First of all, we now have an HttpServer2 (Hadoop's own wrapper around an HTTP server) that is bound to a set of servlets providing a variety of functions.
It listens on port 50070; when the browser sends a request such as http://<ip address>:50070/listPaths, the matching servlet handles it and returns the web page.
1.6 Hadoop RPC
Return to the HttpServer section in 1.4.1. After HttpServer is started, the next step loads the metadata, but the scenario we are simulating is the very first startup of the cluster, when there is no metadata yet. So let's skip that step and move on to the Hadoop RPC part.
How did I know that Hadoop RPC here provides two such services? That is not a guess, and not something I had to look up or click through; the NameNode source code tells us.
The class comment of NameNode is a paragraph of English; paraphrased, it says:
NameNode is not just a class; it is also a server that exposes an IPC Server and an HTTP Server to the outside world. The HTTP Server is what serves the 50070 page that lets developers see the state of HDFS. The FSNamesystem class manages the HDFS metadata.
Now we know that one of these two services is provided for internal DataNodes to call, and the other is provided for clients to call.
1.6.1 Verifying that it implements a protocol and that the protocol has a versionID
The method name here is pretty straightforward, createRpcServer() hahaha, let’s click in
It returns a NameNodeRpcServer. Is that the NameNode server we are looking for?
The protocol is simply an interface, but the interface must have a versionID. Ok, remember that, and click NameNodeRpcServer
This is what we wanted to see: it does implement NamenodeProtocols. Feels like we are about to see the truth, right? Click in.
Wow, it extends so many protocols; no wonder NameNode is so powerful, there must be a huge number of methods in there. Let's pick one of the interfaces at random and take a look.
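For reference, NamenodeProtocols in 2.7.x is roughly the aggregation sketched below, and each individual protocol interface carries its own versionID (the list of parent interfaces and the exact version numbers are quoted from memory, so check your source tree):

// NamenodeProtocols simply extends all the protocols the NameNode serves (abbreviated)
public interface NamenodeProtocols
    extends ClientProtocol,
            DatanodeProtocol,
            NamenodeProtocol,
            RefreshAuthorizationPolicyProtocol,
            RefreshUserMappingsProtocol,
            GetUserMappingsProtocol,
            HAServiceProtocol {
}

// and inside an individual protocol, e.g. ClientProtocol:
public interface ClientProtocol {
  // the versionID every RPC protocol must declare
  public static final long versionID = 69L;
  // ... abstract methods such as mkdirs, create, getBlockLocations ...
}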
1.6.2 Verifying that the server parameters (address, port, etc.) are set
It is an interface and it has a versionID. We now suspect it is the server, but we haven't yet seen the code that sets the server address, port, and so on, so we still can't confirm it. Let's go back inside the NameNodeRpcServer class and scroll down to line 296.
Pull down to line 343, and there’s another similar passage
Remember that there are two services: one for internal DataNodes to call, and the other for clients to call. The first, serviceRpcServer, handles the calls between NameNode and DataNode; the second, clientRpcServer, handles requests from clients operating on the NameNode.
After they are created, many protocols (each with many methods) are added to them, and the more protocols added, the more capable the two services become. A sketch of this is shown below.
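The code around those lines follows Hadoop's standard RPC.Builder pattern. A simplified sketch (local variable names abbreviated, so treat it as illustrative rather than a verbatim copy of NameNodeRpcServer):

// build the service RPC server that DataNodes (and the other NameNode) talk to
this.serviceRpcServer = new RPC.Builder(conf)
    .setProtocol(ClientNamenodeProtocolPB.class)
    .setInstance(clientNNPbService)
    .setBindAddress(bindHost)
    .setPort(serviceRpcAddr.getPort())
    .setNumHandlers(serviceHandlerCount)
    .setVerbose(false)
    .setSecretManager(namesystem.getDelegationTokenSecretManager())
    .build();

// afterwards, additional protocols are registered onto the same server
DFSUtil.addPBProtocol(conf, HAServiceProtocolPB.class, haPbService, serviceRpcServer);
DFSUtil.addPBProtocol(conf, NamenodeProtocolPB.class, NNPbService, serviceRpcServer);
DFSUtil.addPBProtocol(conf, DatanodeProtocolPB.class, dnProtoPbService, serviceRpcServer);
// ... more addPBProtocol calls ...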
So they follow the same pattern as HttpServer: HttpServer extends its abilities by adding servlets, and the RPC servers extend theirs by adding protocols.
1.7 Step 2 of the flow chart
Picture the structure of NameNodeRpcServer, where the main body is serviceRpcServer and clientRpcServer, and they provide a variety of service methods
Clients operate NameNode (for example, create directory mkdirs) using clientRpcServer. Datanodes and Namenodes invoke each other using serviceRpcServer
The other NameNode in the figure can be regarded as the standby NameNode; the active/standby NameNode concepts were mentioned in the HA discussion and can be reviewed there.
1.8 Check before official startup
Safe mode was covered in the earlier HDFS article on the heartbeat mechanism; the explanation is copied directly here.
When a Hadoop cluster starts, it enters safe mode (threshold 99.9%), and this relies on the heartbeat mechanism. When the cluster starts, every DataNode sends a blockReport to the NameNode, and the NameNode tallies the total number of blocks they report. When reported blocks / total blocks is below 99.9%, safe mode is triggered. In this mode the client can only read data from HDFS, not write to it.
Click into startCommonServices
1.8.1 NameNode Resource Check 1: Obtain the directory to be checked
NameNodeResourceChecker is a resource inspector for NameNode
duReserved
Here there is a duReserved value that we can set ourselves; if we don't, the default value is used. The default, DFS_NAMENODE_DU_RESERVED_DEFAULT, is 100 MB and is defined in DFSConfigKeys.java
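Roughly, the relevant pieces look like this (the constant value is quoted from memory of 2.7.x, so verify against your own copy of DFSConfigKeys.java):

// DFSConfigKeys.java
public static final String DFS_NAMENODE_DU_RESERVED_KEY = "dfs.namenode.resource.du.reserved";
public static final long   DFS_NAMENODE_DU_RESERVED_DEFAULT = 1024 * 1024 * 100; // 100 MB

// NameNodeResourceChecker constructor: read the configured value or fall back to the default
duReserved = conf.getLong(DFSConfigKeys.DFS_NAMENODE_DU_RESERVED_KEY,
    DFSConfigKeys.DFS_NAMENODE_DU_RESERVED_DEFAULT);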
I should also add that the directories to check (and alert on) come from getNamespaceEditsDirs(conf); if you follow it all the way down you will see the three kinds of directories we just mentioned that need checking: the NameNode fsimage directory, the edit log directory, and the JournalNode directory, all defined in DFSConfigKeys.java
localEditDirs is not a set of HDFS directories but directories on the local Linux disk. The code iterates over localEditDirs and adds each one as a volume to check.
The word "volume" shows up a lot here; volumes is simply the collection of directories that need to be checked. The javadoc on the method says it plainly: "Add the volume of the passed-in directory to the list of volumes to check."
So let's go back to startCommonServices; we now know what this line does:
nnResourceChecker = new NameNodeResourceChecker(conf);
1.8.2 NameNode Resource Check 2: Check whether the disk space is sufficient to store metadata
checkAvailableResources(): click in to have a look
Hadoop's methods are often named very plainly: hasAvailableDiskSpace literally means "has available disk space". Click in again.
It passes in volumes, the collection of directories to check that we just built up. Click into the areResourcesAvailable method; since volumes is a collection, the logic is bound to contain a for loop that iterates over it.
Sure enough, there is the for loop, and inside it isResourceAvailable is called.
Here we see that it gets the free space of the current directory (via the JDK) and compares it with the duReserved value (default 100 MB) we discussed earlier. At this point the earlier pieces of knowledge come together.
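A simplified sketch of that comparison (abbreviated from the 2.7.x NameNodeResourceChecker, with the log message shortened):

public boolean isResourceAvailable() {
  // free space left on this volume (directory)
  long availableSpace = df.getAvailable();
  if (availableSpace < duReserved) {
    LOG.warn("Space available on volume '" + volume + "' is " + availableSpace
        + ", which is below the configured reserved amount " + duReserved);
    return false; // not enough space to safely store metadata
  }
  return true;
}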
If space is insufficient, it logs a warning. You are unlikely to see this message on a company cluster, but when building a learning cluster on virtual machines you may: the VM runs out of space, the services appear to start normally, yet the cluster cannot work properly.
1.8.3 NameNode Resource Check 3: Safe mode check
Interview question: do we really know why a Hadoop cluster enters safe mode?
Back in FSNamesystem, click into the setBlockTotal() method
Click in and see how it gets the number of usable blocks.
So what does "blocks under construction" mean? Click into that.
A block has two states in HDFS: one is complete, a finished block that can be used; the other is under construction, a block still being written that cannot be used normally. This corresponds to the code below.
So back in getCompleteBlocksTotal, blockTotal - numUCBlocks gives the number of complete, usable blocks.
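A sketch of getCompleteBlocksTotal (simplified from the 2.7.x FSNamesystem, so treat it as approximate):

public long getCompleteBlocksTotal() {
  long numUCBlocks = 0;
  readLock();
  try {
    // blocks still being written by clients are "under construction" and not yet usable
    numUCBlocks = leaseManager.getNumUnderConstructionBlocks();
    // total blocks minus under-construction blocks = complete (usable) blocks
    return getBlocksTotal() - numUCBlocks;
  } finally {
    readUnlock();
  }
}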
Go back to setBlockTotal; the call is safeMode.setBlockTotal((int) getCompleteBlocksTotal()). Click into that setBlockTotal.
The default value of threshold is 0.999. For example, if the total number of blocks is 1000 and at least 999 of them are complete, the cluster is considered healthy and can leave safe mode.
In checkMode() you can see that needEnter() is the condition for entering safe mode.
If any one of the following three conditions holds, the cluster enters safe mode (a sketch of needEnter() follows after the three conditions).
The first condition
threshold != 0 && blockSafe < blockThreshold
When the cluster starts, the DataNodes start after the NameNode. As each DataNode starts it reports its block information to the NameNode, and for every block reported, blockSafe increases by one. blockThreshold is the number of blocks required to satisfy the safe-mode threshold. While the reported count is below that number, the cluster stays in safe mode.
Second condition
datanodeThreshold != 0 && getNumLiveDataNodes() < datanodeThreshold
DataNodes and the NameNode have a heartbeat mechanism. This condition means: if the number of live DataNodes in the cluster is less than datanodeThreshold, the cluster enters safe mode.
Funnily enough, the default value of datanodeThreshold is 0, so this condition never takes effect unless you set it yourself.
The third condition
!nameNodeHasResourcesAvailable(): literally, the NameNode does not have enough disk space available. This refers back to the hasAvailableDiskSpace check mentioned above.
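Putting the three together, needEnter() looks roughly like this (paraphrased from the 2.7.x SafeModeInfo inner class, so treat it as a sketch):

private boolean needEnter() {
  // 1. not enough reported blocks, 2. not enough live DataNodes, 3. not enough disk space
  return (threshold != 0 && blockSafe < blockThreshold)
      || (datanodeThreshold != 0 && getNumLiveDataNodes() < datanodeThreshold)
      || (!nameNodeHasResourcesAvailable());
}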
1.9 Starting the Service
At last, after all that, the NameNode can finally start up normally.
Of course, there’s going to be some services that start up, and we’ll expand that later
Step 3 of the flowchart
It just adds FSNamesystem and then puts all the previous pieces together.
Finally
This is the general process of NameNode startup. Of course, there are many details we haven't studied deeply or have only touched on; some of them are not important, and some will be filled in later.