1. HDFS data writing process
1) The client requests NameNode, through the DistributedFileSystem module, to upload a file. NameNode checks whether the target file already exists and whether its parent directory exists.
2) NameNode returns whether it can be uploaded.
3) The client asks NameNode which DataNode servers the first Block should be uploaded to.
4) NameNode returns three DataNodes: DN1, DN2, and DN3.
5) The client requests DN1 to upload data through the FSDataOutputStream module. After receiving the request, DN1 calls DN2, and DN2 in turn calls DN3, completing the establishment of the communication pipeline.
6) DN3, DN2, and DN1 acknowledge back along the pipeline, step by step, until the client receives the response.
7) The client starts to upload the first Block to DN1 (data is read from disk into a local memory cache) in units of Packets. DN1 receives a Packet and passes it on to DN2, and DN2 passes it on to DN3. Each time DN1 sends a Packet, it places the Packet into an ack queue and waits for the acknowledgment.
8) When a Block transfer is complete, the client again asks NameNode which servers to upload the next Block to, repeating steps 3-7 for each remaining Block.
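The pipeline behavior in steps 5-7 can be sketched as a small simulation. This is a toy model, not the real client code (which lives in Hadoop's DFSOutputStream/DataStreamer); the node names and packet labels are illustrative:

```python
from collections import deque

def pipeline_write(packets, pipeline=("dn1", "dn2", "dn3")):
    """Simulate the HDFS write pipeline: the client sends Packets to the
    first DataNode, each DataNode stores a copy and forwards the Packet
    downstream, and acks flow back upstream to the client."""
    ack_queue = deque()                      # Packets sent but not yet acked
    received = {dn: [] for dn in pipeline}   # what each DataNode has stored
    for pkt in packets:
        ack_queue.append(pkt)                # client waits for this Packet's ack
        for dn in pipeline:                  # store-and-forward along the pipeline
            received[dn].append(pkt)
        ack_queue.popleft()                  # ack returns dn3 -> dn2 -> dn1 -> client
    return received

# Every DataNode in the pipeline ends up with all Packets of the Block.
replicas = pipeline_write(["pkt0", "pkt1", "pkt2"])
```

The ack queue is what lets the client resend Packets if a DataNode in the pipeline fails before acknowledging.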
2. HDFS data reading process
1) The client requests NameNode, through the DistributedFileSystem module, to download the file. NameNode queries its metadata to find the DataNode addresses where the file's Blocks reside.
2) The client selects a DataNode server (nearest first, then at random) and requests to read the data.
3) The DataNode starts transmitting data to the client (reading the data from disk as an input stream and verifying it in units of Packets).
4) The client receives the data in units of Packets, caches them locally, and then writes them to the target file.
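The per-Packet verification in steps 3-4 can be illustrated with a checksum round trip. This is a toy sketch using CRC32 and tiny packet sizes; real HDFS uses CRC32C checksums stored in .meta files and much larger Packets:

```python
import zlib

def make_packets(data, packet_size=4):
    """DataNode side: split Block data into (payload, checksum) Packets,
    as they would be streamed from disk. Toy sizes for illustration."""
    return [(data[i:i + packet_size], zlib.crc32(data[i:i + packet_size]))
            for i in range(0, len(data), packet_size)]

def read_block(packets):
    """Client side: verify each Packet's checksum, cache the payloads
    locally, then assemble the target file's contents."""
    cache = []
    for payload, checksum in packets:
        if zlib.crc32(payload) != checksum:
            raise IOError("checksum mismatch: read from another replica")
        cache.append(payload)
    return b"".join(cache)

data = b"hello hdfs block"
assert read_block(make_packets(data)) == data
```

A checksum failure is what triggers the client to fall back to a different replica of the same Block.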
3. Network topology: node distance calculation
In the HDFS data writing process, NameNode selects the DataNodes closest to the client uploading the data. So how is this distance calculated?
Node distance: the sum of the distances from each of the two nodes to their nearest common ancestor in the network topology tree.
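With nodes addressed by a topology path such as /d1/rack1/n1 (data center / rack / node), the definition above can be computed directly. This is a sketch mirroring the idea behind Hadoop's NetworkTopology.getDistance, not the actual implementation:

```python
def node_distance(a, b):
    """Distance = hops from node a up to the nearest common ancestor,
    plus hops from node b up to that same ancestor.
    Paths like '/datacenter/rack/node' are assumed."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    while common < min(len(pa), len(pb)) and pa[common] == pb[common]:
        common += 1                      # walk down the shared prefix
    return (len(pa) - common) + (len(pb) - common)

assert node_distance("/d1/rack1/n1", "/d1/rack1/n1") == 0  # same node
assert node_distance("/d1/rack1/n1", "/d1/rack1/n2") == 2  # same rack
assert node_distance("/d1/rack1/n1", "/d1/rack2/n1") == 4  # same data center
assert node_distance("/d1/rack1/n1", "/d2/rack3/n1") == 6  # different data centers
```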
When the replication factor is 3, HDFS's placement policy is as follows: if the writer is on a DataNode, the first replica is placed on the local machine; otherwise it is placed on a random DataNode. The second replica is placed on a node in a different (remote) rack, and the third on another node in that same remote rack. This policy cuts inter-rack write traffic and generally improves write performance. Since the chance of a rack failure is far smaller than that of a node failure, the policy does not affect data reliability or availability guarantees. However, because a block is placed in only two unique racks rather than three, it does not reduce the aggregate network bandwidth used when reading the data.

With this policy, the replicas of a file are not evenly distributed across racks: one third of the replicas are on one node, two thirds of the replicas are on one rack, and the remaining third are evenly distributed across the remaining racks. This policy improves write performance without compromising data reliability or read performance. If the replication factor is greater than 3, the placement of the 4th and subsequent replicas is determined randomly, while keeping the number of replicas per rack below the upper limit (basically (replicas - 1) / racks + 2).
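The replication-factor-3 case can be sketched as a small chooser. This is a simplified illustration of the policy described above (the real logic, including load and availability checks, lives in Hadoop's BlockPlacementPolicyDefault); the topology dict and node names are hypothetical:

```python
import random

def choose_targets(writer_node, topology):
    """Pick 3 replica locations per the default policy:
    1st replica on the writer's node (assumed to be a DataNode),
    2nd replica on a node in a different (remote) rack,
    3rd replica on another node in that same remote rack.
    topology: dict mapping rack name -> list of node names."""
    local_rack = next(r for r, nodes in topology.items() if writer_node in nodes)
    first = writer_node
    remote_rack = random.choice([r for r in topology if r != local_rack])
    second, third = random.sample(topology[remote_rack], 2)  # two distinct nodes
    return [first, second, third]

topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4", "n5"]}
targets = choose_targets("n1", topology)  # e.g. ["n1", <rack2 node>, <rack2 node>]
```

Note how the result spans exactly two unique racks, which is why the policy saves inter-rack write traffic but does not reduce aggregate read bandwidth.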