1. The internal execution flow of uploading a file
1.1. Let’s talk about uploading a file to HDFS
- 1: Start with a file called 1.txt; let’s say this file is 300 MB.
- 2: The client wants to upload 1.txt, so it sends an upload request to the NameNode, the boss of HDFS.
- 3: The NameNode checks whether the request has valid permissions, whether the parent directory exists, and whether a file with the same name already exists. If any check fails, the request fails; if all checks pass, continue with the following steps.
- 4: The NameNode responds to the client, granting it permission to upload the file.
- 5: The client splits the file into blocks (with the default 128 MB block size, a 300 MB file becomes three blocks) and starts to upload the first block.
- 6: The NameNode selects three suitable DataNodes according to the replica placement policy.
- 7: The NameNode returns the list of DataNode hosts to the client.
- 8: A pipeline is established from the client through DataNode1, DataNode2, and DataNode3.
- 9: The client sends data in units of packets; each packet is 64 KB.
- 10: Each DataNode (1, 2, 3) that receives a packet caches it and then passes the packet on to the next DataNode in the pipeline.
- 11: When a DataNode receives a packet, it sends an ACK response back; the packet is held in the ack (reply) queue until that ACK arrives.
- 12: After a block has been fully received, the DataNode stores the data on its hard drive.
- 13: The remaining blocks are uploaded in the same way; see the sketches after this list.
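From the client’s point of view, this whole sequence is triggered by a single API call. Below is a minimal sketch using the Hadoop FileSystem Java API; the NameNode address and file paths are placeholder assumptions, and the config keys in the comments are the standard knobs behind the steps above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUploadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; replace with your cluster's.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // Standard client-side knobs behind the steps above (usual defaults):
        //   dfs.blocksize                - block size used to split the file (128 MB)
        //   dfs.replication              - replicas per block (3)
        //   dfs.client-write-packet-size - packet size on the write pipeline (64 KB)

        try (FileSystem fs = FileSystem.get(conf)) {
            // copyFromLocalFile drives the whole flow: the client asks the
            // NameNode for permission (steps 2-4), splits the file into blocks
            // (step 5), and streams 64 KB packets down the DataNode pipeline
            // (steps 8-12).
            fs.copyFromLocalFile(new Path("/local/data/1.txt"),
                                 new Path("/user/demo/1.txt"));
        }
    }
}
```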
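Steps 9–11 amount to a sliding ack-queue protocol. Hadoop’s actual implementation lives in its private DFSOutputStream internals, so the following is only an illustrative model of the bookkeeping, with every name hypothetical: packets go down the pipeline, are held in an ack queue, and are released when their acknowledgement comes back.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical model of the packet/ack bookkeeping in steps 9-11.
// This is NOT Hadoop's real DFSOutputStream code.
public class AckQueueSketch {
    static final int PACKET_SIZE = 64 * 1024; // one packet = 64 KB (step 9)

    record Packet(long seqNo, byte[] data) {}

    public static void main(String[] args) throws InterruptedException {
        // Packets sent down the pipeline but not yet acknowledged.
        BlockingQueue<Packet> ackQueue = new ArrayBlockingQueue<>(80);

        for (long seq = 0; seq < 3; seq++) {
            Packet p = new Packet(seq, new byte[PACKET_SIZE]);
            sendDownPipeline(p); // DataNode1 caches and forwards it (step 10)
            ackQueue.put(p);     // hold it until the ACK returns (step 11)
        }

        // Pretend the ACKs now arrive in order; each one releases a packet.
        while (!ackQueue.isEmpty()) {
            Packet acked = ackQueue.take();
            System.out.println("ACK received for packet " + acked.seqNo());
        }
    }

    static void sendDownPipeline(Packet p) {
        // Placeholder: a real client writes the packet to a socket toward the
        // first DataNode, which forwards it along the pipeline.
    }
}
```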
2. The internal execution flow of reading a file
2.1. Now let’s talk about how HDFS reads a file
- 1: The client sends a read request to the NameNode.
- 2: The NameNode checks whether the request has the required operation permission and whether the file exists.
- 3: The NameNode looks up the block list of the requested file.
- 4: It returns the block list to the client, with the hosts holding each replica sorted (typically by network distance to the client).
- 5: The client selects the optimal host from which to read each block according to the actual situation.
- 6: The client establishes a pipeline with each DataNode host that stores the needed blocks.
- 7: The client reads data from multiple DataNodes simultaneously (parallel reading).
- 8: The client combines the blocks into the complete file; a minimal read sketch follows below.
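As with the upload, the read side is one API call for the client. Here is a minimal sketch using the Hadoop FileSystem Java API; again, the NameNode address and paths are placeholder assumptions.

```java
import java.io.FileOutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/user/demo/1.txt"));
             FileOutputStream out = new FileOutputStream("1.txt")) {
            // fs.open() fetches the block list from the NameNode (steps 1-4);
            // as the stream is consumed, the client reads each block from the
            // best available DataNode, and the blocks arrive as one continuous
            // byte stream (steps 5-8).
            IOUtils.copyBytes(in, out, 4096, false);
        }
    }
}
```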