Introduction

HDFS is a highly fault-tolerant system suitable for deployment on inexpensive machines. It provides high-throughput data access and is well suited to applications with large data sets. HDFS relaxes some POSIX constraints to enable streaming access to file system data. HDFS was originally developed as infrastructure for the Apache Nutch search engine project and is now part of the Apache Hadoop Core project. The project’s address is hadoop.apache.org/core/.

Assumptions and design goals

Hardware failure

Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of servers, each storing part of the file system’s data. With so many components, any one of them can fail at any time, which means some part of HDFS is effectively always non-functional. Error detection and fast, automatic recovery are therefore core architectural goals of HDFS.

Streaming data access

Applications running on HDFS differ from ordinary applications in that they need streaming access to their data sets. HDFS is designed for batch processing rather than interactive use: high throughput of data access matters more than low latency. Many of the hard constraints set by the POSIX standard are not required for HDFS applications, so POSIX semantics have been relaxed in a few key areas to increase data throughput.

Large data sets

Applications running on HDFS have large data sets. A typical file on HDFS is gigabytes to terabytes in size, so HDFS is tuned for storing large files. It should provide high aggregate data bandwidth, scale to hundreds of nodes in a single cluster, and support tens of millions of files in a single instance.

Simple consistency model

HDFS applications need a write-once-read-many file access model. Once a file has been created, written, and closed, it does not need to change. This assumption simplifies data consistency and enables high-throughput data access. MapReduce applications and web crawlers fit this model well. There are plans to extend the model to support appending to files in the future.

‘Moving computation is cheaper than moving data’

A computation requested by an application is more efficient the closer it runs to the data it operates on, especially when the data set is massive. Running computation near the data reduces network congestion and increases overall system throughput: it is better to move the computation to the data than to move the data to the application. HDFS provides interfaces that let applications move themselves closer to where the data is located.

Portability between heterogeneous hardware and software platforms

HDFS was designed with platform portability in mind. This facilitates its adoption as a platform for large-scale data applications.

Scenarios where HDFS is not suitable

Not suitable for low-latency data access

1) HDFS suits high-throughput scenarios in which large amounts of data are written at a time. It is not suitable for low-latency access, such as reads that must complete within milliseconds.

Unable to efficiently store large numbers of small files

1) Storing a large number of small files consumes a large amount of NameNode memory for file, directory, and block metadata. This is undesirable because NameNode memory is always limited.

2) For small files, the seek time exceeds the read time, which defeats HDFS’s design goal of high-throughput streaming reads.

Concurrent write, random file modification

1) A file can have only one writer at a time; multiple clients or threads are not allowed to write to it simultaneously.

2) Only appending data is supported; random modification of existing file content is not.
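
As a minimal sketch of the append-only model (assuming the cluster allows appends and the file already exists; the path is a placeholder), a client can add data after the current end of a file but has no API for modifying earlier bytes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class AppendExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());

            // append() adds bytes after the current end of the file; there is no
            // call for overwriting bytes in the middle of an existing HDFS file.
            try (FSDataOutputStream out = fs.append(new Path("/project/test/log.txt"))) {
                out.writeBytes("one more record\n");
            }
        }
    }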

Some key elements of HDFS

Block

HDFS is designed to support large files and suits applications that handle large data sets. These applications write data once but read it one or more times, and reads should run at streaming speed; HDFS therefore supports write-once-read-many semantics for files. A typical data block size is 64 MB, so files in HDFS are split into 64 MB blocks, and where possible each block is stored on a different DataNode.
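
As a rough illustration of how block size applies per file, the Hadoop Java FileSystem API lets a client choose the block size when a file is created; the path, buffer size, and replication below are placeholder values in a minimal sketch:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);            // the default file system, HDFS on a configured cluster

            long blockSize = 64L * 1024 * 1024;              // 64 MB, the typical block size mentioned above
            short replication = 3;
            int bufferSize = 4096;

            // create(path, overwrite, bufferSize, replication, blockSize)
            Path p = new Path("/project/test/big-file.dat"); // hypothetical path
            try (FSDataOutputStream out = fs.create(p, true, bufferSize, replication, blockSize)) {
                out.writeUTF("written once, read many times");
            }
        }
    }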

NameNode

HDFS supports a traditional hierarchical file organization. Users or applications can create directories and store files in them. The file system namespace hierarchy is similar to most existing file systems: users can create, delete, move, or rename files. HDFS does not currently support user disk quotas, access permission control, or hard and soft links, but its architecture does not prevent these features from being implemented.

The NameNode is responsible for maintaining the file system namespace, and any change to the namespace or to file system properties is logged by the NameNode. Applications can set the number of replicas HDFS keeps for a file; this number is called the file’s replication factor, and it is also recorded by the NameNode.
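
As a sketch of how applications control the replication factor, a client can either configure the default for new files or change it for an existing file; the dfs.replication key and the setReplication() call are standard Hadoop client APIs, while the path is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("dfs.replication", "2");   // replication factor used for files this client creates
            FileSystem fs = FileSystem.get(conf);

            // Ask the NameNode to change the replication factor of an existing file;
            // extra replicas are created (or surplus ones removed) in the background.
            boolean accepted = fs.setReplication(new Path("/project/test/test1.txt"), (short) 3);
            System.out.println("replication change accepted: " + accepted);
        }
    }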

DataNode

The NameNode manages the replication of data blocks and periodically receives a heartbeat and a blockreport from each DataNode in the cluster. Receiving a heartbeat indicates that the DataNode is working properly; the blockreport lists all data blocks stored on that DataNode.

NameNode                           DataNode
Stores metadata                    Stores file contents
Metadata is kept in memory         File contents are kept on disk
Maps file blocks to DataNodes      Maps block IDs to local files on the DataNode
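
The block-to-DataNode mapping maintained by the NameNode can be observed from a client application, as in this minimal sketch (the path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationsExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FileStatus status = fs.getFileStatus(new Path("/project/test/test1.txt"));

            // Ask the NameNode which DataNodes hold each block of the file.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + String.join(",", block.getHosts()));
            }
        }
    }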

SecondaryNameNode

The SecondaryNameNode periodically merges the NameNode’s edit log (the sequence of changes to the file system) into the fsimage (a snapshot of the entire file system namespace) and copies the merged fsimage back to the NameNode.

It provides a checkpoint of the NameNode’s metadata (it should not be thought of as a backup of the NameNode) that can be used to help recover the NameNode.

HDFS error and recovery

HDFS is highly fault-tolerant and designed to run on inexpensive hardware. It treats hardware failure as the norm rather than the exception and provides mechanisms to detect errors and recover automatically, covering NameNode failures, DataNode failures, and data corruption.

NameNode failure

The NameNode holds all metadata; its two most important data structures are the FsImage and the EditLog. If these two files are corrupted, the entire HDFS instance stops working. HDFS therefore provides a backup mechanism that synchronizes these core files to the SecondaryNameNode. When the NameNode fails, it can be recovered from the FsImage and EditLog data kept on the SecondaryNameNode.

DataNode failure

Each DataNode periodically sends a heartbeat message to the NameNode to report its status.

When a DataNode fails or the network is partitioned, the NameNode stops receiving heartbeats from some DataNodes. Those DataNodes are then marked as dead, all data blocks on them are marked as unreadable, and the NameNode no longer sends them any I/O requests.

At this point, the number of replicas of some data blocks may fall below the replication factor because some DataNodes are unavailable.

The NameNode checks for this periodically; when it finds that a data block has fewer replicas than the replication factor, it starts re-replication and creates new copies of that block.

The biggest difference between HDFS and other distributed file systems is that the placement of redundant data can be adjusted over time.

Data corruption

Factors such as network transmission problems and disk errors can cause data corruption.

After reading data, the client verifies each data block using MD5 and SHA-1 checksums to make sure the data was read correctly.

When a file is created, the client computes checksum information for each block of the file and writes it to a hidden file under the same path.

When the client reads the file, it first reads this checksum information and uses it to verify each block it reads. If verification fails, the client requests the block from another DataNode and reports the corrupted block to the NameNode; the NameNode checks such reports periodically and re-replicates the affected block.
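
The checksum mechanism itself is internal to HDFS, but the client API exposes parts of it. As a rough sketch (the path is a placeholder), a client can fetch a file-level checksum or turn verification off for a raw read:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ChecksumExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/project/test/test1.txt");

            // File-level checksum derived from the per-block checksum data;
            // useful for comparing two files without reading their contents.
            FileChecksum checksum = fs.getFileChecksum(p);
            if (checksum != null) {
                System.out.println(checksum.getAlgorithmName() + " : " + checksum);
            }

            // Verification is on by default; it can be disabled on this FileSystem
            // instance if an application wants to read data without checking it.
            fs.setVerifyChecksum(false);
        }
    }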

Errors during data reading

  1. A DataNode hangs during the read
  2. The data read from the file is corrupted

HDFS’s distributed, replicated storage makes both cases recoverable. In the first case, if a DataNode goes down, the read simply switches to a DataNode that holds another replica.

In the second case, if a file block fails checksum verification, it is treated as corrupted. The client can still read another intact replica, and it reports the corrupted block to the NameNode. The NameNode later tells the DataNode to delete the corrupted block and creates a new replica from an intact copy.

Errors during data writing

The possible exception patterns are listed below:

  1. The client hangs during the write
  2. A DataNode hangs while the client is writing
  3. The NameNode hangs while the client is writing

There are recovery modes for each of the exception modes listed above.

The client hangs while writing. Before writing a file, a client must obtain a lease from the NameNode; only the lease holder is allowed to write the file, and the lease must be renewed periodically. When the client hangs, HDFS releases the file’s lease after it times out and closes the file, so that the file is not held exclusively by the dead client and other clients can eventually write to it. This process is called Lease Recovery. When lease recovery starts, if the replicas of a file block are in inconsistent states on different DataNodes, they must first be restored to the same length; this is called Block Recovery, and it can only be triggered as part of Lease Recovery.

A DataNode hangs while the client is writing. Instead of immediately aborting the write (which would hurt availability and ease of use), HDFS tries to remove the failed DataNode from the write pipeline and resume the write; this process is called Pipeline Recovery. The block is then left with fewer replicas than required, so the NameNode will later issue a request to replicate the block again.

The NameNode hangs while the client is writing. If the NameNode hangs before the write starts, HDFS is unavailable and the write cannot begin. If it hangs after the DataNodes have been allocated, the pipeline writing between the client and the DataNodes is not affected and the data is still saved; however, the status report sent to the NameNode after a block is written fails, and so does the client’s final request to the NameNode to close the file. Because the NameNode is a single point of failure, this situation cannot be recovered automatically and requires manual recovery.

HDFS data reading and writing process

File reading procedure

Specific steps (a minimal Java read sketch follows the list):

  1. The HDFS client calls the open() method of DistributedFileSystem to open the file to be read
  2. DistributedFileSystem makes an RPC call to the remote NameNode to obtain the file's block information, and the NameNode returns the list of blocks (for each block, the addresses of the DataNodes that hold it)
  3. DistributedFileSystem returns an FSDataInputStream object to the client, and the client calls the read() method of this object to read the data
  4. Data is transferred from the DataNode to the client through repeated calls to read() on the stream
  5. When a block has been fully read, FSDataInputStream closes the connection to that DataNode and connects to the nearest DataNode holding the next block of the file
  6. When the client has finished reading, it calls the close() method of FSDataInputStream to close the input stream
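
A minimal Java sketch of these read steps, assuming a file exists at the placeholder path below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ReadExample {
        public static void main(String[] args) throws Exception {
            // Steps 1-2: FileSystem.get() returns the DistributedFileSystem for an HDFS
            // configuration, and open() asks the NameNode (via RPC) for the block locations.
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataInputStream in = fs.open(new Path("/project/test/test1.txt"))) {
                // Steps 3-5: read() pulls the data block by block from the nearest DataNodes.
                IOUtils.copyBytes(in, System.out, 4096, false);
            } // Step 6: close() releases the connection to the DataNode.
        }
    }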

File writing process

Specific steps (a minimal Java write sketch follows the list):

  1. The client calls the create() method of the DistributedFileSystem object to create a file output stream
  2. The DistributedFileSystem object makes an RPC call to the remote NameNode, which checks that the file does not already exist and that the client has permission to create it
  3. The client calls the write() method of the FSDataOutputStream object to write data (the data is first written to a buffer and then split into packets)
  4. Each packet is sent to one of the DataNodes in the set allocated by the NameNode and is passed once through the pipeline formed by that set of DataNodes
  5. The DataNodes in the pipeline return acknowledgements in reverse order, and the first DataNode in the pipeline finally sends the acknowledgement for the whole pipeline to the client
  6. When the client has finished writing, it calls the close() method to close the file output stream
  7. The client notifies the NameNode that the file has been written successfully
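
A minimal Java sketch of these write steps (the output path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WriteExample {
        public static void main(String[] args) throws Exception {
            // Steps 1-2: create() asks the NameNode (via RPC) to record the new file
            // after checking permissions and that the file does not already exist.
            FileSystem fs = FileSystem.get(new Configuration());
            try (FSDataOutputStream out = fs.create(new Path("/project/test/output.txt"), true)) {
                // Steps 3-5: writes are buffered, split into packets, and pushed
                // through the DataNode pipeline; acknowledgements flow back in reverse.
                out.writeBytes("hello hdfs\n");
            } // Steps 6-7: close() flushes remaining packets and completes the file on the NameNode.
        }
    }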

hdfs dfs commands

Commonly used:

Create directories:
hdfs dfs -mkdir -p /project/test/dir1 /project/test/dir2

List a directory:
hdfs dfs -ls /project/test/

Upload a local file to an HDFS path:
hdfs dfs -put /home/bb/test1.txt /project/test/

Change file permissions:
hdfs dfs -chmod 755 <file path>

Show the first lines of a file:
hdfs dfs -cat /project/test/test.txt | head -17

Show the last five lines:
hdfs dfs -cat /project/test/test.txt | tail -5

Return a random sample with a given number of lines:
hdfs dfs -cat /project/test/test.txt | shuf -n 5

Count the lines in a file:
hdfs dfs -cat /project/test/test.txt | wc -l

Show the file content from the fifth line onward:
hdfs dfs -cat /project/test/test.txt | tail -n +5

Filter lines containing the field num:
hdfs dfs -cat /project/test/test.txt | grep num

Show file sizes:
hdfs dfs -du /project/test/test.txt
hdfs dfs -du /project/test/*

Count directories, files, and bytes:
hdfs dfs -count /project/test/

List a directory recursively:
hdfs dfs -ls -R /project/test/   (or the older form: hdfs dfs -lsr /project/test/)

Copy a file from HDFS to a local directory:
hdfs dfs -get /project/test/test1 /home/bb/

Copy files or directories within HDFS:
hdfs dfs -cp <source path> <destination path>

Delete a file or directory:
hdfs dfs -rm <path>

Delete a directory and its contents:
hdfs dfs -rm -r <path>

Delete, skipping the trash:
hdfs dfs -rm -r -skipTrash <path>

Output a file as text to standard output:
hdfs dfs -text /project/test/

Display the last 1 KB of a file on standard output:
hdfs dfs -tail -f /project/test/t.txt