1. Introduction
The Hadoop Distributed File System (HDFS) is the distributed file system of the Hadoop project. It features high fault tolerance and high throughput and can be deployed on low-cost hardware.
2. HDFS Design Principles
2.1 HDFS Architecture
HDFS follows a master/slave architecture and consists of a single NameNode (NN) and multiple DataNodes (DN):
- NameNode: Performs file system namespace operations such as opening, closing, and renaming files and directories. It also stores the cluster's metadata, recording the location of every data block of each file.
- DataNode: Serves read and write requests from file system clients, and performs block creation and deletion.
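The division of labour between the two roles can be seen from a client's point of view. Below is a minimal read sketch using the Hadoop Java API; the NameNode URI and file path are placeholders, not values from this article.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; in practice this comes from fs.defaultFS.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        // open() asks the NameNode for the file's metadata (block list and locations)...
        try (FSDataInputStream in = fs.open(new Path("/data/example.txt"))) {
            byte[] buf = new byte[4096];
            int n;
            // ...while read() streams the block contents directly from the DataNodes.
            while ((n = in.read(buf)) > 0) {
                System.out.write(buf, 0, n);
            }
        }
        fs.close();
    }
}
```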
2.2 File System Namespace
The HDFS file system namespace has a hierarchy similar to that of most file systems (such as Linux): directories and files can be created, moved, deleted, and renamed, and users and access permissions can be configured, but hard links and soft links are not supported. The NameNode maintains the file system namespace and records any change to the namespace or its properties.
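For illustration, here is a short, hedged sketch of typical namespace operations through the Hadoop `FileSystem` API (create, rename, set permissions, delete); all paths are invented for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsNamespaceSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        Path dir = new Path("/user/demo/input");
        fs.mkdirs(dir);                                      // create a directory
        fs.rename(dir, new Path("/user/demo/renamed"));      // rename it
        fs.setPermission(new Path("/user/demo/renamed"),     // configure access permissions
                new FsPermission("755"));
        fs.delete(new Path("/user/demo/renamed"), true);     // delete it recursively

        fs.close();
    }
}
```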
2.3 Data Replication
Because Hadoop is designed to run on inexpensive machines, whose hardware is inherently unreliable, HDFS provides a data replication mechanism to ensure fault tolerance. HDFS stores each file as a sequence of blocks, and each block has multiple replicas. The block size and replication factor are configurable (by default, the block size is 128 MB and the replication factor is 3).
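Both settings can be overridden per cluster or per client. The sketch below assumes a reachable HDFS cluster and uses the standard `dfs.blocksize` and `dfs.replication` properties; the file path is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.blocksize", "268435456");   // 256 MB blocks instead of the 128 MB default
        conf.set("dfs.replication", "2");         // 2 replicas instead of the default 3

        FileSystem fs = FileSystem.get(conf);
        try (FSDataOutputStream out = fs.create(new Path("/data/large-file.bin"))) {
            out.writeBytes("example payload");
        }

        // The replication factor of an existing file can also be changed afterwards.
        fs.setReplication(new Path("/data/large-file.bin"), (short) 3);
        fs.close();
    }
}
```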
2.4 Replica Placement Policy
A large HDFS instance is deployed on servers spread across multiple racks. Servers on different racks communicate through switches, and in most cases the network bandwidth between servers in the same rack is greater than that between servers in different racks. HDFS therefore adopts a rack-aware replica placement policy. In the common case where the replication factor is 3, HDFS uses the following policy:
If the writer runs on a DataNode, the first replica is placed on that DataNode; otherwise it is placed on a random DataNode. The second replica is placed on a node on a different (remote) rack, and the third on a different node on that same remote rack. This policy reduces inter-rack write traffic and thereby improves write performance.
If the replication factor is greater than 3, the placement of the fourth and subsequent replicas is determined randomly, while keeping the number of replicas per rack below the upper limit, which is roughly (replication factor - 1) / number of racks + 2. HDFS also ensures that a single DataNode never holds more than one replica of the same block.
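The following is a simplified, illustrative sketch of these placement rules. It is not Hadoop's actual `BlockPlacementPolicyDefault`; the `Node` type and cluster model are invented for the example, and it assumes the cluster has enough nodes and racks to satisfy the request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RackAwarePlacementSketch {

    record Node(String name, String rack) {}

    static List<Node> choosePlacement(Node writer, List<Node> cluster, int replication) {
        List<Node> chosen = new ArrayList<>();
        Random rnd = new Random();

        // 1st replica: on the writer's DataNode if it is one, otherwise a random node.
        Node first = cluster.contains(writer) ? writer : cluster.get(rnd.nextInt(cluster.size()));
        chosen.add(first);

        // 2nd replica: any node on a different (remote) rack.
        Node second = cluster.stream()
                .filter(n -> !n.rack().equals(first.rack()))
                .findAny().orElseThrow();
        chosen.add(second);

        // 3rd replica: a different node on the same rack as the 2nd replica.
        Node third = cluster.stream()
                .filter(n -> n.rack().equals(second.rack()) && !n.equals(second))
                .findAny().orElseThrow();
        chosen.add(third);

        // 4th and later replicas: random nodes, keeping at most
        // (replication - 1) / racks + 2 replicas per rack and one replica per node.
        long racks = cluster.stream().map(Node::rack).distinct().count();
        int maxPerRack = (int) ((replication - 1) / racks + 2);
        while (chosen.size() < replication) {
            Node candidate = cluster.get(rnd.nextInt(cluster.size()));
            long onSameRack = chosen.stream().filter(n -> n.rack().equals(candidate.rack())).count();
            if (!chosen.contains(candidate) && onSameRack < maxPerRack) {
                chosen.add(candidate);
            }
        }
        return chosen;
    }
}
```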
2.5 Replica Selection
To minimize bandwidth consumption and read latency, HDFS tries to satisfy a read request from the replica closest to the reader. If a replica exists on the same rack as the reader node, that replica is preferred. If the HDFS cluster spans multiple data centers, a replica in the local data center is preferred over a remote one.
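A toy model of this "closest replica first" rule is sketched below. It mirrors the spirit of Hadoop's network topology distances (0 for the same node, 2 for the same rack, 4 for the same data center, 6 otherwise) rather than the actual implementation; the `Location` type is invented for the example.

```java
import java.util.Comparator;
import java.util.List;

public class ReplicaSelectionSketch {

    record Location(String dataCenter, String rack, String host) {}

    static int distance(Location a, Location b) {
        if (!a.dataCenter().equals(b.dataCenter())) return 6; // different data centers
        if (!a.rack().equals(b.rack()))             return 4; // same DC, different rack
        if (!a.host().equals(b.host()))             return 2; // same rack, different host
        return 0;                                             // same node
    }

    // Pick the replica with the smallest topology distance to the reader.
    static Location pickClosest(Location reader, List<Location> replicas) {
        return replicas.stream()
                .min(Comparator.comparingInt(r -> distance(reader, r)))
                .orElseThrow();
    }
}
```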
2.6 Architecture Stability
1. Heartbeat mechanism and re-replication
Each DataNode periodically sends a heartbeat message to the NameNode. If no heartbeat is received within the configured timeout, the DataNode is marked as dead. The NameNode stops forwarding new I/O requests to dead DataNodes and no longer uses the data on them. As a result, some blocks may end up with fewer replicas than their specified replication factor; the NameNode tracks these under-replicated blocks and re-replicates them when necessary.
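A minimal sketch of this bookkeeping, with invented names and a simplified timeout, might look like the following; it is not the real NameNode code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatMonitorSketch {

    // Assumed timeout; the real value is derived from dfs.heartbeat.interval and
    // dfs.namenode.heartbeat.recheck-interval (about 10.5 minutes by default).
    private static final Duration TIMEOUT = Duration.ofMinutes(10);

    private final Map<String, Instant> lastHeartbeat = new ConcurrentHashMap<>();

    // Called whenever a DataNode reports in.
    void onHeartbeat(String dataNodeId) {
        lastHeartbeat.put(dataNodeId, Instant.now());
    }

    // Periodically invoked to find dead DataNodes.
    void checkLiveness() {
        Instant now = Instant.now();
        lastHeartbeat.forEach((dn, last) -> {
            if (Duration.between(last, now).compareTo(TIMEOUT) > 0) {
                markDead(dn);
            }
        });
    }

    private void markDead(String dataNodeId) {
        // Stop routing new I/O to this DataNode and enqueue its blocks for
        // re-replication so each block regains its target replication factor.
        System.out.println("DataNode " + dataNodeId + " is dead; re-replicating its blocks");
    }
}
```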
2. Data integrity
Blocks stored on DataNodes may become corrupted because of storage device faults. To avoid errors caused by reading corrupted data, HDFS provides a checksum-based integrity mechanism, which works as follows:
When a client creates an HDFS file, it computes a checksum for each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When the client retrieves the file contents, it verifies that the data received from each DataNode matches the checksums stored in the associated checksum file. If the verification fails, the data is corrupted and the client fetches another available replica of the block from a different DataNode.
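The core idea can be illustrated in a few lines of Java: compute a CRC when data is written, store it separately, and recompute it on read. HDFS itself uses CRC32C over small chunks (`dfs.bytes-per-checksum`, 512 bytes by default); the layout here is deliberately simplified.

```java
import java.util.zip.CRC32C;

public class ChecksumSketch {

    static long checksum(byte[] chunk) {
        CRC32C crc = new CRC32C();
        crc.update(chunk, 0, chunk.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] chunk = "block contents".getBytes();
        long stored = checksum(chunk);        // computed at write time and stored separately

        // At read time, recompute and compare; a mismatch means this replica is corrupt
        // and the client should fetch another replica of the block.
        byte[] received = chunk.clone();
        if (checksum(received) != stored) {
            throw new IllegalStateException("corrupt replica: checksum mismatch");
        }
        System.out.println("checksum verified");
    }
}
```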
3. Metadata disk failure
The FsImage and EditLog are the core metadata of HDFS; if they are lost, the entire HDFS service may become unavailable. To guard against this, the NameNode can be configured to maintain multiple copies of the FsImage and EditLog, so that any change to either file is synchronously applied to every copy.
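These locations are controlled by the `dfs.namenode.name.dir` and `dfs.namenode.edits.dir` properties, which accept comma-separated lists of directories. They are normally set in `hdfs-site.xml` on the NameNode host; the snippet below shows them as Java `Configuration` calls only to illustrate the property names, and the directory paths are placeholders.

```java
import org.apache.hadoop.conf.Configuration;

public class NameNodeDirsExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Comma-separated list of directories: the NameNode writes the FsImage
        // (and, unless dfs.namenode.edits.dir is set separately, the EditLog)
        // to every directory, e.g. a local disk plus an NFS mount.
        conf.set("dfs.namenode.name.dir", "/data/nn/name,/mnt/nfs/nn/name");
        conf.set("dfs.namenode.edits.dir", "/data/nn/edits,/mnt/nfs/nn/edits");

        System.out.println(conf.get("dfs.namenode.name.dir"));
    }
}
```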
4. Snapshot support
Snapshots preserve a copy of the data at a specific point in time. If data is corrupted or lost unexpectedly, it can be rolled back to a known-good state.
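As a hedged example, a snapshot can be taken through the Java API once an administrator has made the directory snapshottable (`hdfs dfsadmin -allowSnapshot`); the paths, file names, and snapshot name below are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path dir = new Path("/data/important");

        // Create a read-only, point-in-time snapshot of the directory.
        Path snapshot = fs.createSnapshot(dir, "before-cleanup");
        System.out.println("snapshot created at " + snapshot);

        // If a file is later corrupted or deleted, it can be restored by copying it
        // back out of the hidden .snapshot directory.
        FileUtil.copy(fs, new Path(dir, ".snapshot/before-cleanup/report.csv"),
                      fs, new Path(dir, "report.csv"),
                      false /* keep the source */, fs.getConf());

        fs.deleteSnapshot(dir, "before-cleanup");  // remove the snapshot when done
        fs.close();
    }
}
```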
3. Features of HDFS
3.1 High Fault Tolerance
Because HDFS keeps multiple replicas of each block, the failure of part of the hardware does not result in data loss.
3.2 High Throughput
HDFS is designed to support high-throughput data access rather than low-latency data access.
3.3 Large File Support
HDFS is suited to storing large files, typically ranging from gigabytes to terabytes in size.
3.4 Simple Consistency Model
HDFS is best suited to a write-once-read-many access model. Content can be appended to the end of a file, but existing data cannot be modified in place and new data cannot be inserted at an arbitrary position in a file.
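A brief sketch of this append-only model using the Java API is shown below; the path is illustrative, and there is no corresponding API for writing at an arbitrary offset.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path("/logs/events.log");

        try (FSDataOutputStream out = fs.append(log)) {
            out.writeBytes("new event record\n");   // appended at the end of the existing file
        }
        fs.close();
    }
}
```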
3.5 Cross-Platform Portability
HDFS is highly portable across platforms, which is one reason many big data computing frameworks treat it as the preferred solution for persistent data storage.
Appendix: HDFS Storage Principles (Illustrated)
Note: The illustrations below are taken from the blog post "Translation of the Classic HDFS Explanatory Comics".
1. How data is written to HDFS
2. How data is read from HDFS
3. HDFS failure types and how they are detected
Part 2: Handling read/write failures
Part 3: Handling DataNode failures
Replica placement strategy:
References
- Apache Hadoop 2.9.2 Documentation: HDFS Architecture
- Tom White. Hadoop: The Definitive Guide [M]. Tsinghua University Press, 2017.
- Translation of the Classic HDFS Explanatory Comics (blog post)
For more articles in this big data series, see the GitHub open-source project: Getting Started with Big Data.