Written in front: This article introduces HDFS's storage model, architecture design, and read/write processes in detail. HDFS is the foundation of the divide-and-conquer, parallel computing done at Hadoop's computing layer, and this lays the groundwork for the later introduction of MapReduce.
Storage model
- A file is linearly split into byte-range chunks (block); each block has an offset and an id
- Different files may use different block sizes
- Within a single file, every block except the last one is the same size
- The block size is tuned to the I/O characteristics of the hardware
- Blocks are scattered across the nodes of the cluster, and each block has location information
- A block has replicas (replication); replicas have no master/slave relationship, and replicas of the same block cannot be placed on the same node
- Replicas are the key to both reliability and performance
- The block size and the number of replicas can be specified when a file is uploaded; after upload, only the number of replicas can be changed (a minimal upload sketch follows this list)
- Write once, read many times; in-place modification is not supported (modifying one block would force that block and all subsequent blocks to be adjusted and their replicas redistributed, which degrades performance)
- Appending data is supported
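To make the upload parameters concrete, here is a minimal sketch using the Java FileSystem API. The path, buffer size, block size, and replication factor are illustrative values, and the cluster address is assumed to come from the default configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at the target HDFS cluster.
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/demo/data.txt");   // hypothetical path
        int bufferSize = 4096;
        short replication = 2;                    // replication factor chosen at upload time
        long blockSize = 16L * 1024 * 1024;       // 16 MB blocks, tuned to hardware I/O

        try (FSDataOutputStream out =
                 fs.create(file, true, bufferSize, replication, blockSize)) {
            out.writeUTF("hello hdfs");
        }

        // After upload only the replication factor can be changed, not the block size.
        fs.setReplication(file, (short) 3);
    }
}
```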
Architecture design
- HDFS is a master/slave (Master/Slaves) architecture
- It consists of one NameNode (master) and a number of DataNodes (slaves), each running as a process
- It is file oriented: a file consists of file data (data) and file metadata (metadata)
- The NameNode stores and manages the file metadata and maintains a hierarchical file directory tree (a virtual directory structure)
- DataNodes store the file data (blocks) and provide block read and write
- Each DataNode keeps a heartbeat with the NameNode and reports the blocks it holds
- The Client exchanges file metadata with the NameNode and exchanges file block data with DataNodes
In the figure above, the metadata maintained by the NameNode records, for two files, the file name, the number of replicas, and the block ids. The file part-0 has 2 replicas and block ids 1 and 3; the file part-1 has 3 replicas and block ids 2, 4, and 5. On the DataNode side, blocks 1 and 3 each appear on two DataNodes, and blocks 2, 4, and 5 each appear on three DataNodes.
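This metadata can be inspected from a client. A small sketch, assuming a file such as /demo/part-0 exists, that asks the NameNode for the file status and then lists the block locations (the DataNodes holding each block's replicas):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/demo/part-0");          // hypothetical file
        FileStatus status = fs.getFileStatus(file);    // metadata answered by the NameNode

        System.out.println("replication = " + status.getReplication());

        // Each BlockLocation describes one block and the DataNodes holding its replicas.
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + loc.getOffset()
                    + " length=" + loc.getLength()
                    + " hosts=" + String.join(",", loc.getHosts()));
        }
    }
}
```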
Role functions
NameNode
- Stores the file metadata, the directory structure, and the file-to-block mapping entirely in memory (memory-based so it can serve requests quickly)
- Needs a persistence scheme to guarantee data reliability (memory is lost on power failure and is limited in size)
- Provides the replica placement policy

DataNode
- Stores blocks on the local disk (the NameNode does not store block data itself; it only manages the mapping)
- Also stores the checksum of each block to guarantee block reliability (see the sketch after this list)
- Keeps a heartbeat with the NameNode and reports its list of blocks and their state
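HDFS stores per-chunk checksums (CRC-based, over 512 B chunks by default) in a small metadata file alongside each block replica. The sketch below only illustrates the idea with java.util.zip.CRC32; it is not the DataNode's actual implementation.

```java
import java.util.zip.CRC32;

public class ChunkChecksumSketch {
    static final int CHUNK_SIZE = 512;  // bytes covered by one checksum, matching HDFS defaults

    // Illustrative only: one CRC32 value per 512 B chunk of a block's bytes,
    // the kind of per-chunk checksum a DataNode keeps next to the block data.
    static long[] checksums(byte[] blockBytes) {
        int chunks = (blockBytes.length + CHUNK_SIZE - 1) / CHUNK_SIZE;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int from = i * CHUNK_SIZE;
            int len = Math.min(CHUNK_SIZE, blockBytes.length - from);
            CRC32 crc = new CRC32();
            crc.update(blockBytes, from, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    public static void main(String[] args) {
        byte[] demo = new byte[1300];             // pretend this is a block's content
        System.out.println(checksums(demo).length + " chunk checksums"); // 3 chunks
    }
}
```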
Metadata persistence
- Any operation that changes the file system metadata is recorded by the NameNode in a transaction log called the EditLog
- The FsImage stores a snapshot of the complete metadata state held in memory
- Both the EditLog and the FsImage are saved on the local disk
- EditLog: complete, so little data is lost, but recovery is slow and the file keeps growing (it records add/delete/update operations in real time; its biggest advantage shows when it is small and holds only recent records)
- FsImage: recovers quickly and its size is close to that of the in-memory data, but it cannot be written in real time, so more data may be lost (the full in-memory state is dumped to disk at intervals, based on a point in time; its biggest advantage shows when the checkpoint is rolled forward frequently)
- The NameNode therefore combines FsImage + EditLog:
  - The incremental EditLog is rolled into the FsImage periodically, which keeps the FsImage checkpoint recent and the EditLog small (similar to Redis persistence)
  - A concrete scheme: suppose the NameNode writes an FsImage only once, at its first start at 8:00. By 9:00 the EditLog contains the transactions from 8:00 to 9:00; merging them into the FsImage moves its checkpoint to 9:00. A NameNode that is busy serving requests should not also do this merging, so another machine takes it over: the SecondaryNameNode, described later.
EditLog and FsImage loading process at HDFS startup
- HDFS is formatted when the cluster is set up; formatting generates an empty FsImage
- When the NameNode starts, it reads the EditLog and the FsImage from disk
- It applies all transactions in the EditLog to the in-memory FsImage
- It saves this new version of the FsImage from memory to the local disk
- It then deletes the old EditLog, because its transactions have already been applied to the FsImage (a sketch of these steps follows the list)
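A minimal sketch of this startup merge, with hypothetical types (Metadata, EditOp) standing in for the real FsImage and EditLog structures; it only illustrates the order of the steps, not the actual NameNode code.

```java
import java.util.List;

public class StartupMergeSketch {
    // Hypothetical stand-ins for the real structures.
    interface Metadata { void apply(EditOp op); }
    interface EditOp {}

    /** Order of operations at NameNode startup: load FsImage, replay EditLog,
     *  write a new FsImage, then discard the now-redundant EditLog. */
    static void start(Metadata fsImage, List<EditOp> editLog) {
        for (EditOp op : editLog) {      // replay every logged transaction
            fsImage.apply(op);
        }
        saveFsImage(fsImage);            // persist the merged state as the new FsImage
        editLog.clear();                 // the old EditLog is no longer needed
    }

    static void saveFsImage(Metadata fsImage) {
        // Placeholder for writing the checkpoint to the local disk.
    }
}
```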
Safe mode
- After startup the NameNode enters a special state called safe mode; while in safe mode the NameNode does not replicate data blocks
- The NameNode receives heartbeat signals and block reports from all DataNodes
- Each time a check by the NameNode confirms that a block has reached its minimum number of replicas, that block is considered safely replicated
- Once a certain (configurable) percentage of blocks has been confirmed safe by the NameNode, and after an additional 30 seconds, the NameNode exits safe mode (see the sketch below)
- It then determines which blocks still have fewer than the specified number of replicas and copies them to other DataNodes
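A sketch of the exit condition just described. The threshold value mirrors the configurable percentage (assumed here to be the dfs.namenode.safemode.threshold-pct default of 0.999) and the 30-second extension; the real NameNode logic is more involved.

```java
public class SafeModeSketch {
    static final double THRESHOLD_PCT = 0.999;   // assumed default percentage
    static final long EXTENSION_MS = 30_000;     // the extra 30 s described above

    /** Returns true once enough blocks have reached their minimum replication
     *  and the extension window has elapsed. Illustrative only. */
    static boolean canLeaveSafeMode(long safeBlocks, long totalBlocks,
                                    long msSinceThresholdReached) {
        boolean thresholdReached =
                totalBlocks == 0 || (double) safeBlocks / totalBlocks >= THRESHOLD_PCT;
        return thresholdReached && msSinceThresholdReached >= EXTENSION_MS;
    }

    public static void main(String[] args) {
        System.out.println(canLeaveSafeMode(9995, 10000, 31_000)); // true
    }
}
```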
SecondaryNameNode (SNN)
- In non-HA mode, the SNN is normally an independent node that periodically merges the NameNode's EditLog into the FsImage, reducing the EditLog size and shortening NameNode startup time
- The configuration property fs.checkpoint.period sets the merge interval; the default is 3600 seconds
- The configuration property fs.checkpoint.size sets the EditLog size threshold; the default maximum EditLog file size is 64 MB (see the sketch below)
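These properties can be set in the site XML files or programmatically; a small sketch follows. Note that fs.checkpoint.period and fs.checkpoint.size are the older property names used above; newer Hadoop releases use dfs.namenode.checkpoint.* keys instead.

```java
import org.apache.hadoop.conf.Configuration;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setLong("fs.checkpoint.period", 3600);            // seconds between checkpoints
        conf.setLong("fs.checkpoint.size", 64L * 1024 * 1024); // 64 MB EditLog trigger
        System.out.println(conf.get("fs.checkpoint.period"));
    }
}
```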
Block replica placement policy
- First replica: placed on the DataNode that is uploading the file; if the upload is submitted from outside the cluster, a node whose disk is not too full and whose CPU is not too busy is chosen at random
- Second replica: placed on a node in a different rack from the first replica
- Third replica: placed on a node in the same rack as the second replica
- More replicas: random nodes (a simplified sketch follows this list)
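A simplified sketch of this placement order, with a hypothetical Node record and no awareness of load, capacity, or the real BlockPlacementPolicy implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class PlacementSketch {
    // Hypothetical node descriptor: host plus rack id.
    record Node(String host, String rack) {}

    /** Sketch of the placement order described above: writer's node (or a random node),
     *  a node on a different rack, a second node on that remote rack, then random nodes. */
    static List<Node> choose(Node writer, List<Node> cluster, int replicas, Random rnd) {
        List<Node> chosen = new ArrayList<>();
        Node first = (writer != null) ? writer : cluster.get(rnd.nextInt(cluster.size()));
        chosen.add(first);
        for (Node n : cluster) {                       // second replica: different rack
            if (chosen.size() >= 2) break;
            if (!n.rack().equals(first.rack())) chosen.add(n);
        }
        for (Node n : cluster) {                       // third replica: same rack as second
            if (chosen.size() >= 3 || chosen.size() < 2) break;
            if (n.rack().equals(chosen.get(1).rack()) && !chosen.contains(n)) chosen.add(n);
        }
        while (chosen.size() < replicas && chosen.size() < cluster.size()) {
            Node n = cluster.get(rnd.nextInt(cluster.size()));  // further replicas: random
            if (!chosen.contains(n)) chosen.add(n);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("dn1", "rackA"), new Node("dn2", "rackA"),
                new Node("dn3", "rackB"), new Node("dn4", "rackB"));
        System.out.println(choose(cluster.get(0), cluster, 3, new Random(42)));
    }
}
```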
HDFS writing process
- The Client connects to the NameNode and creates the file metadata. The NameNode checks whether the metadata is valid, triggers the replica placement policy, and returns an ordered list of DataNodes to the Client
- The Client establishes a TCP connection with the nearest DataNode, that DataNode establishes a TCP connection with the next DataNode, and so on (a pipeline of DataNodes)
- The Client cuts the block into 64 KB packets (filled with 512 B chunks plus a 4 B checksum, i.e. 516 B per chunk) and sends them to the first DataNode; the first DataNode keeps one copy in memory and one on disk, then passes the data on to the second DataNode, while the Client sends the next packet to the first DataNode, and so on. This streaming is a variant of parallelism.
- When one block has finished transmitting, the Client asks the NameNode again to transmit the second block.
- With this transmission mode the number of replicas is transparent to the Client. When a block finishes transmitting, the DataNodes each report to the NameNode while the Client continues transmitting the next block, so the Client's transmission and the block reports are also parallel (see the upload sketch after this list).
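From the application's point of view the whole pipeline is hidden behind the client library. A minimal sketch, assuming a local file /tmp/data.log exists and fs.defaultFS points at the cluster:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical paths; the client library performs the whole flow above
        // (NameNode metadata call, DataNode pipeline, packet/checksum streaming).
        fs.copyFromLocalFile(new Path("/tmp/data.log"), new Path("/demo/data.log"));
        fs.close();
    }
}
```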
HDFS reading process
- The Client connects to the NameNode and obtains the block information
- The Client establishes connections with the DataNodes closest to it (same machine, same rack, then replicas on other racks) and reads the blocks from those DataNodes, which return the data
- The Client downloads the blocks and verifies data integrity
HDFS also allows a client to specify a file offset, connect directly to the DataNodes holding the corresponding blocks, and fetch that data. This is the core capability behind the divide-and-conquer, parallel computing done at the computing layer.
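A minimal sketch of such an offset-based read, assuming a file /demo/data.log and a 128 MB block size so that the chosen offset falls at the start of the second block:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OffsetReadExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/demo/data.log");   // hypothetical file
        long offset = 128L * 1024 * 1024;         // e.g. start of the second 128 MB block
        byte[] buf = new byte[4096];

        try (FSDataInputStream in = fs.open(file)) {
            in.seek(offset);                      // jump straight to the block holding this offset
            int n = in.read(buf);                 // served by a DataNode close to the client
            System.out.println("read " + n + " bytes starting at offset " + offset);
        }
    }
}
```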