Block size is controlled by dfs.blocksize; Hadoop 2.x defaults to 128 MB and 1.x to 64 MB. (128 MB is the maximum size of a block: each block holds at most 128 MB of data. If a block holds less than 128 MB, it only occupies as much disk space as the data actually stored. A block belongs to exactly one file.)
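For reference, dfs.blocksize is normally set in hdfs-site.xml (the value below is 128 MB expressed in bytes; recent releases also accept suffixed values such as 128m):

    <property>
      <name>dfs.blocksize</name>
      <value>134217728</value> <!-- 128 MB -->
    </property>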
The 128 MB default comes from a rule of thumb for minimizing transmission overhead.
To read or write a file on disk, the drive must first seek to (address) the target location.
The rule of thumb: a transfer is most cost-effective when the seek (addressing) time is about 1% of the total transfer time.
Under current hardware conditions, an ordinary disk writes at roughly 100 MB/s and a seek takes about 10 ms.
10 ms / 1% = 1 s; 1 s × 100 MB/s = 100 MB
Blocks are checksummed in 64 KB units during transfer, so the block size should be a power of 2; the power of 2 closest to 100 MB is 128 MB.
If the company uses solid-state drives with a write rate of about 300 MB/s, adjust the block size to 256 MB.
If the write rate is about 500 MB/s, adjust the block size to 512 MB.
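As a rough sketch of the rule of thumb above (assuming the ~10 ms seek and 1% target from the text; the helper function is made up for illustration, not part of Hadoop), the suggested block size for each write rate can be computed like this:

    import math

    def suggested_block_size_mb(write_rate_mb_s, seek_time_s=0.010, seek_fraction=0.01):
        """Pick a power-of-two block size so the seek is ~1% of the transfer time."""
        transfer_time_s = seek_time_s / seek_fraction   # 10 ms / 1% = 1 s
        target_mb = write_rate_mb_s * transfer_time_s   # e.g. 100 MB/s * 1 s = 100 MB
        return 2 ** round(math.log2(target_mb))         # round to the nearest power of 2

    for rate in (100, 300, 500):                        # HDD and the two SSD examples
        print(rate, "MB/s ->", suggested_block_size_mb(rate), "MB")
    # 100 MB/s -> 128 MB, 300 MB/s -> 256 MB, 500 MB/s -> 512 MB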
The block size needs to be tuned appropriately: it can be neither too large nor too small.
It can't be too big: suppose file A is 1 GB and we only need to read its 0-128 MB portion. With 128 MB blocks, the file is stored as 8 blocks and we only have to read the first one; with a much larger block, we would have to read data we don't need.
(1) In scenarios that read only part of a block, large blocks are not flexible enough and bring extra network overhead. (2) If a failure occurs while uploading a very large block, retransmitting it wastes resources.
It can't be too small either: each block's mapping information must be kept in the NameNode, so if the block size is small, a file of the same size (for example, a 128 MB file A) is split into many blocks and occupies far more of the NameNode's metadata space.
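A hypothetical back-of-the-envelope calculation makes the "too small" case concrete (the ~150 bytes of NameNode heap per block record is only a commonly quoted rough estimate):

    import math

    BYTES_PER_BLOCK_ENTRY = 150  # rough estimate of NameNode heap used per block record

    def namenode_overhead(file_size_mb, block_size_mb):
        """Number of blocks and approximate NameNode metadata bytes for one file."""
        blocks = math.ceil(file_size_mb / block_size_mb)
        return blocks, blocks * BYTES_PER_BLOCK_ENTRY

    print(namenode_overhead(128, 128))  # 128 MB file, 128 MB blocks -> (1, 150)
    print(namenode_overhead(128, 1))    # same file, 1 MB blocks -> (128, 19200)

The data takes the same space on disk either way, but with 1 MB blocks the NameNode has to track 128 times as many block records.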