
Before analyzing HDFS consistency, we need to answer a few questions about how HDFS clients behave.

1. Why doesn't HDFS support multiple writers writing to the same file at the same time?

Let's start with the history of HDFS. HDFS is based on Google's GFS paper, and its initial design goal was to store the large data sets operated on by MapReduce. In Hadoop, the write operations of a MapReduce job generally happen in the reduce phase (or in the map phase for a map-only job), and each reducer writes its results to its own HDFS file. A question may arise here: why not have all the reducers write to the same HDFS file? Obviously, multiple reducers writing to the same file means multiple writers performing concurrent writes on one HDFS file, which would require an expensive synchronization mechanism. More importantly, it would serialize the reducers' write operations and hurt the parallelism of the reduce tasks. Therefore HDFS does not need to support multiple writers; a single writer per file meets Hadoop's requirements.
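As a concrete illustration of "one writer per file", the output directory of a finished job contains one part file per reducer, each written by exactly one task. Below is a small hedged sketch that just lists such a directory; the path and the reducer count are my own assumptions, not from the article:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListJobOutput {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical output directory of a job that ran with 3 reducers.
        Path out = new Path("/user/me/wordcount-output");
        for (FileStatus st : fs.listStatus(out)) {
            // Typical layout: _SUCCESS plus one part file per reducer, e.g.
            //   part-r-00000, part-r-00001, part-r-00002
            // i.e. each reducer is the single writer of its own HDFS file.
            System.out.println(st.getPath().getName() + "\t" + st.getLen());
        }
    }
}
```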

2. Why did HDFS add support for file appends later on?

HDFS did not support appending to files until version 0.19.0. According to the HDFS design document, HDFS applications need a write-once, read-many access model: once a file is created, written, and closed, it does not need to change again. This assumption simplifies data-consistency issues and enables high data throughput; MapReduce programs and web crawlers fit this model well. Of course, supporting appends (incremental writes) was planned for the future: issues.apache.org/jira/browse… lucene.472066.n3.nabble.com/HDFS-append…
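Once append is available, the API side is small: a file is still created and closed in the write-once style, and a later (single) writer re-opens it with FileSystem.append(). A minimal sketch, assuming a cluster with append enabled (historically gated by the dfs.support.append setting) and an illustrative path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class AppendDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // uses fs.defaultFS from the classpath config
        Path log = new Path("/tmp/append-demo.log");

        // Write once, then close -- the classic write-once pattern.
        try (FSDataOutputStream out = fs.create(log, true)) {
            out.write("first line\n".getBytes(StandardCharsets.UTF_8));
        }

        // Later, a single writer re-opens the same file and appends to its last block.
        try (FSDataOutputStream out = fs.append(log)) {
            out.write("appended line\n".getBytes(StandardCharsets.UTF_8));
            out.hflush(); // make the appended bytes visible to new readers
        }
    }
}
```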

3. Why is appending limited to a single writer?

Some in the community wanted HDFS to implement atomic append, since GFS had it. But Owen O'Malley argued that atomic append is harmful to both the design of a file system and its user interface, and noted that GFS added atomic append before MapReduce even existed:

"My personal inclination is that atomic append does very bad things to both the design of the file system and the user interface to the file system. Clearly they added atomic append to GFS before they had MapReduce. It seems like most applications would be better served by implementing in MapReduce rather than using atomic append anyways…" — Owen O'Malley

Below is an excerpt from an interview with a Google engineer about the design of GFS 2.0:

queue.acm.org/detail.cfm?…

QUINLAN At the time, [RecordAppend] must have seemed like a good idea, but in retrospect I think the consensus is that it proved to be more painful than it was worth. It just doesn't meet the expectations people have of a file system, so they end up getting surprised. Then they had to figure out work-arounds.

MCKUSICK In retrospect, how would you handle this differently?

QUINLAN I think it makes more sense to have a single writer per file.

MCKUSICK All right, but what happens when you have multiple people wanting to append to a log?

QUINLAN You serialize the writes through a single process that can ensure the replicas are consistent.
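Quinlan's advice maps directly onto application code: rather than having many clients append to the same HDFS file, funnel their records through one writer process. Below is a minimal sketch of that pattern (my own illustration, not from the interview); the class name, queue capacity, and per-record hflush() are all assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Many producers, one writer: all appends to the log file are serialized through this single thread. */
public class SerializedLogWriter implements Runnable, AutoCloseable {

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(10_000);
    private final FSDataOutputStream out;
    private volatile boolean running = true;

    public SerializedLogWriter(FileSystem fs, Path logFile) throws Exception {
        // Open (or create) the log file; this process is the file's single writer.
        this.out = fs.exists(logFile) ? fs.append(logFile) : fs.create(logFile, false);
    }

    /** Called concurrently by any number of producer threads/clients. */
    public void submit(String record) throws InterruptedException {
        queue.put(record);
    }

    @Override
    public void run() {
        try {
            while (running || !queue.isEmpty()) {
                String record = queue.poll(100, TimeUnit.MILLISECONDS);
                if (record != null) {
                    out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
                    out.hflush(); // make the record visible to readers before taking the next one
                }
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            try { out.close(); } catch (Exception ignored) { }
        }
    }

    @Override
    public void close() {
        running = false; // the run() loop drains the queue and then closes the stream
    }
}
```

A producer simply calls submit(...); because only one thread ever touches the FSDataOutputStream, HDFS's single-writer restriction is never violated and the replicas stay consistent.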

4. What kind of consistency does a file system like HDFS provide?

As a file system, HDFS must preserve the order of a file's contents.

What consistency challenges does the append operation introduce for HDFS?

At any given time, the replicas of the last block of a file may contain different numbers of bytes. So what read consistency can HDFS provide, and how does it guarantee consistency even when failures occur?
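In practice, the contract is roughly the following: bytes in the last block are only guaranteed to be visible to new readers after a successful hflush()/hsync() (or close()), and a reader that opened the file earlier may not see bytes written afterwards. Here is a small hedged sketch of that visibility rule, using standard FileSystem calls (the path is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.nio.charset.StandardCharsets;

public class VisibilityDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/visibility-demo.txt");

        try (FSDataOutputStream out = fs.create(p, true)) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            // Before hflush(), a new reader may see fewer than 5 bytes: only the
            // acknowledged prefix of the last block is guaranteed to be visible.
            out.hflush();
            // After hflush() returns, those 5 bytes should be visible to any reader
            // that opens the file from now on.
            try (FSDataInputStream in = fs.open(p)) {
                byte[] buf = new byte[5];
                in.readFully(0, buf);
                System.out.println(new String(buf, StandardCharsets.UTF_8));
            }
        }
    }
}
```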

The basics of HDFS consistency

When a client reads a replica on a DataNode, the DataNode does not make every byte it has received visible to the reader. Each RBW (replica being written) replica maintains two counters:

  1. BA: the number of bytes acknowledged by the downstream DataNodes in the pipeline. These are the bytes that the DataNode makes visible to any reader; below we also call this the visible length of the replica.
  2. BR: the number of bytes received for this block, including the bytes written to the block file and the bytes buffered in the DataNode.
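To keep the two counters straight, here is a toy model of an RBW replica (my own sketch, not HDFS source code): BR advances as soon as packet bytes arrive, BA advances only when the downstream acknowledgment comes back, and a reader is never shown more than BA bytes:

```java
/** Toy model of an RBW (replica being written) copy's two counters; not actual HDFS code. */
public class RbwReplica {
    private long bytesAcked;     // BA: acknowledged by downstream, visible to readers
    private long bytesReceived;  // BR: received so far (block file + in-memory buffer)

    /** A packet of `len` bytes arrives at this DataNode. */
    public void onPacketReceived(long len) {
        bytesReceived += len;                                    // BR advances immediately
    }

    /** The downstream DataNode acknowledged a packet of `len` bytes. */
    public void onAckReceived(long len) {
        bytesAcked = Math.min(bytesAcked + len, bytesReceived);  // BA never passes BR
    }

    /** The visible length: a reader may be served at most this many bytes. */
    public long visibleLength() {
        return bytesAcked;
    }
}
```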

Assume that all DataNodes in the pipeline start with (BA, BR) = (a, a). The client then pushes a packet of b bytes into the pipeline and pushes no further packets until it receives the acknowledgment for this one:

  1. After completing step 1.a (receiving the packet data), a DataNode changes its (BA, BR) to (a, a+b).
  2. After completing step 3.a (receiving the acknowledgment from downstream), a DataNode changes its (BA, BR) to (a+b, a+b).
  3. When the acknowledgment signalling success is finally sent back to the client, all DataNodes in the pipeline have (BA, BR) = (a+b, a+b).
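The same scenario as a runnable walkthrough (again a toy model, with illustrative values for a and b): BR jumps to a+b as the packet flows downstream, then BA catches up to a+b as the acknowledgment flows back upstream, so the pipeline ends with every DataNode at (a+b, a+b):

```java
public class PipelineWalkthrough {
    public static void main(String[] args) {
        long a = 1024, b = 512;              // a bytes already acked, one b-byte packet in flight
        long[] ba = {a, a, a};               // BA per DataNode (DN0 -> DN1 -> DN2)
        long[] br = {a, a, a};               // BR per DataNode

        // Packet travels downstream: each DataNode receives the bytes (step 1.a) -> BR = a + b.
        for (int i = 0; i < br.length; i++) {
            br[i] += b;
            print("after DN" + i + " receives the packet", ba, br);
        }

        // Acknowledgment travels upstream: each DataNode completes step 3.a -> BA = a + b.
        for (int i = br.length - 1; i >= 0; i--) {
            ba[i] += b;
            print("after DN" + i + " receives the ack", ba, br);
        }
        // At this point the client gets the success ack and every DataNode is at (a+b, a+b).
    }

    private static void print(String when, long[] ba, long[] br) {
        StringBuilder sb = new StringBuilder(when + ": ");
        for (int i = 0; i < ba.length; i++) {
            sb.append("DN").append(i).append("=(").append(ba[i]).append(",").append(br[i]).append(") ");
        }
        System.out.println(sb);
    }
}
```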


If there are any mistakes in this post, please point them out in the comments. Thank you very much!
