Today I finally read the famous GFS paper. The overall design idea is clear: a complex system is in fact built out of a lot of simple logic.
Early design (system requirements)
The system is built from cheap commodity PC nodes, so failures are expected and the requirements for monitoring and fault tolerance are high.
The main goal of the system is to serve large files: typical files are 100 MB or larger, and multi-GB files are common. Small files are supported but not optimized for.
There are two main types of reads: large streaming (sequential) reads and small random reads. Writes are mostly large sequential appends to files; small writes at arbitrary offsets are supported but need not be efficient.
The system must efficiently support many clients appending to the same file concurrently, for example when the file is used as a producer-consumer queue; a sketch of this pattern follows below.
High sustained network bandwidth matters more than low latency for individual reads and writes.
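As a rough illustration of the multi-client append requirement, here is a minimal Go sketch of the producer-consumer pattern. The `Client` type and its `RecordAppend` method are hypothetical, in-memory stand-ins (not the real GFS client API); the point is only that producers append concurrently without coordinating on offsets, because the system chooses the offset and returns it.

```go
package main

import (
	"fmt"
	"sync"
)

// Client is a stand-in for a GFS client handle; RecordAppend models an
// atomic record append: the system picks the offset and returns it.
type Client struct {
	mu   sync.Mutex
	file [][]byte // simulated file contents, one record per slot
}

// RecordAppend appends data atomically and returns the offset (record index
// here) at which it was written. Real GFS chooses a byte offset inside a
// chunk; this in-memory version only models the interface.
func (c *Client) RecordAppend(data []byte) (offset int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.file = append(c.file, data)
	return len(c.file) - 1
}

func main() {
	c := &Client{}
	var wg sync.WaitGroup
	// Several producers append to the same file concurrently; none of them
	// coordinates on offsets, which is what makes the file usable as a
	// producer-consumer queue.
	for p := 0; p < 3; p++ {
		wg.Add(1)
		go func(p int) {
			defer wg.Done()
			for i := 0; i < 2; i++ {
				off := c.RecordAppend([]byte(fmt.Sprintf("producer %d record %d", p, i)))
				fmt.Printf("producer %d wrote at offset %d\n", p, off)
			}
		}(p)
	}
	wg.Wait()
}
```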
GFS architecture
Single-master design: one master node and many chunkservers (master/slave mode for the data)
The responsibilities of the Master
Chunk size selection (64 MB)
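With a fixed 64 MB chunk size, a client can translate a byte offset inside a file into a chunk index with pure arithmetic before it ever contacts the master. A small sketch (the `chunkCoords` function name is mine, not from the paper):

```go
package main

import "fmt"

const chunkSize = 64 << 20 // 64 MB, the chunk size chosen in the paper

// chunkCoords translates a byte offset within a file into the chunk index
// and the offset inside that chunk; this is the translation the client
// performs before asking the master where that chunk lives.
func chunkCoords(byteOffset int64) (chunkIndex, offsetInChunk int64) {
	return byteOffset / chunkSize, byteOffset % chunkSize
}

func main() {
	for _, off := range []int64{0, 64 << 20, 200 << 20, 1 << 30} {
		idx, rel := chunkCoords(off)
		fmt.Printf("byte offset %d -> chunk %d, offset %d\n", off, idx, rel)
	}
}
```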
Metadata is kept in the master's memory and includes the file and chunk namespaces, the file-to-chunk mapping, and the locations of each chunk's replicas (chunk locations are not persisted; the master polls chunkservers for them).
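A rough sketch of what this in-memory metadata might look like, using simplified Go types of my own (`Master`, `FileMetadata`, `ChunkMetadata`). In the paper the namespace and file-to-chunk mapping are also persisted through the operation log, while replica locations are rebuilt by polling chunkservers at startup.

```go
package main

import "fmt"

type ChunkHandle uint64

type FileMetadata struct {
	Chunks []ChunkHandle // ordered list of chunk handles for this file
}

type ChunkMetadata struct {
	Replicas []string // chunkserver addresses holding a replica (not persisted)
	Version  uint64   // version number, used to detect stale replicas
}

type Master struct {
	namespace map[string]*FileMetadata       // file namespace: path -> file metadata
	chunks    map[ChunkHandle]*ChunkMetadata // per-chunk state, including replica locations
}

func main() {
	m := &Master{
		namespace: map[string]*FileMetadata{
			"/logs/web-00": {Chunks: []ChunkHandle{1, 2}},
		},
		chunks: map[ChunkHandle]*ChunkMetadata{
			1: {Replicas: []string{"cs-a:7000", "cs-b:7000", "cs-c:7000"}, Version: 3},
			2: {Replicas: []string{"cs-a:7000", "cs-d:7000", "cs-e:7000"}, Version: 1},
		},
	}
	fmt.Println("chunks of /logs/web-00:", m.namespace["/logs/web-00"].Chunks)
	fmt.Println("replicas of chunk 1:", m.chunks[1].Replicas)
}
```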
Operation log: how concurrent operations are handled (log records are batched and flushed together) and how the log is kept safe (an operation is reported successful only after its log record has been flushed to disk both locally and on the remote replicas).
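A minimal sketch of that acknowledgement rule, assuming a hypothetical `LogReplica` interface and ignoring record batching: a mutation is only reported successful after its log record has reached the local log and every remote replica.

```go
package main

import (
	"fmt"
	"sync"
)

// LogReplica is a hypothetical destination for operation log records.
type LogReplica interface {
	Flush(record string) error
}

type memoryReplica struct {
	mu      sync.Mutex
	records []string
}

func (r *memoryReplica) Flush(record string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.records = append(r.records, record) // stands in for an fsync'd disk write
	return nil
}

// applyMutation writes the record to the local log first, then to the remote
// replicas, and only then reports success to the client.
func applyMutation(record string, local LogReplica, remotes []LogReplica) error {
	if err := local.Flush(record); err != nil {
		return fmt.Errorf("local log flush failed: %w", err)
	}
	for _, r := range remotes {
		if err := r.Flush(record); err != nil {
			return fmt.Errorf("remote log flush failed: %w", err)
		}
	}
	return nil // success becomes visible to the client only at this point
}

func main() {
	local := &memoryReplica{}
	remotes := []LogReplica{&memoryReplica{}, &memoryReplica{}}
	if err := applyMutation("CREATE /logs/web-00", local, remotes); err != nil {
		fmt.Println("mutation failed:", err)
		return
	}
	fmt.Println("mutation acknowledged after", len(remotes)+1, "log flushes")
}
```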
Garbage collection mechanism
Component failures are common in large distributed systems, so there must be a mechanism to detect and reclaim inconsistent or orphaned space. Deletion is done lazily: the master reclaims chunks that are no longer referenced by any file during its regular scans and its exchanges with chunkservers.
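One way to picture the reclamation step, as a sketch under simplified assumptions (the `orphanedChunks` helper is mine): the master compares the chunks a chunkserver reports against the chunks it still references, and anything unreferenced can be deleted by the chunkserver.

```go
package main

import "fmt"

// orphanedChunks returns the chunk handles reported by a chunkserver that the
// master no longer references; those are free to be reclaimed.
func orphanedChunks(reported []uint64, known map[uint64]bool) []uint64 {
	var orphans []uint64
	for _, h := range reported {
		if !known[h] {
			orphans = append(orphans, h)
		}
	}
	return orphans
}

func main() {
	// Chunks the master still references.
	known := map[uint64]bool{1: true, 2: true, 5: true}
	// Chunks a chunkserver reports; 3 and 4 belonged to a deleted file and
	// are no longer referenced by any master metadata.
	reported := []uint64{1, 2, 3, 4, 5}
	fmt.Println("chunks safe to reclaim:", orphanedChunks(reported, known))
}
```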
High availability policy
1. Fast recovery. Both the master and the chunkservers are designed to restore their state and restart in seconds, no matter how they terminated.
2. Chunk replication. Each chunk is replicated on several chunkservers (three by default); see the sketch after this list.
3. Master replication. The master's state is replicated, and shadow masters keep the file system readable even when the primary master is down.
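A sketch of the chunk-replication point above, assuming the paper's default of three replicas per chunk. The `underReplicated` helper is hypothetical and ignores the priority ordering and placement policy the master actually uses when it schedules re-replication.

```go
package main

import "fmt"

const targetReplicas = 3 // the paper's default replication factor

type chunkState struct {
	handle   uint64
	replicas []string // chunkservers currently holding a live replica
}

// underReplicated returns, for each chunk below the target, how many new
// replicas still need to be created.
func underReplicated(chunks []chunkState) map[uint64]int {
	need := map[uint64]int{}
	for _, c := range chunks {
		if missing := targetReplicas - len(c.replicas); missing > 0 {
			need[c.handle] = missing
		}
	}
	return need
}

func main() {
	chunks := []chunkState{
		{handle: 1, replicas: []string{"cs-a", "cs-b", "cs-c"}}, // healthy
		{handle: 2, replicas: []string{"cs-a"}},                 // two replicas lost
	}
	for h, n := range underReplicated(chunks) {
		fmt.Printf("chunk %d: schedule %d new replica(s)\n", h, n)
	}
}
```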
Data integrity. Each chunkserver keeps checksums for its chunks in memory (and logs them persistently) and verifies them whenever data is read.
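A sketch of that read-time verification, modeled on the paper's scheme of a 32-bit checksum per 64 KB block of a chunk (the block size is shrunk here so the example stays tiny); the function names are mine.

```go
package main

import (
	"bytes"
	"fmt"
	"hash/crc32"
)

const blockSize = 64 // stand-in for the paper's 64 KB checksum blocks

// checksums computes one CRC32 per block of the chunk.
func checksums(chunk []byte) []uint32 {
	var sums []uint32
	for i := 0; i < len(chunk); i += blockSize {
		end := i + blockSize
		if end > len(chunk) {
			end = len(chunk)
		}
		sums = append(sums, crc32.ChecksumIEEE(chunk[i:end]))
	}
	return sums
}

// verifiedRead re-checks every block overlapping [off, off+n) and fails the
// read if any stored checksum no longer matches the data on disk.
func verifiedRead(chunk []byte, stored []uint32, off, n int) ([]byte, error) {
	first := off / blockSize
	last := (off + n - 1) / blockSize
	for b := first; b <= last; b++ {
		end := (b + 1) * blockSize
		if end > len(chunk) {
			end = len(chunk)
		}
		if crc32.ChecksumIEEE(chunk[b*blockSize:end]) != stored[b] {
			return nil, fmt.Errorf("checksum mismatch in block %d", b)
		}
	}
	return chunk[off : off+n], nil
}

func main() {
	chunk := bytes.Repeat([]byte("gfs!"), 64) // 256 bytes = 4 blocks
	stored := checksums(chunk)

	if _, err := verifiedRead(chunk, stored, 100, 50); err != nil {
		fmt.Println("read failed:", err)
	} else {
		fmt.Println("read verified OK")
	}

	chunk[130] ^= 0xFF // simulate on-disk corruption
	if _, err := verifiedRead(chunk, stored, 100, 50); err != nil {
		fmt.Println("read failed:", err)
	}
}
```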