This is the 31st day of my participation in the August More Text Challenge

The basic agreement

  • The update operation write is a sequential process
  • The order of update operations is determined through the remaining mechanisms. For example, the order is determined by the primary in the primary-secondary schema
  • Each update operation is denoted as WI, where I is the serial number of the monotonically increasing update operation
  • After each WI is executed successfully, duplicate data changes. The data version is called vi
  • Assume that each copy holds data for all historical versions

write-all-read-one

  • write-all-read-one: WARO
    • Is one of the simplest copy control rules
    • Write all copies during update. The update is considered successful only when all copies are updated successfully, ensuring the consistency of all copies
    • You can read data on any copy when reading data
  • For update services, thoughNNumber of copies, but the system cannot tolerate any exception:
    • The update operation can succeed only when all N replicas are successful. Therefore, once one replica is abnormal, the update operation fails and the update service is unavailable
  • For the read service,NThe system can provide read service as long as one of the copies is normal:
    • When there are N copies, the system can tolerate N-1 copy exceptions
  • Thus,WARO has high availability of read services, but low availability of update services. Despite the use of replicas, the availability of the update service is equivalent to having no replicas

Quorum is defined

  • Under Quorum, an update operation wi is considered a “successfully committed update operation” if it succeeds on W of all N replicas. The corresponding data is “Successfully submitted data”
  • Let R > n-w, because the update operation WI succeeds only on W copies, because W+R > N, the set composed of any R copies must intersect with the set composed of W copies that succeed, so reading R copies must read the updated data VI of WI
  • Availability analysis of Quorum mechanism:
    • Restrict Quorum parameter to W+R=N+1
    • For the update service, the update operation can succeed only when the update operation succeeds on all W copies. Therefore, if n-W +1 copies are abnormal, the update operation cannot succeed on all W copies and the update service is unavailable
    • For the read service, if n-R +1 replicas are abnormal, it cannot be guaranteed that the replica set that has intersection with W replicas can be read. Therefore, the consistency of the read service decreases
  • Rely solely onQuorumMechanisms do not guarantee strong consistency:
    • With Quorum alone, the latest version number that has been successfully committed cannot be determined
    • Unless the latest committed version number is managed as metadata by a specific metadata server or metadata cluster, it is difficult to determine the latest successfully committed version number
  • QuorumThree system parameters for the mechanismN,W,RIt controls the availability of the system and is also the service commitment of the system to users:
    • A maximum of N data copies exist. If W data copies are updated successfully, the update succeeds
    • For a high consistency Quorum system, the system also promises not to read uncommitted data at any time, i.e., to read data that has previously been successful on W replicas

Read the latest successfully committed data

  • The Quorum mechanism only needs to successfully update W out of N replicas, and in R replicas it must read the latest successfully committed data
  • Since there are unsuccessful updates, reading only R copies does not necessarily determine which version of the data is the most recently committed
  • For a strong consistencyQuorumSystem:
    • If less than W copies of data are read successfully, for example, X copies, the remaining copies are read until W copies of this version are read successfully. Then, the data is the latest data submitted successfully
    • If the number of copies of this data must not exceed W, then the copy with the second largest version number in R is the latest successfully committed copy
    • Example:
      • In read(v2, v1, v1), continue to read the remaining copies:
        • If the remaining two copies are read as (v2, v2), then v2 is the latest successfully committed copy
        • If the remaining two copies are read as (v2, v1) or (v1, v1), then v1 is the latest successfully committed version
        • If reading of either of the next two copies times out or fails, there is no way to determine which version is the most recent successfully committed copy
  • When using Quorum only, a maximum of R+(W-R-1)=N copies must be read to determine the latest successfully committed version, and the ability to read the latest successfully committed version may not be available in the event of any copy exception
  • In engineering practice, reading the latest successfully committed version through the Quorum mechanism should be avoided as much as possible. Instead, it can be used in combination with the primary-secondary control protocol to read the latest committed copy through reading the primary

Select primary copies based on Quorum mechanism

  • Read data in different ways according to consistency requirements:
    • If consistency is required to immediately read the latest successfully committed data:
      • You can read only the data on the primary copy directly
      • You can also use the Quorum mechanism alone to read the latest successfully committed version data
    • If session consistency is required:
      • You can selectively read on each copy based on the version number of the data that has been read previously
    • If only weak consistency is required:
      • You can select any copy to read
  • In the primary-secondary protocol:
    • When the primary is abnormal, a new primary needs to be selected, and the secondary copy synchronizes data with the primary
    • Typically, the job of selecting a new primary is done by a central node
  • Primary-secondary protocol based on Quorum:
    • The common primary selection method is similar to the data reading method: the central node reads R copies and selects the copy with the highest version number among R copies as the new PRIMARY
    • newprimaryWith at leastWAfter data synchronization is completed, the new copy will be usedprimaryProvide read and write services:
      • The copy with the highest version number of R copies must contain the latest successfully committed data
      • Although it is not certain whether the highest version number is a successfully committed data, the new primary then synchronizes the data with the secondary, bringing the number of copies of that version up to W, making that version a successfully committed data
  • Example:
    • In the system with N=5, W=3 and R=3, the maximum version number of the copy at a certain time is (v2,v2, v1, v1, v1). At this time, V1 is the latest successfully submitted data of the system, and v2 is an intermediate data that has not been successfully submitted
    • Assuming that the primary copy is abnormal at the moment, the central node performs a primary switch
    • This kind of“Intermediate state”Data is used asDirty DataWhether the data is deleted or synchronized as new data becomes effective depends on whether the data can be added to the new dataprimaryThe election:
      • If the central node is one of them3Two copies are successfully communicated, and the version number read is(v1, v1, v1),A copy is selectedprimary,The newprimaryIn order tov1As the latest successfully committed version and synchronizes with the remaining copies
        • When synchronizing data with the first and second replicas, the version number of the first and second replicas is greater than that of the primary. Therefore, the first and second replicas are dirty data. Therefore, you can delete dirty data based on undo logs
        • In engineering practice, the new primary may also provide data services after synchronization with the later two replicas, and then update its version number to v2. If the system cannot guarantee that the later v2 is exactly the same as the previous v2, the new primary will compare the version number and the details of the update operation when synchronizing data with the first and second copies
      • If the central node communicates with three copies successfully and the version number is (v2, v1, v1), the copy with the version number v2 is selected as the new primary
      • If the new primary completes data synchronization with the other two replicas, the number of copies matching V2 reaches W, and the new primary can provide normal read and write services