1. Preface
Recently I have been reading Designing Data-Intensive Applications, a book about the difficulties of designing data-intensive applications. To avoid rushing through it and failing to internalize what it teaches, I am following my old habit of writing reading notes. This deepens my understanding and makes later review easier, so it kills several birds with one stone.
I am not a conventional person, and I read according to my mood. So, perhaps unexpectedly, the first set of notes covers Chapter 5, Replication, in Part II of the book, Distributed Data.
2. Replication
2.1 Purpose of Replication
Before digging into replication, consider one question: what is the purpose of data replication?
- Improve availability (the system keeps working even if part of it fails)
- Reduce latency (keep data geographically close to users)
- Increase read throughput (scale out the number of machines that can serve read requests)
2.2 Replication Mode
If the replicated data does not change over time, replication is simple: copy the data to each node once. The difficulty of replication lies in handling changes to replicated data. The following three approaches cover the replication algorithms used in almost all distributed databases:
- Single leader
- Multi leader
- Leaderless
3. Single leader
3.1 Architecture diagram
3.2 Problems to be solved
- Synchronous or asynchronous replication?
The best way to choose is to list the pros and cons of each option and compare them one by one.
| | Advantages | Disadvantages |
|---|---|---|
| Synchronous replication | The slave is guaranteed to have an up-to-date copy of the data that is consistent with the master | When the slave does not respond, the master cannot continue processing writes |
| Asynchronous replication | The master can continue processing writes even when the slave falls behind | If the master fails and cannot be recovered, writes that were not yet replicated to the slave are lost |
| Semi-synchronous replication | Combines the advantages of synchronous and asynchronous replication | Increased cost of replication |
Note: In a semi-synchronous replication architecture, the master has one synchronous slave and one asynchronous slave.
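The trade-off in the table can be illustrated with a minimal Python sketch. The `Master` and `Slave` classes, the `reachable` flag, and the choice to reject a blocked write instead of waiting are all assumptions made for illustration, not how any particular database implements replication.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    name: str
    reachable: bool = True              # hypothetical flag: does the node respond?
    log: list = field(default_factory=list)

    def apply(self, record: str) -> bool:
        if not self.reachable:
            return False
        self.log.append(record)
        return True


@dataclass
class Master:
    sync_slaves: list                   # must acknowledge before a write is confirmed
    async_slaves: list                  # replicated on a best-effort basis
    log: list = field(default_factory=list)

    def write(self, record: str) -> bool:
        # An unresponsive synchronous slave blocks the write; this sketch
        # models that as a rejected write instead of waiting forever.
        if not all(s.reachable for s in self.sync_slaves):
            return False
        self.log.append(record)
        for s in self.sync_slaves + self.async_slaves:
            s.apply(record)             # an unreachable async slave just lags behind
        return True


up, down = Slave("up"), Slave("down", reachable=False)
semi_sync = Master(sync_slaves=[up], async_slaves=[down])
fully_sync = Master(sync_slaves=[up, down], async_slaves=[])
print(semi_sync.write("x=1"))    # confirmed even though "down" missed the write
print(fully_sync.write("x=2"))   # cannot be confirmed while "down" is unreachable
```

With one synchronous and one asynchronous slave, the write is confirmed as long as the synchronous slave responds, which is the compromise that semi-synchronous replication aims for.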
- How do I ensure that a newly added slave has an exact copy of the master's data?
  - Take a consistent snapshot of the master at some point in time without locking the master, for example with innobackupex for MySQL.
  - Copy the snapshot to the new slave node.
  - The slave connects to the master and requests all data changes that have occurred since the snapshot. This requires that the snapshot be associated with an exact position in the master's replication log.
  - Once the slave has processed the backlog of data changes made since the snapshot, it is said to have caught up with the master. From then on it can keep processing data changes as they happen on the master.
- How to handle node outages?
  - Slave failure: catch-up recovery. From its log, the slave knows the last transaction it processed before the fault, so it can reconnect to the master and request all changes since then.
  - Master failure: failover.
Note 1: Failover is the process of promoting a slave to be the new master, reconfiguring clients to send their writes to the new master, and having the other slaves start consuming data changes from the new master.
Note 2: Failover of the master is fraught with problems in the details. For example, promoting an asynchronously replicated slave may lose some writes, two nodes may both believe they are the master (split brain), and it is hard to choose the right timeout before the master is declared unavailable.
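Both bringing up a new slave from a snapshot and catch-up recovery after a slave outage rely on the same idea: remember a position in the master's replication log and replay everything after it. A minimal Python sketch follows; the `Master` and `Slave` classes and their methods are hypothetical names invented for illustration, and the "replication log" is just an in-memory list.

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []                     # replication log: list of (key, value)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # A copy of the data plus the exact log position it corresponds to.
        return dict(self.data), len(self.log)

    def changes_since(self, position):
        return self.log[position:]


class Slave:
    def __init__(self, snapshot, position):
        self.data = snapshot
        self.position = position          # last log position already applied

    def catch_up(self, master):
        # Replay the backlog; the same call serves recovery after an outage.
        for key, value in master.changes_since(self.position):
            self.data[key] = value
            self.position += 1


master = Master()
master.write("a", 1)
snap, pos = master.snapshot()             # snapshot taken here
master.write("b", 2)                      # writes continue on the master
master.write("a", 3)

slave = Slave(snap, pos)
slave.catch_up(master)
print(slave.data == master.data)
```

Because the slave tracks its own log position, calling `catch_up` again after a period of downtime only replays the changes it missed.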
3.3 Implementation of Replication
How is data replication actually implemented between master and slave? As the saying goes, the devil is in the details.
- Statement-based replication: The master logs every write request (statement) it executes and sends that statement log to its slaves. For a relational database, for example, this means that every INSERT, UPDATE, or DELETE statement is forwarded to each slave.
  Note: this approach has pitfalls in the following three cases.
  - Statements that call nondeterministic functions, such as NOW() or RAND(), are likely to generate a different value on each replica.
  - Statements that use an autoincrementing column, or that depend on existing data in the database, must be executed in exactly the same order on each replica. This can be limiting when there are multiple concurrently executing transactions.
  - Statements that have side effects (for example, triggers, stored procedures, user-defined functions) may have different side effects on each replica.
- Write-ahead log (WAL) shipping: The WAL describes which bytes were changed in which disk blocks. This makes replication tightly coupled to the storage engine.
- Logical (row-based) log replication: Use a different log format for replication than for the storage engine, so that the replication log is decoupled from the storage engine internals. MySQL's binary log, when configured for row-based replication, is an example.
  Note: This approach has the advantage of allowing the master and its slaves to run different versions of the database software, or even different storage engines.
- Trigger-based replication: Triggers let you register custom application code that is executed automatically when a data change (write transaction) occurs in the database system.
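The first pitfall of statement-based replication (nondeterministic functions) can be made concrete with a small Python sketch. The `apply_statement` and `apply_row` helpers are hypothetical stand-ins for a replica re-executing a statement versus applying a shipped row; they are not part of any database API.

```python
import random


def apply_statement(replica: dict, rng: random.Random) -> None:
    # Statement-based: each replica re-evaluates "SET token = RAND()" itself.
    replica["token"] = rng.random()


def apply_row(replica: dict, row: dict) -> None:
    # Row-based (logical log): the master ships the value it actually wrote.
    replica.update(row)


master, slave = {}, {}
apply_statement(master, random.Random(1))
apply_statement(slave, random.Random(2))    # the slave has its own randomness
statement_based_diverged = master != slave

slave_row_based = {}
apply_row(slave_row_based, dict(master))    # ship the resulting row instead
row_based_consistent = slave_row_based == master

print(statement_based_diverged, row_based_consistent)
```

Shipping the resulting values rather than the statement is what lets a logical log sidestep this class of problem.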
3.4 Solving the problem of replication delay
In an asynchronous replication scenario, the data is only guaranteed to be eventually consistent: if you stop writing to the database and wait a while, the slaves will eventually catch up and become consistent with the master. However, the term “eventually” is deliberately vague, and there is no limit to how far a replica can fall behind. This is the problem of replication lag, and it causes the following common anomalies.
- After writing data, a user reads from a stale replica and does not see the data they just wrote.
  - Solution: Use read-after-write consistency to prevent this anomaly. It can be implemented by reading the user's own data directly from the master, or by using a logical clock to determine whether the replica being read is sufficiently up to date.
- A user reads from a fresher replica and then from a staler replica, so that time appears to move backward.
  - Solution: Use monotonic reads. Make sure that each user always reads from the same replica.
- The order of a user's writes is lost. For example, operation A is executed before operation B, but B is replicated with little delay while A is replicated with a long delay. A reader on a slave may therefore see B before A.
  - Solution: Use consistent prefix reads. If a sequence of writes happens in a certain order, anyone reading those writes sees them appear in the same order, for example by writing causally related writes to the same partition.
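The first two solutions above are largely a matter of routing reads. A hedged Python sketch follows: the node names, the 60-second window, and the `record_write` / `choose_node` helpers are assumptions made for illustration, and a real system would also have to handle replicas that fail or lag by more than the window.

```python
import hashlib
import time
from typing import Optional

SLAVES = ["slave-0", "slave-1", "slave-2"]   # hypothetical node names
READ_FROM_MASTER_WINDOW = 60.0               # seconds; an assumed value

last_write_at = {}                           # user_id -> time of last write


def record_write(user_id: str, now: Optional[float] = None) -> None:
    last_write_at[user_id] = time.time() if now is None else now


def choose_node(user_id: str, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    # Read-after-write: shortly after a user's own write, read from the master.
    if now - last_write_at.get(user_id, float("-inf")) < READ_FROM_MASTER_WINDOW:
        return "master"
    # Monotonic reads: a stable hash of the user ID always picks the same
    # slave (Python's built-in hash() is salted per process, so it is avoided).
    digest = hashlib.sha256(user_id.encode()).digest()
    return SLAVES[int.from_bytes(digest[:8], "big") % len(SLAVES)]
```

A user who has just written is served by the master for the assumed window; after that, the same user is always routed to the same slave, so their reads never go back in time relative to one another.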
4. Multi-leader
Using multiple masters within a single data center rarely makes sense, because the benefits rarely outweigh the added complexity.
4.1 Architecture diagram
A multi-leader configuration can have a master in each data center, as illustrated in the figure above. Within each data center, regular master-slave replication is used; between data centers, each data center's master replicates its changes to the masters in the other data centers.
Note: In a single-leader configuration spanning multiple data centers, the master must live in one of the data centers while the replicas are spread across the others, and all writes must go through the data center that hosts the master.
4.2 Application Scenarios of Multi-Master Replication
Operating multiple data centers
When operating multiple data centers, single-leader and multi-leader configurations compare as follows.
| | Single leader | Multi-leader |
|---|---|---|
| Performance | Every write must go over the Internet to the data center that hosts the master, which adds latency. | Every write is processed in the local data center and replicated asynchronously to the other data centers, so the network delay between data centers is hidden from users. |
| Tolerance of data center outages | If the data center hosting the master fails, failover promotes a slave in another data center to be the new master. | Each data center keeps operating independently of the others, and replication catches up automatically when the failed data center comes back online. |
| Tolerance of network problems | Writes cross the public Internet between data centers, which is less reliable than a local network, so this setup is sensitive to problems on that link. | Writes go over the local network within a data center, which is more reliable, and asynchronous replication between data centers tolerates temporary network interruptions. |
Note: Some databases support multi-leader replication by default, but it is also often implemented with external tools, such as Tungsten Replicator for MySQL, BDR for PostgreSQL, and GoldenGate for Oracle.
Multi-leader replication is often considered dangerous territory that should be avoided if possible.
🤔: In multi-leader mode, if one data center fails and a single data center ends up serving all traffic, will the IDs of the inserted rows be non-contiguous, with gaps?
Clients with offline operation
Another scenario where multi-master replication is applicable is when the application needs to continue working even after the network has been disconnected. For example, consider calendar apps on phones, laptops, and other devices. Regardless of whether the device is currently connected to the Internet, you need to be able to view your meetings and enter new meetings at any time. If any changes are made offline, the device needs to be synchronized with the server and other devices the next time the device goes online.
Collaborative editing
Real-time collaborative editing applications allow multiple people to edit documents simultaneously, such as Google Docs, which allows multiple people to edit text documents or spreadsheets at the same time. When a user edits a document, the changes made are immediately applied to the local copy and asynchronously copied to the server and any other users who edit the same document.
4.3 Handling write Conflicts
Definition of a conflict: two writes concurrently modify the same field in the same record and set it to two different values.
Synchronous versus asynchronous conflict detection:
- In a single-leader database, the second writer either blocks and waits for the first write to complete, or its write transaction is aborted, forcing the user to retry.
- In a multi-leader database, both writes succeed, and the conflict is only detected asynchronously at some later point in time. By then it may be too late to ask the user to resolve it.
Conflict avoidance: The simplest strategy for dealing with conflicts is to avoid them. If the application can ensure that all writes for a particular record go through the same leader, conflicts cannot occur. Since many implementations of multi-leader replication handle conflicts quite poorly, avoiding conflicts is a frequently recommended approach. For example, in an application where users edit their own data, ensure that requests from a particular user are always routed to the same data center and use that data center's leader for reading and writing.
Converging toward a consistent state:
- A single-leader database applies writes in sequential order: if there are several updates to the same field, the last write determines its final value.
- In a multi-leader configuration there is no defined ordering of writes, so the replication scheme must be designed so that all replicas arrive at the same final value. Common ways of achieving convergence:
  - Give each write a unique ID (for example, a timestamp) and pick the write with the highest ID as the winner. When a timestamp is used, this is known as last write wins (LWW).
  - Give each replica a unique ID, and let writes that originated at a higher-numbered replica take precedence. (P.S. I don't fully understand this one.)
  - Merge the conflicting values together in some way.
  - Record the conflict in an explicit data structure that preserves all information, and write application code that resolves the conflict later. (P.S. This is like resolving merge conflicts in Git.)
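The first scheme in the list (highest write ID wins) can be sketched in a few lines of Python. The `(write_id, value)` tuple format and the `lww_merge` helper are assumptions for illustration; the point is that replicas converge regardless of the order in which they receive the writes.

```python
def lww_merge(current, incoming):
    """Keep whichever (write_id, value) pair has the higher write_id."""
    if current is None or incoming[0] > current[0]:
        return incoming
    return current


writes = [(1, "B"), (2, "C")]      # hypothetical (unique write ID, value) pairs

replica_a = None
for w in writes:                   # arrival order: ID 1, then ID 2
    replica_a = lww_merge(replica_a, w)

replica_b = None
for w in reversed(writes):         # arrival order: ID 2, then ID 1
    replica_b = lww_merge(replica_b, w)

print(replica_a == replica_b)      # both replicas hold the same winner
```

Both replicas settle on the write with ID 2, and the value "B" is silently discarded, which is why this scheme is prone to data loss.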
Custom conflict resolution logic
As the most appropriate way to resolve conflicts may depend on the application, most multi-master replication tools allow you to write conflict resolution logic using application code.
- On write: As soon as the database system detects a conflict in the log of replicated changes, it calls the conflict handler.
- On read: When a conflict is detected, all the conflicting writes are stored. The next time the data is read, these multiple versions are returned to the application. The application either prompts the user or resolves the conflict automatically, and writes the result back to the database.
4.4 Replication Topology
A replication topology describes the communication paths along which writes propagate from one node to another. With more than two leaders, various topologies are possible, such as circular, star, and all-to-all.
The most general topology is all-to-all, in which every leader sends its writes to every other leader. More restricted topologies are also used; for example, MySQL by default supports only a circular topology.
Note: The problem with circular and star topologies is that if a single node fails, it can interrupt the flow of replication messages between the other nodes, leaving them unable to communicate until the node is fixed.
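The effect of a single node failure on different topologies can be checked with a small reachability sketch in Python. The three node names and the `reachable` helper are hypothetical; edges point in the direction that writes are forwarded.

```python
from collections import deque


def reachable(edges, start, failed):
    """Nodes that a write originating at `start` can reach, skipping `failed`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen and nxt not in failed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


nodes = ["A", "B", "C"]            # three leaders, names are illustrative
# Circular: each leader forwards writes to exactly one neighbour.
circular = {n: [nodes[(i + 1) % len(nodes)]] for i, n in enumerate(nodes)}
# All-to-all: each leader forwards writes to every other leader.
all_to_all = {n: [m for m in nodes if m != n] for n in nodes}

print(reachable(circular, "A", failed={"B"}))
print(reachable(all_to_all, "A", failed={"B"}))
```

With node B down, a write from A in the circular topology never reaches C, whereas in the all-to-all topology it still does.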
5. Leaderless
5.1 Architecture diagram
Some storage systems take a different approach, abandoning the concept of a master and allowing any replica to accept writes directly from clients. Leaderless implementations fall into two broad categories:
- The client sends its writes directly to several replicas.
- A coordinator node does this on behalf of the client, but unlike a master, the coordinator does not enforce a particular ordering of writes.
Note: Amazon's in-house Dynamo system uses this model, and Riak, Cassandra, and Voldemort are open source data stores with leaderless replication models inspired by Dynamo.
5.2 Writing to the database when a node is down
Consider: in a database with three replicas, one of which is currently unavailable, how do single-leader and leaderless configurations cope?
- Single leader: a failover may need to be performed (if the unavailable node is the master) before writes can continue.
- Leaderless: no failover is needed. The write is sent to all three replicas and is considered successful once two of them acknowledge it, so one unavailable replica is tolerated.
Think further: what goes wrong if a user reads from a replica that has fallen behind?
- The read may return stale (outdated) values, because the lagging replica missed the writes that happened while it was unavailable.
Think further: how can a lagging replica be brought up to date?
- Read repair: The client reads from several nodes in parallel, detects any stale responses, and writes the newer value back to the stale replica.
- Anti-entropy process: The data store runs a background process that constantly looks for differences in the data between replicas and copies any missing data from one replica to another.
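Read repair can be sketched as follows. This is a simplified Python illustration that assumes each replica is a plain dict mapping a key to a `(version, value)` pair; the `read_with_repair` function is a hypothetical name, and real systems compare versions in more elaborate ways.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, fix stale ones.

    Each replica is a dict mapping key -> (version, value); a missing key is
    treated as version 0.
    """
    responses = [(replica, replica.get(key, (0, None))) for replica in replicas]
    newest = max((resp for _, resp in responses), key=lambda pair: pair[0])
    for replica, (version, _) in responses:
        if version < newest[0]:
            replica[key] = newest          # write the newer value back
    return newest[1]


r1 = {"cart": (2, "new")}
r2 = {"cart": (2, "new")}
r3 = {"cart": (1, "old")}                  # this replica missed the last write
print(read_with_repair([r1, r2, r3], "cart"))
print(r3["cart"])                          # now repaired
```

Values that are rarely read are never repaired this way, which is why a background anti-entropy process is a useful complement.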
Think further: under leaderless replication, is there a rule for read and write quorums?
- If there are n replicas, every write must be confirmed by w nodes to be considered successful, and every read must query at least r nodes, where w + r > n. Reads and writes that obey these r and w values are called quorum reads and writes, and r and w can be thought of as the minimum number of votes required for a read or write to be valid.
Note: If we follow w + r > n, are we guaranteed never to read a stale value? Not necessarily. Edge cases remain, for example when two writes are concurrent, or when a write succeeds on fewer than w replicas and is not rolled back on the replicas where it did succeed. Things are rarely as simple as you think they are.
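The reason w + r > n matters is that any set of w nodes then shares at least one node with any set of r nodes, so a read always contacts at least one node that saw the latest successful write. A brute-force Python check of this overlap property follows; the function name is an assumption for illustration.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Does every write set of size w share a node with every read set of size r?"""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)       # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


print(quorums_always_overlap(3, 2, 2))   # w + r = 4 > 3
print(quorums_always_overlap(3, 2, 1))   # w + r = 3, not greater than n
```

The overlap only guarantees that an up-to-date node is among those read; it does not by itself rule out the edge cases mentioned in the note.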
5.3 Detecting Concurrent Writes
Allowing multiple clients to write simultaneously means that conflicts can occur. This is similar to write conflicts in multi-leader replication. Some conflict resolution techniques were briefly described in section 4.3, and this section explores the issue in more depth.
Last write wins (LWW)
Each replica stores only the most “recent” value and allows “older” values to be overwritten and discarded. As long as there is an unambiguous way of determining which write is more “recent”, and every write is eventually copied to every replica, the replicas will converge to the same value.
Note 1: Concurrent writes have no natural order, but an arbitrary order can be imposed on them, for example by attaching a timestamp to each write.
Note 2: LWW can lose concurrent writes, so it is a poor choice for conflict resolution if losing data is unacceptable.
The “happens-before” relationship and concurrency
Consider: how do we determine the relationship between two operations and decide how to handle them?
Answer: For two operations A and B, there are three possibilities.
- A happened before B
- B happened before A
- A and B are concurrent
If one operation happened before another, the later operation should overwrite the earlier one. If the operations are concurrent, there is a conflict that needs to be resolved, for example with LWW or by merging the writes.
Capturing the happens-before relationship
The figure below shows users 1 and 2 concurrently adding items to the same shopping cart. The arrows indicate which operation happened before which other operation, in the sense that the later operation knew about or depended on the earlier one. In this example, the clients are never fully up to date with the data on the server, because the other client is always operating at the same time. However, old versions of the value do get overwritten eventually, and no writes are lost.
The server can determine if two operations are concurrent by looking at the version number. This algorithm works as follows:
- The server maintains a version number for every key, increments the version number every time that key is written, and stores the new version number along with the value written.
- When a client reads a key, the server returns all values that have not been overwritten, as well as the latest version number. A client must read a key before writing.
- When a client writes a key, it must include the version number from the prior read, and it must merge together all the values it received in that read.
- When the server receives a write with a particular version number, it can overwrite all values with that version number or below, but it must keep all values with a higher version number.
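The four steps above can be sketched for a single key on a single replica. In this Python illustration the `VersionedStore` class and its method names are hypothetical, and merging the sibling values is left to the client, as the algorithm requires.

```python
class VersionedStore:
    """Single-replica store for one key, following the four steps above."""

    def __init__(self):
        self.version = 0
        self.siblings = []                 # list of (version, value) pairs

    def read(self):
        # Return the latest version number and all values not yet overwritten.
        return self.version, [value for _, value in self.siblings]

    def write(self, value, based_on_version=0):
        self.version += 1
        # Values at or below the version the client had read are superseded;
        # values with a higher version are concurrent and must be kept.
        self.siblings = [
            (v, val) for v, val in self.siblings if v > based_on_version
        ]
        self.siblings.append((self.version, value))
        return self.read()


store = VersionedStore()
print(store.write(["milk"]))                               # client 1, first write
print(store.write(["eggs"]))                               # client 2, concurrent
print(store.write(["milk", "flour"], based_on_version=1))  # client 1 again
```

After the third write the server holds two concurrent values, `["eggs"]` and `["milk", "flour"]`: client 1 had only seen version 1, so its write replaces `["milk"]` but leaves client 2's value in place.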
Merging concurrently written values
This algorithm ensures that no data is silently dropped, but unfortunately it requires the clients to do some extra work: if several operations happen concurrently, the client has to clean up afterward by merging the concurrently written values.
Version vectors
The version-number algorithm described above captures the dependencies between operations, but it assumes that there is only a single replica.
Consider: when multiple replicas accept writes concurrently, how can the dependencies between operations be captured?
Answer: In addition to a version number per key, a version number per replica is needed (the collection of version numbers from all the replicas is called a version vector). Each replica increments its own version number when processing a write, and also keeps track of the version numbers it has seen from each of the other replicas. This information indicates which values to overwrite and which values to keep.
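Comparing two version vectors tells us whether one write happened before the other or whether they are concurrent. A minimal Python sketch follows, assuming each vector is a dict mapping a replica ID to a counter; the `compare` function and the replica names are hypothetical.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors mapping replica ID -> counter."""
    replicas = a.keys() | b.keys()         # a missing entry counts as 0
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a happened before b"
    if b_le_a:
        return "b happened before a"
    return "concurrent"


print(compare({"r1": 1}, {"r1": 2}))
print(compare({"r1": 2, "r2": 0}, {"r1": 1, "r2": 1}))
```

When the result is "concurrent", neither value may overwrite the other, and both must be kept as siblings until they are merged.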
6. Conclusion
I thought this would be quick to write, but on careful reading I found that many of the points summarized in this chapter show up in real business systems, so I ended up copying most of them down. Anyway, it is finished, so as usual I will end by scattering flowers to celebrate.
By the way, I watched The Iron Lady over the Qingming Festival holiday and was deeply struck by it. The differences between people come from one small choice after another, though most people don't believe that. Here are two quotes that resonated with me, placed at the end as a reminder.
- It used to be about trying to do something. Now it's about trying to be someone.
- Watch your thoughts, for they become words.
  Watch your words, for they become actions.
  Watch your actions, for they become habits.
  Watch your habits, for they become character.
  Watch your character, for it becomes your destiny.
7. References
- Designing Data-Intensive Applications, Chapter 5