background

In A recent project, user A reported that after editing some content in the background, he re-entered the page after A period of time and found that the content was still the version of A long time ago, and his saving did not take effect. After searching the log, I found the reason that several users had edited the content during the period. When user B clicked save, the content of his page was very early, which covered the version edited by user A.

At present, the front-end and the server have discussed the solution to this problem, but I am interested in this topic, so I have carried out a more in-depth investigation.

Option A: Edit lock – disables multiple edits

The first idea to solve the above problem is that since multiple editors at the same time may lead to content overwriting, I will not allow multiple editors at the same time. When someone is editing a document, the system will lock the document to prevent others from editing it at the same time. This method is commonly known as edit lock.

The idea of editing locks comes from pessimistic locks in relational databases. Pessimistic locks, as the name suggests, are intensely exclusive and exclusive. It assumes that when the current transaction manipulates a data resource, there must be other transactions accessing the data resource at the same time. In order to avoid interference with the current transaction’s operations, the resource is locked first, and other transactions cannot access the resource until the lock ends.

Edit lock implementation ideas

When user 1 enters an edit of content, it sends a request to the server, which marks the lock of that content as user 1. When other users access the content, the system displays that user 1 is editing the current content and cannot access the content until user 1 is unlocked.

In order to improve efficiency and prevent a user from locking a resource for a long time, the following methods can be used to optimize the edit lock:

  • When user 1 locks a resource, other users do not have editing permission but can access it.
  • If a user has a resource locked, he/she must send a request to the server periodically to renew the resource lock. If the resource is not sent within the interval, the resource is unlocked automatically.
  • When user 2 accesses the resource, if the resource is locked by user 1, user 2 can transfer the lock permission to user 1. If user 1 agrees, the server changes the lock permission from user 1 to user 2.

Edit the pros and cons of locks

Edit lock can have only one editor features at the same time, can effectively avoid the user because the version covers and lead to ineffective work, at the same time because its implementation is relatively simple, so is the most widely used in many collaborative editing scenes a scheme (considering the implementation costs, our project will eventually take the compromise solution).

However, edit lock cannot be edited by multiple people at the same time, which is still not user-friendly in terms of user experience.

Plan B: Allow multiple editors to prevent overwriting

Version Change Notification

In the background mentioned above, the main reason for the problems of our project is the overwriting of content by users without their knowledge. Therefore, we can add a version change prompt function: when users edit content, if other users submit a new version in the process of editing, the page will give a prompt about the version change and whether to cover it.

Version changes prompt implementation roadmap

The implementation of version change notification is very simple. When the user retrieves the content, the server also returns the current version number of the content. When the user saves, the modified content and the obtained version number are submitted together, and the server verifies whether the version number is consistent with the version number in the database:

  • If yes, the version is not changed during the editing process and is saved successfully
  • If they are inconsistent, it indicates that a new version change occurs during the editing process of the user. The saving fails and the user is notified. The user can choose whether to overwrite the version

Insufficient version change prompt

Version change prompt This scheme only avoids the version coverage problem that users do not know, but in the end, they can only choose to do version coverage or abandon the current version, which cannot solve the conflict problem in multi-party collaboration.

Optimization: version merge

In the version change notification scenario, ultimately we can only choose to keep the current edited version or the latest version in the database. Kids make choices. As adults, of course, we want them all.

Then we can use version merge to keep the changes for each version. Speaking of version merge, as a developer, we can easily associate git version merge. Git merge is actually a diff-patch process. Diff and Patch are a pair of tools. Diff can compare and record the differences between the two contents, generate a patch according to the differences, and then apply the patch to other contents to update the contents.

In our collaborative editing scenario, when the user changes the file content from version A to version B, the patchBA of version B and version A will be sent to the server when saving, and the server will obtain the latest version X of the file in the database, and then apply the patchBA to version X to realize the update.

The biggest problem in the above diff-patch process is that conflicts may occur when patchBA is applied. If there are conflicts, version X cannot be directly updated. At this point, the server needs to send patchXA to the user, and the user manually resolves the conflict between patchBA and patchXA and merges it. After the conflict is resolved, a patchBX is sent to the server to finally realize the update.

Row-based diff

Common text diff is generally divided into line-based diff and character-based diff. For example, git is commonly used as line-based diff.

For example, now there is the following set of raw data A, and then user 1 changes xiaoqian’s mobile phone number based on version A. The data are as follows:

const versionA = 'Xiao Zhang 18866277777= Xiao Wu 12233333111 Xiao Qian 19277788888 Xiao Ye 12111111222';
// After modification:
const versionB = 'Xiao Zhang 18866277777 Xiao Wu 12233333111 Xiao Qian 18888888888 xiaoye 12111111222';
Copy the code

In javascript, we can use the NPM library diff for line-based content diff and patch operations. CreatePatch: the first parameter is the file name, the second parameter is the content before modification, and the third parameter is the content after modification. Finally, the two versions of patch are returned:

import { createPatch, applyPatch } from 'diff';

const patchBA = createPatch('data', versionA, versionB);
console.log(patchBA); // The output is as follows
Copy the code

The patch result output above is as follows, with – in front of it represents the row to be deleted, and + represents the row to be added:

Index: data
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
--- data
+++ data
@ @ - 2, 6 + 2, 6 @ @Xiao Zhang 18866277777 Xiao Wu 12233333111- money 19277788888
+ money 18888888888Lobular 12111111222Copy the code

So let’s say that user 2, based on version A, changes the content to the following version C, and adds A small black line of data in the middle. We cannot directly save version C as the final result, but need to get the latest remote version B and apply patchCA to version B to obtain the final result:

const versionC = 'Xiao Zhang 18866277777 Xiao Hei 19222221111 Xiao Wu 12233333111 Xiao Qian 19277788888 xiaoye 12111111222';

const patchCA = createPatch('data', versionA, versionC);
let result = applyPatch(versionB, patchCA);
console.log(result);
Copy the code

The final result is output as follows, you can see that the modification of the small money line and the added small black line appear in the final result:

Xiao Zhang 18866277777 Xiao Black 19222221111 Xiao Wu 12233333111 Xiao Qian 18888888888 Xiao Ye 12111111222Copy the code

But row-based algorithms are prone to conflict. For example, enter the following information, including name, age, mobile phone number, and province. User A filled in his name and age, and user B filled in his mobile phone number and province:

const data = 'name:; Age:; Mobile phone no. :; Province :; ';

const dataA = Name: Xiao Zhou; Age: 23. Mobile phone no. :; Province :; ';
const dataB = 'name:; Age:; Mobile phone no. : 18866668888; Province: Beijing; ';
Copy the code

If you now merge dataA and dataB in row-based diff, because one has only one row of data, you are bound to clash. But they can actually be combined into ‘Name: Xiao Zhou; Age: 23. Mobile phone no. : 18866668888; Province: Beijing; To solve this problem, we can use more fine-grained character-based diff.

Character-based diff

In fact, the diFF library mentioned above also has diFF based on character granularity, but it lacks patch encapsulation based on character granularity. We use the diff-match-patch library to conduct diFF based on character granularity.

import DiffMatchPatch from 'diff-match-patch';
const dmp = new DiffMatchPatch();

const patchA = dmp.patch_make(data, dataA);
const result = dmp.patch_apply(patchA, dataB);
console.log(result);
Copy the code

Result is an array, the first item of the array is the merged result, and the second item is an array of whether patch is successful:

[Name: Xiao Zhou; Age: 23. Mobile phone no. : 18866668888; Province: Beijing; '[true]].Copy the code

Therefore, it can be seen from the results that the character-based diff-patch effect is much better than the line-based diff-patch effect.

Inevitable conflict

Personally, I have conducted some other sample tests of text based on the character granularity of DIFF-patch. In some cases, there will still be conflicts, but in the face of conflicts, diff-match-patch will automatically select some versions and reserve them, which may lead to the loss of part of our content:

import DiffMatchPatch from 'diff-match-patch';
const dmp = new DiffMatchPatch();

const data = 'Basketball, football';
const dataA = 'Baseball, football, badminton';
const dataB = 'Billiards, football, table tennis';

const patchA = dmp.patch_make(data, dataA);
const res = dmp.patch_apply(patchA, dataB);
console.log(res);
Copy the code

For example, in version A, basketball is changed into baseball, and in version B, basketball is changed into billiards. In theory, there will be conflicts between patches. But the final res output is as follows, and diff-match-patch directly discards billiards and keeps baseball:

['Baseball, football, badminton, table tennis'[true]].Copy the code

Further optimization: collaborative editing

As we know from the above version merge, the biggest problem is version conflict. Combined with our experience in using Git, we can actually find that the larger the number of version differences between the version we edited and the latest version of the database, the more likely there will be conflicts. Naturally, we can assume that as long as the version we are currently editing is as consistent as possible with the latest version of the database, there will be no conflicts. This brings us to the focus of this article — collaborative editing.

In the pursuit of high user experience scenarios, such as various online documents, almost all adopt collaborative editing solutions to ensure real-time online editing by multiple people. In order to achieve collaborative editing, several key technical points are mainly needed:

  • Diff algorithm for incremental transport: Two commonly used techniques in collaborative editing are OT(Operational Transformation) and CRDT (conflict-free Replicated Data Type)
  • Real-time update of documents: WebSocket or polling can be used, but for performance and experience we usually choose WebSocket
  • Rich text editor for updating content: This is optional. Usually multiplayer online editing scenarios need to support rich content editing, so a rich text editor is required. Normal text editing scenarios do not need a rich text editor.

OT

OT is a technology mostly used in collaborative editing. Operational Transformation, as its full name in English, consists of two steps: converting user editing behavior into enumerable operations (Operational); If there are multiple operations going on at the same time, the operations are transformed.

In the field of collaborative editing, OT technology is widely used. Some excellent online documents, such as Feishu Cloud Documents and Google Doc, all use OT, which can be said to be time-tested, so it is worth learning.

Starting from Operational, taking the simplest string data as an example, the modification of string can be divided into the following three types of operations:

  • Retain (n) : Retain n characters without change
  • Insert (STR) : inserts the string STR
  • Delete (STR) : Deletes the string STR
I had three baked sweet potatoes todayCopy the code

For example, the above text changes actually go through the following operations in Operation of the OT algorithm:

delete 'I want';
insert('today');
retain(1);
insert('Three');
retain(3);
Copy the code

When multiple users perform operations at the same time, conflicts may occur when operations transmitted to the server are synchronized to each client.For example, user A and user B operate on the same text at the same timeabcd:

  • The front-end display of user A must have applied its own optA first, and the text content becomesabcdeAnd then optB from the server is applied, and the text content becomesabcdfe
  • User B first applies his own optB, and the text content changes toabcdfAnd then optB from the server is applied, and the text content becomesabcdefThe changes displayed by end users A and B conflict. To solve the above conflicts, we need a transformation algorithm transformation to transform operation. There are two commonly used conversion algorithms according to different conversion perspectives:EasySyncundo .

Bilateral conversion based on EasySync

EasySync is A kind of bilateral operation conversion, that is, in the above case, user A and user B will stand in their own perspective, each performs their own operation first, and then each performs the operation of the other side after the transformation of the server, and finally get the same result. That is:

One of thetransform()The results are as follows:

transform(optA, optB) = () = > {
  retain(4);
  insert('f');
  retain(1);
}

transform(optB, optA) = () = > {
  retain(5);
  insert('e');
}
Copy the code

In the EasySync algorithm, if the initial state ABcd is denoted as state O, the following formula should always be established:

O -> optA -> transform(optA, optB) === O -> optB -> transform(optB, optA)
Copy the code

Undo-based unilateral conversion

Undo is A unilateral conversion algorithm, that is, no matter from the perspective of user A or user B, the operation applied is the same, so as to ensure the consistency of results.

For example, if optA is first transmitted to the server, user A applies optA first, and then applies Transform (optA, optB) after optB is also transmitted to the server. After user B applies optB locally and sends it to the server, user B finds that optA changes the content first. Then user B performs the undo operation to change the status back to O, and then performs optA and Transform (optA, optB) to keep the content consistent with user A. The flow chart is as follows:

Transform (optA, optB) is equivalent to the following operations:

retain(4);
insert('f');
retain(1);
Copy the code

Ot-based open source library

In the implementation of ot-based online editing, we can find the following difficulties:

  • Operations are created based on changes in the front-end content

  • The server transforms the corresponding operation

  • Github has a number of open source libraries that can help us achieve this quickly. However, the difficulty is that the operation used by the front-end and the server needs to be consistent).

  • Etherpad: An online editor based on the EasySync OT implementation that encapsulates all three of the above technology points. If you are an individual developer, you can use the centralized WebSocket provided. If the company uses the WebSocket service, you can deploy the WebSocket service on the service sheet according to the guidelines to ensure data security. Experience online: video.etherpad.com/

  • Ot.js: an OT implementation in string format. In the preceding OT use cases, operation of the ot library is used to explain. It encapsulates the operation and transform operation created according to the change of the string. The disadvantage is that it needs to manually implement WebSocket and related front-end and back-end communication logic, and only supports the string format. Teaching demo: github.com/Operational…

  • ShareDB: Supports multiple operations and transforms such as strings, rich text, and JSON. It is widely used if you want to achieve OT collaboration for your own products due to the rich conversion formats supported. Teaching demo: github.com/share/share…

CRDT

CRDT (conflict-free Replicated Data Type) is mainly applied in distributed systems to ensure Data consistency of distributed applications. Collaborative document editing can be understood as a kind of distributed application, and its essence is data structure. The design of data structure ensures the final consistency of concurrent operation data.

CRDT was developed much later than OT, so there are relatively few mature products in the multi-person collaborative editing scenario, but there are some applications, such as Teletype for the Atom editor, PingCode Wiki, and Figma for collaborative editing.

Core idea of CRDT

As mentioned above, CRDT is mainly applied to distributed systems, so its data operation needs to conform to commutability and idempotency, which has solved the following problems:

  • Inconsistent order of sending and receiving due to network problems (commutability)
  • And multiple send (idempotent). So in our editing scenario, first of all, we want to make sure that the operations are interchangeable, so we just need to know the order of all the operations, and then we can sort the operations. We can sort the operations by their timeStamp. Each user has a UID. If multiple users have the same timeStamp, we can carry out ascending order according to the UID to ensure the sequence of concurrent operations.

At the same time, the combination of UID and timeStamp ensures that each operation has a unique ID, which can realize idempotence and solve the problem of sending multiple unified operations.

CRDT tree structure

For example, in CRDT, an action identifier is created for each character. If the identifier of D in the initial state abcd is UserO@T4, then both user A and user B operate based on the action identifier:

UserA@T5: insert('e') at UserO@T4
UserB@T6: insert('f') at UserO@T4
Copy the code

So these two operations only need to ensure their order, without the need for conversion to ensure the consistency of the final modification results, the actual operation data structure is as follows:

From this operation structure, we can see that in order to ensure the implementation of CRDT, the database needs to store the identifier of each character. At the same time, when there is a new operation, we need to traverse the tree structure of the operation to find the node associated with the current operation.

Crdt-based open source library

  • Yjs: the most well-known CRDT framework in the community, optimizes the creation of Yjs structured objects from a V8 perspective. The whole idea is to make the process of creating Yjs objects optimized by the browser, whether it is memory usage or object creation speed. Other YJs-based frameworks include SyncedStore, etc.

OT is compared with CRDT

OT algorithm has been relatively mature due to its long development time, but many people in the community are optimistic about the application of CRDT in multi-person collaboration scenarios, and believe that CRDT will be more promising than OT in the future. For now, the comparison is as follows:

The framework advantage disadvantage
OT 1. High performance

2. The operation intention can be saved

3. The document size is not affected
1. Centralized servers are required

2. Complex algorithm design

CRDT 1. Decentralization

2. The algorithm design is relatively simple

3. High stability
1. Compare memory consumption and performance

2. Loss of user operation intention

conclusion

This paper summarizes a variety of solutions to content coverage in multi-person editing scenarios. We can choose different solutions for different scenarios.

  • If you just want to solve the content coverage problem and there is no requirement for multiple people to collaborate, then edit lock is recommended
  • If there are multiple collaboration requirements, but the real-time requirements of the content are low, then the version merge solution can be considered
  • If you want to achieve real-time collaboration among multiple people, you can only consider using OT or CRDT for collaborative editing.

reference

  • The evolution of multi-person collaborative editing technology
  • Implementation of real-time collaborative editing
  • What is a CRDT
  • Application of OT algorithm in collaborative editing