Introduction: Combined with my work practice and some of my own thinking, I would like to share the online Excel collaboration plan with you today.
If you are interested in the topic of online documents, you can also read these two articles: How to implement an online Excel storage solution for online documents with multiple collaborators
scenario
Multiple users work on an Excel file simultaneously. The entities in the scenario are user and Excel. Users are classified into owner, reader, and collaborator. Owner: creates Excel. Reader: can view Excel. Collaborator: can edit Excel content
Creating a domain model
- An Excel has only one owner, but can have multiple readers and collaborators
- An Excel can be accessed simultaneously by multiple readers or collaborators
- An Excel can be edited by multiple collaborators at the same time
- An Excel can be deleted by the owner
Process analysis
Key processes of collaboration include: “User opens Excel” “User edits Excel” “User exits Excel” “User deletes Excel” In all key processes, the client needs to send messages to the server, and the server needs to broadcast messages to other clients. In addition, when users frequently modify Excel content, there will be frequent network transmission to ensure that the content modified by each person is synchronized to other clients in real time. It’s a lot like a chat room. In this scenario, long link is a more appropriate scheme, and WebSocket is one of the commonly used schemes to achieve long link.
Unlike chat rooms, chat rooms tend to be more AP models; Online Excel tends to favor the CP model, because missing or out-of-order messages can lead to errors in file content, with serious consequences.
Each of these key processes requires knowing how many people are reading and editing an Excel file. Keep track of the current online users of Excel so that you can broadcast changes to them as Excel content changes.
Excel Online Users
Data on how many people are currently collaborating is changing in real time and requires frequent and efficient access, so redis is a good place to store it. We can use the Hash type of Redis to store it, and the unique ID of Excel can be used as the Key to store information such as online users and opening time of files.
Hset excel_id user_id "Open time"Copy the code
Other storage types, or other redis storage methods, are ok.
State radio
After a WebSocket connection is established, the client maintains a long link with a copy of the server. When users open Excel or modify the Excel content, they need to search for online users in Redis according to the current Excel_id, and then send broadcast messages to synchronize the status change to all clients. There are three implementation schemes for sending broadcast messages in this scenario:
Solution 1: Exce_id route
All requests for excel services, routed according to exce_ID, so that all long links on the same Exce_id will be on the same copy. When a broadcast message needs to be sent, all the long links for the current exce_ID are on this copy without any special handling at the code level.
Advantages: Simple implementation, no intrusion into business code disadvantages:
- Dynamic capacity expansion cannot be performed. Even if a copy is added, the request for an exce_ID is still made to the original copy
- Load balancing is not friendly. If the number of exce_id users on a copy is too large, the link number of a copy is too large, and other copies may be idle
- If an Excel collaboration has too many people, it may cause the replica CPU or memory to be full. In other words, the maximum number of copies per copy determines how many people Excel can support online at the same time
- Unable to extract a separate WebSocket gateway
- Long links are inherently stateful, binding the state of the service to the copy, which is equivalent to magnifying the state
Plan 2: event broadcast
When a broadcast message needs to be sent, all copies of Excel obtain the online user from Redis based on exce_id and compare whether the user information exists in the Sessions that the current copy holds the link to. If it exists, a broadcast message is sent to the link; if it does not exist, it is ignored.
Notification is sent to all other replicas when there is a broadcast message, which can be implemented using message queues. Let all replicas subscribe to a channel and notify other replicas of broadcast messages via message queues. In addition to message queues, the interface of the cloud platform can be called based on the application ID to return the VIPs of all pods, and then send requests to all copies based on the VIPs.
Message queues are recommended to reduce the dependence on cloud platforms.
Advantages:
- Dynamic capacity expansion
- Decouple Excel from duplicates
- Load balancing is not affected
- There can be a separate gateway layer
Disadvantages:
- The need to introduce message queues increases the complexity of the system
- Intrusion into the business logic, the copy needs to determine whether the broadcast is sent by itself
- As a result, many invalid requests to Redis are sent frequently, which can cause great stress to the online environment (because one broadcast does not necessarily involve all copies).
Plan 3: registry, unified management, designated sending
The registry manages the long link relationship between Excel, users and copies. When a broadcast needs to be sent, the registry obtains all VIP /Host copies that need to be broadcast according to excel_id and invokes its service to push broadcast messages to the client.
Advantages:
- Dynamic capacity expansion
- Decouple Excel from duplicates
- Load balancing is not affected
- There can be a separate gateway layer
- Basically no intrusion into business logic
Disadvantages: The need to introduce a registry, increase the complexity of the system, increase the operation and maintenance costs
The key logic
The user opens Excel
When a user opens Excel, the user’s information needs to be synchronized to all clients that are reading or collaborating with the document. The interaction flow is as follows.
- The user opens an Excel file in a browser and sends a request to the server
- Find all online users in Redis based on excel_id
- If no data is found, no one is currently opening Excel, insert yourself into Redis, and no further action is required
- If you find data, add yourself to the current record
- Push a message to all ** “links” ** that open this document except yourself
- After receiving messages from the server, other clients display the profile picture of the login user on the page
- completed
User operation Excel
There are many types of operations that users can perform on Excel, such as changing cell content, changing row width, adding columns, merging cells, and so on. We classify all user operations on Excel into two categories: 1. “Modify cell content” 2. “Other operations”
Modify cell contents
We use mutex logic to modify the contents of a cell. The mutex logic consists of locking, unlocking, and sending contents.
Locking logic
- When the user selects a cell, the front end sends the selected information to the server
- The server retrieves the lock according to ** “Excel_id and current cell coordinates” **, successfully retrieves the lock and proceed to the next step; If the lock fails, return to the current user that the cell is being edited by user A
- The server obtains the current online user based on excel_id and broadcasts the event
- After receiving the broadcast message, other clients mark the user information of the operator on the right of the cell and prohibit the current user from operating the cell
- completed
unlocked
- When a cell loses focus, the client sends a message to the server, which releases the lock based on “Excel_id and current cell coordinates.
- The server obtains the current online user based on excel_id and broadcasts the event
- After receiving the broadcast message, the client removes the operator’s user information from the right side of the cell and allows other users to operate the cell
- completed
Content change
- When the user finishes modifying the contents of the cell, a request is sent to the server
- The server obtains the lock based on excel_id and current cell coordinates. The lock is successfully obtained and the next step is performed. If the lock fails, return to the current user that the cell is being edited by user A
- The server obtains the current online user based on excel_id and broadcasts the event
- After receiving the broadcast message, other clients re-render the table based on the broadcast content and the current table content
- completed
supplement
How to determine the lock success? If Excel_id and current cell coordinates do not exist, no user operates the cell and the lock is successfully obtained. If excel_id and current cell coordinates exist, you can use the user ID as the Value of the lock and compare whether the Value is the current user. If yes, the lock is successfully obtained and you can modify the cell content.
Set the default timeout period when locking to prevent cell contents from being frozen forever.
In addition, there is a gap problem: after a user selects a cell on the client, the gap time of “request to the server for locking and then broadcast to other clients” is long. During this period, if a user changes the content of the same cell quickly, the content will be overwritten or the modification will fail. Depending on our own business scenario using Excel, we can decide to allow the current situation to occur, or to handle it by optimizing the lock fetching logic.
Any other changes
For other changes, the logic is used to override the later operations and the earlier operations.
- When the user selects a cell, the front end sends the selected information to the server
- The server obtains the current online user based on excel_id and broadcasts the event
- After receiving the broadcast message, the client rerenders the table based on the broadcast content and the current table content
- completed
The reason for using override logic: Many user operations cannot be combined. For example, user A changed the height of the first row of the cell from 30px to 50px; User B changed the height of the first line from 30px to 40px. The program cannot set the height of the first row of cells as expected
User exit Excel
When a user exits Excel, that person’s information needs to be synchronized to all clients that are reading or collaborating with the document. To exit, click the rollback button in the upper left corner of the page, click the rollback button of the browser, or close the browser. In addition, users may log out due to abnormal network interruption. All logout operations correspond to the server, that is, the WebSocket connection is disconnected. The close event of the WebSocket server can be used to indicate the user’s exit. The interaction process is as follows:
- The server WebScoket is disconnected, triggering the close event
- The server obtains the current online user based on Excel_id. If no data is found, no one has opened this document. Delete the online user records in Redis. If you find data, delete yourself from Online User List and go to the next step
- Push a message to all links that open this document except yourself
- After receiving the message from the server, the client deletes the user in Online User List or marks the user as offline
- completed
User delete Excel
- The client initiated a deletion request. Procedure
- The server verifies whether the deletion permission is granted. If the deletion permission is not granted, the system returns that the deletion permission is not granted
- Find all online users in Redis based on excel_id.
- If no data is found, no one has opened the document. Delete the record in Redis
- If you find the data, push a message to all the links that open the document except you,
- The client prompts the user with a pop-up message indicating that Excel has been deleted
- completed
Existing problems
This scenario does not solve all of the problems in collaboration, and there are many things to note in addition to the considerations already mentioned above. For example: how to resolve the conflict of multiple operator operations when encountering merge function operations? What happens when someone modifies a cell and other users merge cells? Can the logic of multiple people modifying a cell at the same time be optimized? The issue of the messaging layer is particularly important and needs to be addressed separately:
- Because WebSocket messages are unordered, each of the above scenarios relies on message order, requiring additional safeguards
- WebSocket may fail to send messages. Is an ACK mechanism needed when the server communicates with the client?
- What if the ACK mechanism is established and the other party of the handshake happens to be offline?
- How can I ensure that the current user data is updated to the latest status when the connection is interrupted and re-established?
conclusion
Today detailed and everyone introduced, online Excel collaboration some implementation schemes and key processes, I hope to play a role. Students who like to collaborate online can come together to discuss.
Finally, ideal automobile is recruiting “full stack/front-end development engineer”. Students who love life and like to study technology are welcome to join us to realize their “ideal”. Interested students can public account private message me, or send resume to “[email protected]”.
We sincerely invite you to pay attention to the public account: Nhat hanzhou I will update technical articles every week, and learn and progress together with you.