Background

Behind every video, the site keeps your viewing record: how many times you watched it and which parts you skipped. It is said that this data can be analyzed to work out which Japanese star you like, so that content can be pushed to you in a targeted way.

Although it looks like a simple feature, it involves a very large amount of data. In the worst case, the number of records is the number of users multiplied by the number of videos.

So with only two web servers and one SQL Server instance, how do we cope with such a large volume of write requests? Why writes? Because every second the user watches has to be recorded, for example that the user has watched the tenth second of the video. To make this feature work, a few things need to be defined:

  1. A data definition that records how a user watches a video
  2. The data protocol used to interact with the client
  3. The data format recorded in the database
  4. How to relieve write pressure on the server (after all, the number of requests hitting a single server is still fairly large)

The solution

Definition of video viewing progress

Take a one-hour video: its 3600 seconds correspond to 3600 states. Each second is either watched or not watched, so a single bit is enough to record it. A byte has 8 bits, so one byte can represent the watch state of 8 seconds. When the bytes are serialized as text, the higher the base, the more states the same number of characters can represent.

Every time the client uploads new data, the server has to combine it with the existing data using a bitwise OR. For example, suppose the client uploads 01000, meaning the second second was just watched, and the server already holds 00011, meaning the fourth and fifth seconds were watched. After the merge the record is 01011: the user has watched the second, fourth and fifth seconds. Although it is a simple operation, the CPU cost should not be underestimated at scale.
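The merge described above can be sketched as follows. This is a minimal illustration, not the original implementation: the class name `ProgressBits` and both method names are hypothetical, seconds are indexed from zero, and position 0 is assumed to be the most significant bit of the first byte.

```csharp
using System;

// Hypothetical helper, not part of the original code base.
public static class ProgressBits
{
    // A second counts as watched if either record has its bit set,
    // so merging is a byte-by-byte bitwise OR.
    public static byte[] Merge(byte[] stored, byte[] uploaded)
    {
        var longer = stored.Length >= uploaded.Length ? stored : uploaded;
        var shorter = ReferenceEquals(longer, stored) ? uploaded : stored;
        var result = (byte[])longer.Clone(); // never mutate the inputs
        for (int i = 0; i < shorter.Length; i++)
        {
            result[i] |= shorter[i];
        }
        return result;
    }

    // Assumption: zero-based second index, most significant bit first.
    public static bool IsWatched(byte[] record, int second)
    {
        int index = second / 8;
        return index < record.Length
            && (record[index] & (0x80 >> (second % 8))) != 0;
    }
}
```

Padded to a full byte, 00011 becomes 0x18 and 01000 becomes 0x40; `ProgressBits.Merge` on those two values gives 0x58, which is 01011 followed by three zero bits.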

                First byte         Second byte
    Position:   0 1 2 3 4 5 6 7    0 1 2 3 4 5 6 7
    Bit:        1 0 0 0 1 0 0 0    0 1 0 0 0 0 0 0
    Hex:        0x88               0x40
    String:     8840

Protocol for interacting with clients

Only the client knows the user’s real-time viewing progress, so the client has to upload it. Hexadecimal is a reasonable encoding for the exchange with the server because it is universally supported. Of course, any other base, even base 100, would work just as well, as long as both sides support it and can parse it correctly.
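As an illustration of what the client side might do, here is a sketch that packs a set of watched seconds into bytes and serializes them as a hexadecimal string. The class `ProgressCodec` and its `Encode` method are hypothetical names, and the bit order (zero-based positions, most significant bit first) is an assumption chosen to match the 8840 example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical client-side encoder, not part of the original code base.
public static class ProgressCodec
{
    // Packs zero-based watched seconds into bytes (8 seconds per byte)
    // and returns them as an uppercase hex string, two characters per byte.
    public static string Encode(IEnumerable<int> watchedSeconds, int durationSeconds)
    {
        var bits = new byte[(durationSeconds + 7) / 8];
        foreach (int second in watchedSeconds)
        {
            if (second < 0 || second >= durationSeconds)
            {
                continue; // ignore positions outside the video
            }
            bits[second / 8] |= (byte)(0x80 >> (second % 8));
        }

        var sb = new StringBuilder(bits.Length * 2);
        foreach (byte b in bits)
        {
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For a 16-second clip in which positions 0, 4 and 9 have been watched, `ProgressCodec.Encode(new[] { 0, 4, 9 }, 16)` returns "8840".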

Database data format

Each database supports different data types, so I won’t go into detail here. Whatever the format, the less space it takes the better, but the choice also has to be weighed against the amount of computation the business logic requires.

Problems to solve

CPU Performance Issues

The user’s latest viewing data has to be merged with the old data on every upload, and with a large number of users this cost cannot be ignored. The server parses the hexadecimal data uploaded by the client into integer byte values and then merges them with the stored viewing record. This CPU work cannot be avoided. The conversion looks like this:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Globalization;

    // Queue of new records waiting to be written to the database in batches.
    ConcurrentQueue<UserVideoInfo> AddQueue = new ConcurrentQueue<UserVideoInfo>();

    // Parses a hex string such as "8840" into one integer per byte (0x88, 0x40).
    protected List<int> ConvertToProgressArray(string progressString)
    {
        if (string.IsNullOrWhiteSpace(progressString))
        {
            return null;
        }
        // Validate: every byte is exactly two hex characters.
        if (progressString.Length % 2 != 0)
        {
            return null;
        }
        var proStrSpan = progressString.AsSpan();
        List<int> ret = new List<int>();
        int i = 0;
        while (i < proStrSpan.Length)
        {
            ret.Add(int.Parse(proStrSpan.Slice(i, 2), NumberStyles.HexNumber));
            i = i + 2;
        }
        return ret;
    }

Too many client requests

If 10,000 users are watching videos at the same time and the upload interval is 2 seconds, that is 5,000 requests per second. This feature is essentially user logging, which means it can tolerate some data loss. Given the form of the data, the client can buffer records locally first; there is no need to upload every second as it happens. For example, the client can agree to upload once every 30 seconds, and if the user closes the client, any records that failed to upload are re-sent on the next startup.
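The effect of the upload interval on request volume is simple arithmetic. The helper below is a hypothetical illustration (the names `UploadLoad` and `RequestsPerSecond` are not from the original code) and assumes uploads are spread evenly over time.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope load estimates.
public static class UploadLoad
{
    // Average requests per second when every concurrent user uploads
    // once per interval, assuming uploads are evenly distributed.
    public static double RequestsPerSecond(int concurrentUsers, double uploadIntervalSeconds)
    {
        if (uploadIntervalSeconds <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(uploadIntervalSeconds));
        }
        return concurrentUsers / uploadIntervalSeconds;
    }
}
```

With 10,000 concurrent users, a 2-second interval gives 5,000 requests per second, while a 30-second interval gives roughly 333.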

Database pressure

If each request updated the database individually, that would be up to 5,000 update requests per second by the calculation above. Look closely at this kind of workload: when a user watches a video, the record is loaded into an in-memory cache, and because the data is log-like, there is no need to update the database on every request. Update the cache first, then write to the database periodically.
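A cache-first design of this kind could look roughly like the sketch below. It is an assumption-laden illustration rather than the original implementation: `ProgressCache`, `Update` and `TakeDirty` are hypothetical names, records are keyed by a (user ID, video ID) pair, and the periodic timer that calls `TakeDirty` and writes the batch to the database is not shown.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical write-behind cache: requests only touch memory, and a
// periodic job (not shown) persists whatever TakeDirty() returns.
public sealed class ProgressCache
{
    private readonly ConcurrentDictionary<(long, long), byte[]> _cache =
        new ConcurrentDictionary<(long, long), byte[]>();
    private readonly ConcurrentDictionary<(long, long), bool> _dirty =
        new ConcurrentDictionary<(long, long), bool>();

    // Merges uploaded bits into the cached record and marks it dirty.
    public void Update(long userId, long videoId, byte[] uploaded)
    {
        var key = (userId, videoId);
        _cache.AddOrUpdate(
            key,
            _ => (byte[])uploaded.Clone(),
            (_, existing) => Or(existing, uploaded));
        _dirty[key] = true;
    }

    // Returns the records changed since the last call and clears their dirty flag.
    public List<KeyValuePair<(long, long), byte[]>> TakeDirty()
    {
        var batch = new List<KeyValuePair<(long, long), byte[]>>();
        foreach (var key in _dirty.Keys)
        {
            if (_dirty.TryRemove(key, out _) && _cache.TryGetValue(key, out var bits))
            {
                batch.Add(new KeyValuePair<(long, long), byte[]>(key, bits));
            }
        }
        return batch;
    }

    // Builds a new array so the update delegate stays free of side effects.
    private static byte[] Or(byte[] a, byte[] b)
    {
        var result = new byte[Math.Max(a.Length, b.Length)];
        for (int i = 0; i < result.Length; i++)
        {
            byte x = i < a.Length ? a[i] : (byte)0;
            byte y = i < b.Length ? b[i] : (byte)0;
            result[i] = (byte)(x | y);
        }
        return result;
    }
}
```

Several uploads for the same user and video between two flushes collapse into one database write, which is where the reduction in database pressure comes from.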

Because of the data volume, all updates are sent to a task queue, and a queue consumer writes them to the database in batches of a configurable size. This performs much better than updating the database one record at a time. Many log-type workloads use this scheme, since batch updates put far less pressure on the database. The code looks similar to the following:

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    // UserVideoInfo, DBProcessEnum and UserVideoProgressProxy are defined elsewhere in the project.
    // UpdateQueue is referenced below; its processing is not shown in this excerpt.
    ConcurrentQueue<UserVideoInfo> UpdateQueue = new ConcurrentQueue<UserVideoInfo>();

    public Task<int> AddUserVideoData(UserVideoInfo data, DBProcessEnum processType = DBProcessEnum.Update)
    {
        if (processType == DBProcessEnum.Add)
        {
            AddQueue.Enqueue(data);
        }
        return Task.FromResult(1);
    }

    void MulProcessData()
    {
        // Maximum number of records written per batch.
        const int maxNumber = 50;
        var data = new List<UserVideoInfo>();
        while (true)
        {
            try
            {
                if (AddQueue.IsEmpty && UpdateQueue.IsEmpty)
                {
                    Thread.Sleep(500);
                    continue;
                }

                // Start a new batch.
                data.Clear();
                while (data.Count < maxNumber && AddQueue.TryDequeue(out UserVideoInfo value))
                {
                    // Keep only the latest record per user and video within a batch.
                    int index = data.FindIndex(s => s.UserId == value.UserId && s.VideoId == value.VideoId);
                    if (index >= 0)
                    {
                        data[index] = value;
                    }
                    else
                    {
                        data.Add(value);
                    }
                }

                if (data.Count > 0)
                {
                    var ret = UserVideoProgressProxy.Add(data);
                }
            }
            catch (Exception err)
            {
                // Log and keep the worker alive; losing a batch is tolerable for log-type data.
                Console.Error.WriteLine(err);
            }
        }
    }

Final thoughts

In fact, a relational database such as SQL Server is not a good fit for this kind of high-I/O workload. NoSQL stores handle simple, write-heavy scenarios like this much better. Switching to Redis is worth trying and would probably perform much better than SQL Server.
