We are honored to have Mr. Lin Zhengxian, chief 5G architect at Huya Live, introduce the misunderstandings and opportunities around 5G low latency. This talk begins with the principles behind 5G low latency, then works through five common public misunderstandings about it. Finally, it shares Huya Live's thinking on building a low-latency deterministic network and on applying 5G in other scenarios.
Hello everyone, my name is Lin Zhengxian from Huya Live. Today we are going to talk about some issues related to 5G low latency. I chose this topic because when Huya was doing 5G work in practice, we found a big gap between our engineering data and the numbers in marketing and media reports. We did a lot of exploration and analysis, and today I want to share what we have thought and seen with you.
First, I want to talk about what 5G low latency is and how it is achieved; then I will analyze the misunderstandings about 5G low latency; next comes MEC (Multi-access Edge Computing), which is closely related to 5G; then some of our practice with 5G low latency and MEC; and finally, Huya's thinking on 5G low latency and the challenges and opportunities ahead.
Some popular claims about 5G
In the past year or two, 5G has been a hot topic, and the media have produced a lot of reports. The graph on the left tells us that Wi-Fi bandwidth is poor when many people are online, 4G is acceptable, but 5G is very fast. Another touted 5G application is remote driving, which promises to use 5G's ultra-low latency to drive a car thousands of kilometers away without leaving home. Others say 5G will greatly improve download speeds. Whether these reports are true or false, you will be able to judge for yourself after today's explanation.
The definition of 5G latency
What is 5G low latency? It is defined very clearly in 3GPP: for URLLC (ultra-reliable, low-latency communication scenarios), the target is 0.5 ms each for uplink and downlink (1 ms RTT). For eMBB (enhanced Mobile Broadband scenarios), the uplink and downlink latency targets are 4 ms each. Everyday activities such as browsing, live streaming, and watching short videos are almost all eMBB services.
Why can 5G achieve low latency
How does 5G achieve low latency? On the wireless side, that is, from the phone to the base station, what measures does it take? And inside the core network, what is different from 3G and 4G? Let's walk through this diagram. Whether it is 5G, 4G, or 3G, if the phone has not been sending data, it must request permission before it can uplink data to the base station, much like raising your hand to answer a question in class. The phone first applies to the base station for uplink radio resources; the base station, based on its current load, allocates a transmission time and radio resources and returns them to the user, and only then can the data be sent. The chance to "raise your hand" comes in a periodic window rather than on demand, and we call this mechanism the scheduling request. Scheduling requests introduce latency, and they account for a significant share of the total air-interface latency. Because 5G targets low latency, it can remove this step through configuration (configured grants): as long as data arrives, it can be sent directly in the pre-configured time window without an extra application.

The other lever is the slot. 5G supports mini-slots. In 4G and 5G, a slot is divided into 7 or 14 symbols, and a symbol corresponds roughly to one period of the subcarrier waveform. In 4G, the subcarrier spacing is 15 kHz, so a symbol lasts 1/15 ms and a whole slot is about 1 ms, which is 4G's basic scheduling unit. In 5G the slot is still the basic scheduling unit, but the subcarrier spacing can be wider, 30 kHz, 60 kHz, 120 kHz, or even 240 kHz, and the symbol length shrinks proportionally, so the overall scheduling cycle can be much shorter. In the extreme, a full 5G scheduling cycle can be as short as tens of microseconds. On top of that, 5G supports mini-slots of as few as two symbols as the scheduling unit, bringing the scheduling cycle down to the microsecond level. There is also preemptive scheduling: with eMBB the scheduling unit is 1 millisecond, so if URLLC (higher-priority) data arrives while eMBB data is being sent, the eMBB transmission can be paused and the URLLC data sent first on the same radio resources. This guarantees that higher-priority data goes out faster.

The last part of the figure concerns transmission reliability. Low latency is tightly coupled with reliability, because unreliable transmission causes packet loss and retransmission, which in turn raises latency. 5G's answer is redundancy: a packet is copied several times and the same data is sent over different subcarriers and different channels. The different carriers may belong to the same base station or even to two different base stations. In extreme cases, reliability may also be bought at the cost of spectral efficiency by using a more interference-resistant modulation scheme: for example, where 16QAM would normally be used, QPSK is chosen for stability. These measures are mostly on the air-interface side.
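To make the slot arithmetic above concrete, here is a minimal sketch (assuming the nominal subcarrier spacings and 14 symbols per slot, and ignoring cyclic-prefix details; this is an illustration, not measurement data):

```python
# Rough 5G NR numerology arithmetic (ignores cyclic-prefix overhead).
# Assumes 14 OFDM symbols per slot, as described in the talk.

SYMBOLS_PER_SLOT = 14

def slot_duration_us(subcarrier_spacing_khz: float) -> float:
    """Approximate slot duration in microseconds for a given subcarrier spacing."""
    symbol_us = 1e3 / subcarrier_spacing_khz      # symbol length ~ 1 / spacing
    return SYMBOLS_PER_SLOT * symbol_us

for scs in (15, 30, 60, 120, 240):
    print(f"{scs:>3} kHz: symbol ~{1e3 / scs:6.2f} us, slot ~{slot_duration_us(scs):7.1f} us")

# 15 kHz (LTE-like)  -> slot ~933 us, i.e. the ~1 ms scheduling unit
# 240 kHz            -> slot ~58 us, i.e. tens of microseconds
# A 2-symbol mini-slot at 240 kHz is roughly 8 us, i.e. microsecond-level scheduling.
```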
On the core network side, starting from 3G, mobile traffic reaches the public Internet through a gateway (the data gateway between the mobile network and the public network). In WCDMA it is called the GGSN, in 4G the P-GW, and in 5G the UPF; the names differ, but they are all gateways. In the 3G and 4G era, each province typically deploys only one or two such gateways; in Guangdong, for example, they are most likely located in Guangzhou, Shenzhen, or Dongguan. So even when you access a server in your own city, traffic still has to go from the base station through the core network to that distant gateway and loop all the way back to the local server, a very roundabout path. With 5G, the gateway can be sunk much closer to the base station we attach to, even into the same equipment room; this is what we call UPF sinking. Look at the green line on the right: we go through the base station, the base station connects to the local UPF, which hands traffic directly to the metropolitan area network, and from there we reach the local server. There is no need to follow the yellow line all the way to Guangzhou, across the backbone network, and back through the metropolitan network to the destination. Operators have their own policies for how to do this, using uplink traffic offloading, which I won't go into here. It sounds as if 5G has done a great deal for low latency, and that impression is justified; the low-latency benefits of 5G are being widely discussed. Unfortunately, many people carry some misunderstandings into that discussion.
Misconception #01: Confusing theoretical best-case latency with actual latency
First, it is easy to confuse the theoretical latency of 5G with its actual latency. In particular, some people in the media, intentionally or not, encourage this confusion.
Let's take a closer look at the "scheduling" process mentioned earlier. When a piece of data is about to be generated on the phone, it cannot be sent immediately; the phone has to wait for a periodically configured scheduling request (SR) window (which might be 1 ms or 80 ms). After the SR is sent, the base station decides when to issue a grant based on the user's priority, how busy the air interface is, and so on. The first grant may only be big enough for the phone to send a BSR (buffer status report) telling the base station that, say, 20 KB of data is waiting; once the base station knows there is a lot of data, it grants more channel resources for the user to send. This handshake happens in most transmissions, whether on 4G or 5G. 5G's grant-free scheduling simply removes the SR step, but it is unlikely to be applied to eMBB services because of the cost involved. The longer the uplink scheduling request cycle, the larger the delay, because the average delay this process introduces is half the SR period: with good luck the data catches a window almost immediately, with bad luck it has to wait for the next cycle. If the cycle is configured small enough, latency is still under control; in theory, even 4G can reach single-digit milliseconds.
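Here is a tiny sketch of that "half the SR period" claim (a simulation of my own for illustration, not measurement data):

```python
import random

def average_sr_wait_ms(sr_period_ms: float, samples: int = 100_000) -> float:
    """Data arrives at a uniformly random moment inside the SR cycle,
    so the expected wait for the next SR opportunity is period / 2."""
    waits = [random.uniform(0, sr_period_ms) for _ in range(samples)]
    return sum(waits) / samples

for period in (1, 10, 80):
    print(f"SR period {period:>2} ms -> average extra uplink wait ~{average_sr_wait_ms(period):.1f} ms")

# 1 ms  -> ~0.5 ms of extra wait
# 80 ms -> ~40 ms, which matches the subway-station example later in the talk
```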
Unfortunately, although 4G latency can in theory reach a few milliseconds, our measured 4G data are not so rosy. During Huya's peak hours, the RTT from 4G users to a same-city server exceeds 40 milliseconds, while the RTT from Wi-Fi users to the same-city server at the same time is mostly under 20 milliseconds. A white paper last year listed many 5G to-B scenarios; with UPF sinking, the end-to-end latency is basically between 16 and 20 milliseconds, which matches what we see on today's live 5G network. Why such a large gap between reality and the ideal? When we talk about latency, we are really talking about processing delay, queuing delay, transmission (serialization) delay, and propagation delay. Processing delay is the time spent checking packet validity and is negligible. Propagation delay is set by the speed of light in fiber from A to B, which we cannot change, and over short distances it too is negligible. What remain are queuing delay and transmission delay, which we will expand on later. In general theoretical analysis these tend to be ignored, and that is the main reason for the gap between reality and the ideal. There is also packet loss and retransmission: in many cases the network RTT itself is fine, but occasional packet loss triggers retransmissions that push the actual delay up.
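Here is a toy breakdown of those four components with illustrative numbers of my own (the distance, packet size, link rate, and queuing values are assumptions, chosen only to show how queuing comes to dominate once the other terms shrink):

```python
# Toy one-way latency budget (illustrative values, not measurements from the talk).

SPEED_OF_LIGHT_IN_FIBER_KM_PER_MS = 200  # roughly 2/3 of c

def one_way_delay_ms(distance_km: float, packet_bits: int, link_mbps: float,
                     queuing_ms: float, processing_ms: float = 0.1) -> float:
    propagation = distance_km / SPEED_OF_LIGHT_IN_FIBER_KM_PER_MS
    serialization = packet_bits / (link_mbps * 1e3)   # bits / (bits per ms)
    return processing + queuing + serialization + propagation

# Same-city server (~50 km), 1500-byte packet, 100 Mbps bottleneck link:
print(one_way_delay_ms(50, 1500 * 8, 100, queuing_ms=0.5))   # ~1 ms when queues are short
print(one_way_delay_ms(50, 1500 * 8, 100, queuing_ms=15.0))  # ~15.5 ms once queuing builds up
```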
Misconception #02: Confusing different technologies with different application scenarios
The second misconception is also common in the media. 5G has many different scenarios and technologies, but the reporting always highlights the most advanced ones.
For example, 5G has three application scenarios: eMBB (enhanced mobile broadband) for high bandwidth, URLLC for highly reliable, low-latency services, and mMTC (massive machine-type communication) for wide-area Internet-of-Things connectivity. Typically, only scenarios such as autonomous driving need URLLC's extremely low latency, which can be as low as 2 or 3 milliseconds end to end. But one thing cannot be ignored: ultra-low latency comes at a considerable price, and frankly most scenarios do not need it. Because the costs differ so much, low latency cannot be offered in every scenario. Beyond expensive redundancy, conservative modulation, and other air-interface measures, low end-to-end latency also requires very stringent end-to-end QoS guarantees.
What does QoS look like in 4G? 4G data transmission uses two kinds of bearers: the default bearer and the dedicated bearer. The default bearer carries ordinary data; dedicated bearers set up additional channels for flows that need specific QoS. Establishing bearers is itself costly, and the granularity is coarse, because a phone can usually maintain only a handful to a dozen dedicated bearers. That is why we rarely hear of 4G QoS being used in everyday services: it mostly shows up inside the operators' own business, such as the priority guarantees for VoLTE, although operators are now slowly opening it up.
5G QoS is considerably more flexible: we can set different PCC rules for different flows, corresponding to different charging and quality-assurance policies. It is based not on bearers but on flows, which are typically identified by flow tuples or other traffic characteristics. The figure above differs from other 5G QoS diagrams in that it covers QoS across multiple access technologies, i.e. multi-access. For example, besides the 3GPP air interface, 5G can also reach the target address through non-3GPP media such as Wi-Fi, forming multiple access paths to improve reliability. However, 3GPP ultimately represents the interests of operators and equipment vendors, so this feature may not be widely used. When we do transmission today, we usually consider using Wi-Fi, 4G, 5G, or MP-TCP for simultaneous multi-path transfer, which is normally implemented at the application layer. 3GPP is trying to encapsulate this capability at the lower layers, but from my discussions with other application vendors, nobody seems willing to pay for it; they prefer to keep this control in their own hands. Another QoS-related topic is slicing, which I won't cover in detail because it is too large a subject. The purpose of slicing is to give different services different priorities, but it is a "pit" for phones, since current 5G handsets have very limited URSP support. Although 3GPP has defined it, in the short term the technology is only usable in to-B scenarios, for example attaching to different slices via different DNNs. For today's phones it is hard to switch from one slice to another: in theory we could map different apps or different flow characteristics to different slices and enjoy different QoS guarantees, but the terminals do not yet support it.
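As a purely illustrative sketch of flow-based QoS (the class names, thresholds, and fields below are hypothetical and not from any operator API or 3GPP specification), the idea can be pictured as mapping a flow's identifying tuple to a quality profile:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # "UDP" or "TCP"

def classify(flow: Flow) -> dict:
    """Hypothetical PCC-style rule matching: inspect flow characteristics,
    return an illustrative QoS profile (priority, delay budget)."""
    if flow.protocol == "UDP" and flow.dst_port in range(10000, 20000):
        return {"class": "low-latency-media", "priority": 1, "delay_budget_ms": 50}
    if flow.dst_port == 443:
        return {"class": "default-web", "priority": 5, "delay_budget_ms": 300}
    return {"class": "best-effort", "priority": 9, "delay_budget_ms": 1000}

print(classify(Flow("10.0.0.2", "203.0.113.7", 51000, 12000, "UDP")))
```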
Misconception #03: Confusing air-interface latency with the whole network link
A more common misconception: when we talk about 5G's low latency, we are really talking about its air-interface latency, but many people confuse that with the end-to-end latency of the network, or even the end-to-end latency of the service.
When we talk about 5G low latency, we mean low latency from the phone to the base station. The end-to-end path goes from the air interface through the bearer network to the core network, from the core network to the public Internet, possibly through several IDCs and even into another operator's core network, and then back down through an access network to an air interface at the far end. Frankly, even if the 5G air interface could achieve zero or one millisecond of latency, that would at best close its gap with fixed networks; compared with wired access, the air-interface hop is no better to begin with. What we really need is low end-to-end latency. What about all the other segments?
We can look to TSN (Time-Sensitive Networking), which will be used in the industrial Internet, for a scheme that lets 5G deliver end-to-end, highly stable, ultra-low latency. Its core idea is redundancy. As the figure shows, when the controller wants to control a remote device with a 5G network in the middle, the end station splits traffic into two flows that attach through two terminals and travel over different transmission paths. 5G has a lot of similar multi-link, multi-path handling, and the approach is well worth learning from; for lack of time I cannot expand on it here. The point I want to stress is that low end-to-end latency matters more than low latency on the air interface alone, and the industrial Internet is a good source of inspiration.
Misconception #04: Confusing no-load latency with loaded latency
Another misconception is to confuse the latency of an unloaded network (when it is relatively idle) with the latency under load, or even heavy load.
For example, during holidays, when highway interchanges are jammed with traffic, we don't expect to get anywhere quickly. In the network this corresponds to queuing delay: queuing for resources, waiting for the previous user's buffer to drain, and that is a major source of latency. On the wireless side, everyone wants to upload data, and since the base station has to schedule them, queues form. Unless the "lanes" are so plentiful that every user gets one to themselves, which is impossible in practice, queuing cannot be avoided. If you are interested in wireless networks, there are plenty of places to measure this yourself. The classic one is a subway station at rush hour, for example Xierqi station in Beijing. Measured there at peak time, the same-city RTT can reach 200 to 300 milliseconds. Even in the idle state, measured early in the morning or in the middle of the night, the latency was 80 milliseconds. That has to do with how the base station is configured. Going back to the "scheduling cycle" mentioned earlier: operators usually tune this value according to how busy the base station is. If I need a very large number of people to be able to attach to the base station at the same time, I have to fit more users into the limited control channel, which means stretching the scheduling cycle so everyone gets a turn. My guess is that the operator's scheduling period there is set to no less than 80 milliseconds, which alone adds about 40 milliseconds to the average RTT, a rather interesting phenomenon at subway stations.
Misconception #05: Ignoring the effect of bandwidth on latency
The last misconception is that we tend to overlook the effect of bandwidth on latency. For example, with small packets the measured latency looks very low, but once real business traffic is loaded on, the latency becomes 15 milliseconds, or even 40 or 100 milliseconds.
Many of you here work on video. Apart from infinite-GOP scenarios, encoded video consists of I-frames followed by P-frames and perhaps B-frames, cycling in turn. An I-frame is large, possibly many times larger than a P-frame, depending on the encoding parameters. Suppose the available bandwidth averages 10 Mbps: if the stream were delivered as a perfectly smooth 10 Mbps, it would get through quickly, but video is bursty, especially at I-frames. In the end, transmitting the I-frame eats up a lot of time and can even hold up the next P-frame or B-frame. If I upload a 10 Mbps video stream, an I-frame may reach 200 KB; even over 100 Mbps of bandwidth it takes about 16 milliseconds to send. What does that mean? In a cloud game running at 60 frames per second, the interval between two frames is 16 milliseconds. In other words, I need 100 Mbps of bandwidth just so that a 10 Mbps video stream does not delay the next frame. That is why we say 5G is a great advantage for cloud gaming: large bandwidth brings low latency. It is hard to get that much downlink on 4G, and many cloud gaming vendors today use only 20 or 25 Mbps.
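A minimal sketch of that arithmetic (assuming a 200 KB I-frame and the bandwidths from the example above; serialization time only, ignoring all other delay components):

```python
def serialization_ms(frame_bytes: int, bandwidth_mbps: float) -> float:
    """Time to push one frame onto the link, ignoring every other delay component."""
    return frame_bytes * 8 / (bandwidth_mbps * 1e3)   # bits / (bits per ms)

i_frame = 200 * 1024                     # ~200 KB I-frame from a 10 Mbps stream
print(serialization_ms(i_frame, 100))    # ~16 ms: one whole frame interval at 60 fps
print(serialization_ms(i_frame, 10))     # ~164 ms: the stream's own average bitrate is far too slow
print(1000 / 60)                         # ~16.7 ms frame interval at 60 fps
```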
Low latency and "5G + edge computing"
Now we can put all of this together and talk about MEC. In the narrow sense, MEC originally meant mobile edge computing; the standards body later decided that was not broad enough and changed the "M" to multi-access, extending the scope beyond mobile networks. The basic idea builds on UPF sinking (shortening the distance between the egress gateway and the base station): computing and storage resources can be placed in the UPF equipment room, and because the distance to the end user is very short, latency is low. The latency difference between putting compute in the UPF room and putting it in a central data center is substantial, especially since the central room may not even be in the same province as the UPF room.
Huya has experimented with edge computing. We wanted to apply a cartoon-style transformation to the anchor's image during a live broadcast. But we found that many anchors' streaming devices are not particularly powerful, especially low-end and mid-range phones, so running demanding AI style transfer on the phone itself is difficult. That made us ask: can we move the compute-heavy part to the cloud and rely on low-latency technology to make it work? The overall pipeline is as follows: light from the anchor passes through the lens onto the CMOS sensor, is processed by the ISP, and is handed to the app, which does some pre-processing, encodes, and sends the stream over the network. The AI node in the edge equipment room first decodes, then performs the AI processing; one stream is re-encoded and distributed to the audience, and another is returned to the anchor, who sees their transformed image after decoding and rendering. To the anchor, the whole thing should feel like processing done locally on the phone. It is a very appealing idea. The latency the anchor perceives covers steps 1 to 4 in the figure. So which of these stages (capture, encode, transmit, decode, render) do you think has the highest latency? It is capture. On Android phones, or iPhone 11 and earlier, if you point the camera at yourself, a sensitive person can feel the delay even in a purely local preview; it is between 80 and 120 milliseconds. What the phone does is simply imaging: the frame is read off the sensor, sent to the ISP for processing, and then rendered. The depth of the Android camera pipeline and the overall architecture determine this latency, and we overlooked it at the time. In some scenarios, the network is not the bottleneck at all: the network, especially transmission, can reach an RTT of 20 to 30 ms. Encoding a typical 1080p frame on a phone takes 30 to 40 milliseconds, but capture can cost as much as 3 frames. So when we set out to do edge computing, we should ask: is our bottleneck really the network?
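To see why capture dominates, here is a toy latency budget for the round trip described above (the per-stage numbers are the rough ballpark figures mentioned in this talk, not precise measurements, and real values vary by device and network):

```python
# Rough per-stage latency budget for the anchor's "see my own stylized image" loop.
stages_ms = {
    "camera capture (sensor -> ISP -> app)": 100,  # 80-120 ms on many phones
    "encode (1080p on phone)":                35,  # 30-40 ms
    "uplink + edge AI + downlink (RTT)":      25,  # 20-30 ms network RTT; AI time extra
    "decode + render on phone":               20,
}

total = sum(stages_ms.values())
for stage, ms in stages_ms.items():
    print(f"{stage:<40} {ms:>4} ms  ({ms / total:4.0%})")
print(f"{'total':<40} {total:>4} ms")
# Capture alone accounts for over half the perceived loop, so shaving a few
# milliseconds off the network barely moves the end-to-end number.
```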
The difference between 5G and Wi-Fi
Now I want to talk about Huya's thinking on 5G and low latency. Internally we have discussed the relationship between Wi-Fi and 5G a great deal. If you look at network-wide data, users reach us through two channels, Wi-Fi and 4G, and frankly most of the traffic still comes from Wi-Fi, from home broadband. Mobile 4G users, for their part, are unlikely to be using the ultra-low-latency services carriers offer. Even on 5G, we will probably use neither URLLC nor mMTC (the IoT private-network side); what we will use is mostly eMBB. On the air interface, eMBB has an RTT of about 8 milliseconds, versus about 2 milliseconds for good Wi-Fi. The poor Wi-Fi we see in the field is usually down to the 2.4 GHz band (which is prone to interference), old Wi-Fi 4 gear, or a weak AP or gateway. Frankly, if everyone in this room could upgrade the Wi-Fi of the users on your platforms, I believe video stuttering would drop noticeably. The theoretical latency of eMBB is higher than the actual latency of Wi-Fi, and today's Wi-Fi 6 and the interference-resistant Wi-Fi 6E (which uses a new band unaffected by 2.4 GHz or 5 GHz congestion) are likely to widen the gap further. What we can do is wait and see how 5G is optimized on the live network and how millimeter wave gets applied. Millimeter wave is not widely used in China; we use the sub-6 GHz bands. Millimeter-wave coverage is limited, but it has abundant spectrum and therefore very large bandwidth, and in China it will probably be deployed as reinforcement in hotspots. Once millimeter wave reaches the market, I think it may be possible to close the latency gap between 5G and Wi-Fi, but that is just my opinion. The question then becomes: how do we position 5G if it cannot beat Wi-Fi on latency? 5G's advantages over Wi-Fi are mobility and wide-area coverage; we cannot carry Wi-Fi around on our backs, especially outdoors. So we have to judge for ourselves which services and scenarios suit 5G, and whether its wide-area strengths fit our business. Personally, I think services such as AR, AR glasses, and outdoor live broadcasting have a natural affinity with 5G. By the way, I would not pay for something like "5G + VR", because VR has little to do with 5G: VR is mostly an indoor application, and indoors, isn't Wi-Fi better than 5G? 5G also has the problem of traffic charges. AR is different, because you take it with you and it has to be combined with the real scene, and the same goes for outdoor live broadcasting.
10 milliseconds of interaction
We also think that as 5G technology progresses, the era of 10-millisecond interaction is coming. There is another "5G" called F5G, the so-called fifth-generation fixed network: fiber to the room plus Wi-Fi 6, which gives very low latency, strong coverage, and high bandwidth. Coming back to mobile 5G, in theory we can do a good job with a 10 or 20 millisecond RTT. Even in the wilderness, with no Wi-Fi or 5G coverage, low-orbit satellites (at roughly 300 to 500 km altitude) can deliver an RTT of 20 to 30 ms, which means the whole world can be wrapped in relatively low-latency networks. If the whole chain can be built out, we can reach an end-to-end latency of 100 milliseconds or less. Where we used to think about live streaming in seconds or hundreds of milliseconds, we now need to think on the 10-millisecond scale. In the future, Wi-Fi will mainly serve indoors, while 5G may mainly serve outdoors, shopping malls, and the like. As for the wilderness, we have always wanted to broadcast live from Tibet, and with low-orbit satellites that could become an option.
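A quick sanity check on the satellite numbers (a back-of-the-envelope, propagation-only estimate of my own; real RTT adds slant paths, routing, and processing):

```python
SPEED_OF_LIGHT_KM_PER_MS = 300  # free-space speed of light, approximately

def leo_min_rtt_ms(altitude_km: float) -> float:
    """Propagation-only RTT for user -> satellite -> ground station directly below, and back."""
    return 4 * altitude_km / SPEED_OF_LIGHT_KM_PER_MS

for alt in (300, 500):
    print(f"{alt} km orbit: minimum RTT ~{leo_min_rtt_ms(alt):.1f} ms")

# 300 km -> ~4 ms, 500 km -> ~6.7 ms of pure propagation,
# so the 20-30 ms figure quoted in the talk leaves realistic headroom
# for non-vertical paths, inter-satellite hops, and processing.
```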
Huya's practice is driven mainly in two directions: real-time content operation and live interaction. We believe that as latency falls, interaction will become a qualitatively different experience. So what we are doing includes the cloud gaming you are familiar with, multi-player interaction, multi-player interaction combined with live streaming, and the virtual-meets-real interactive live streaming we have been exploring. At the technical level, I need a good end-to-end network to guarantee ultra-low-latency interaction. Besides Wi-Fi (which still has plenty of room for optimization), operators will gradually open up 5G QoS. We will rely on 5G QoS plus multi-access (dual link, using Wi-Fi and 5G at the same time) and public-network multipath plus something like SD-WAN to build an LDDN (low-latency deterministic network).
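As a purely illustrative sketch of the dual-link idea (everything here, addresses included, is hypothetical and is not Huya's actual implementation), the simplest form duplicates each packet over two independently routed UDP sockets and lets the receiver keep the first copy:

```python
import socket

# Toy "send the same packet down two paths" sender. In a real dual-link setup each
# socket would be bound to a different access network (e.g. the Wi-Fi interface and
# the 5G interface); here both are ordinary unbound sockets so the sketch stays runnable.
DEST = ("203.0.113.7", 9000)     # hypothetical edge/relay node address
paths = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]

def send_redundant(seq: int, payload: bytes) -> None:
    """Duplicate the packet on every path; the receiver keeps the first copy per
    sequence number and drops later duplicates, so one lossy or slow path no
    longer stalls the stream."""
    packet = seq.to_bytes(4, "big") + payload
    for s in paths:
        s.sendto(packet, DEST)

send_redundant(1, b"frame-data")
```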
Other directions for landing 5G low latency
Setting live streaming and Huya aside, let's look at where else 5G low latency can land. On the to-B side, I am very optimistic about vehicle-road collaboration and edge intelligence. Take video applications: upload the video to the edge, do the structuring or feature extraction there, and the raw video never needs to travel to a central cloud thousands of kilometers away. That offloads both compute and network resources nicely. There are also remote control and collaboration and industrial AR applications, where, for example, we could sit at home and guide a worker in Europe through repairing a car. And then, of course, there is autonomous driving. On the to-C side, the thing to hold on to is the division of labor between 5G and Wi-Fi: wherever Wi-Fi already does the job, nobody wants to use 5G. More than 90% of households in China have broadband, 94% of which is fiber, so there should be very little need for 5G to provide indoor broadband access here. We need to keep watching which applications are a good fit for 5G. Perhaps in the future, when autonomous driving reaches the market and there is no driver, passengers will want in-car entertainment; the car can only connect wirelessly, so it can only use 5G.
However, 5G low latency still faces many problems. Nationwide, the 5G network is lightly loaded, and even so the current latency is only so-so; how it holds up under heavy load remains to be seen. Millimeter wave has its advantages, but when will it go live in China? Moreover, what our business cares about is low end-to-end latency, which depends on the maturity of the whole ecosystem. As I mentioned, if camera capture always costs 2 to 3 frames of delay, it is hard to break through the end-to-end latency bottleneck. The operating system matters too: camera latency is inseparable from the OS itself; on Android, the camera settings and the depth of the whole pipeline determine the delay. As another example, Android 11's MediaCodec / Codec 2.0 added some low-latency decoding enhancements, and more will follow. Beyond that, are we prepared for low latency when we write our applications? I talk to many developers who use the Java-layer APIs for audio playback; if you don't use OpenSL ES or AAudio, Java-layer audio playback latency can reach 200 milliseconds on the phone, and all the earlier effort at the network level is wasted. On the technical-challenge side, there is the RTC work mentioned earlier. For Huya, we want to keep up with wireless-access QoS, which may depend on carriers opening QoS up to us. At the same time, because optimizing the air interface alone is far from enough for the whole link, we will also adopt multi-access (Wi-Fi plus 5G or Wi-Fi plus 4G dual links), multiple routes on the public network, and even multi-path, low-correlation routing to keep transmission reliable. We are not building a carrier-grade deterministic network; a simplified version, good enough to support our ultra-low-latency services, is enough. At the level of market operation, operators keep promoting 5G, but 5G has no price advantage over Wi-Fi, and the data allowances in the packages are insufficient. In addition, products differentiated by latency (low latency, ultra-low latency, ordinary latency) will certainly appear, which raises the question of how to market QoS. So the operators' strategy has a profound impact on how 5G develops.
Conclusion
A quick summary: there is a considerable gap between the theoretical and the engineering numbers for 5G latency, and we need to face that squarely. What we really need is not just low air-interface latency but low end-to-end latency. 5G isn't everything; its value lies more in letting you extend your business into spatial dimensions you couldn't reach before. Technically, we need to gradually build our own low-latency deterministic network to prepare for the 10-millisecond era. That's all for today's share. Thank you!