Today I would like to share the InfoQ open course "Extreme Real-Time Video Communication under Weak Networks", in which Professor Ma Zhan of Nanjing University presents an extreme exploration of real-time video communication.
I. Background of the project
First, the background of the topic. The accuracy and timeliness with which mobile phones, computers, and other networked devices receive information depend on real-time communication. Taking real-time video communication as an example, we cannot always guarantee a stable, real-time network, so handling weak-network environments plays an important role in maintaining transmission quality.
Quoting the official description: weak-network environments persist over long periods, especially at many critical moments involving life and production, where communication networks are often severely constrained by physical conditions, as in maritime operations, emergency rescue, and high-concurrency scenarios. We therefore need to explore new theories and methods for effective analysis, accurate modeling, and accurate prediction, in order to achieve high-quality real-time video communication in extreme weak-network environments (e.g., extremely low bandwidth below 50 kbps, unstable network jitter, and very large delays).
Professor Ma first introduced his own research directions: he has studied video processing for about seventeen years, and currently works on two aspects. One is information acquisition; the other is video processing technology for applications such as face recognition, traffic recognition, and intelligent transportation, together with the reconstruction tasks these involve.
II. What is extreme video communication under a weak network?
Weak network
Weak networks are not the same as the regular Internet, which is already quite good by current standards. Whether for live streaming or video on demand, and whether viewed from the perspective of signal processing and video compression or from the perspective of the network, today's network equipment can already support HD, ultra-HD, and beyond. However, base stations may become unusable during a large-scale mudslide, and at sea only communication satellites are available. Yet we still need to grasp the network environment in real time, promptly and accurately. This is why it is so important to study an extreme video communication framework for weak networks.
III. Architecture design and advantages of extreme communication
Three aspects:
First, starting from the most basic engineering design, truly move everything toward being data-driven.
Adopt a data-driven approach, similar to AlphaGo, which uses reinforcement learning. Reinforcement learning can be used to control network bandwidth and to control complex parameters such as those of video codecs. These network and codec parameters are numerous, so if we design them purely from experience, there will always be a bottleneck in our heads.
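The talk does not include code, but the idea can be illustrated with a minimal sketch: a toy Q-learning agent (the same family of methods behind AlphaGo) that picks an encoder bitrate from discretized throughput observations. Everything here, the bitrate ladder, the state buckets, and the reward shape, is an assumption for illustration, not the speaker's actual system.

```python
import random

# Minimal Q-learning sketch for bitrate selection (illustrative only).
# States: coarse buckets of measured throughput; actions: candidate bitrates (kbps).
BITRATES = [100, 300, 500, 1000]          # candidate encoder bitrates
THROUGHPUT_BUCKETS = [200, 400, 800]      # bucket edges for discretizing throughput

q_table = {}  # (state, action) -> estimated value

def bucketize(throughput_kbps):
    """Map a throughput measurement to a discrete state index."""
    for i, edge in enumerate(THROUGHPUT_BUCKETS):
        if throughput_kbps < edge:
            return i
    return len(THROUGHPUT_BUCKETS)

def choose_bitrate(state, epsilon=0.1):
    """Epsilon-greedy action selection over candidate bitrates."""
    if random.random() < epsilon:
        return random.choice(BITRATES)
    return max(BITRATES, key=lambda a: q_table.get((state, a), 0.0))

def update(state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in BITRATES)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# Toy interaction loop: reward favors high bitrate but penalizes exceeding throughput.
state = bucketize(500)
for step in range(1000):
    bitrate = choose_bitrate(state)
    throughput = random.choice([150, 350, 600, 900])   # stand-in for the real network
    reward = bitrate / 1000.0 - (2.0 if bitrate > throughput else 0.0)
    next_state = bucketize(throughput)
    update(state, bitrate, reward, next_state)
    state = next_state
```

The point is the one the talk makes: the control rule is learned from interaction data rather than hand-tuned, so it is not limited by what a designer can anticipate.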
Second, move beyond empirical design: from data-driven to intelligent.
Professor Ma uses an analogy here: from AlphaGo to AlphaZero. AlphaGo was bootstrapped from a large set of human games, whereas AlphaZero starts from an initial state and gradually learns by itself. Accordingly, for end-to-end video communication, online learning can be used to learn the different states of the interconnected network, and then an up-to-date online model or decision policy can be provided, achieving personalized learning for each individual user.
Third, adopt a video-centric form of data communication. Combined with video or image content, the communicated information is understood at the level of the user's comprehension of the content, what we call the semantic level, truly moving from data to artificial intelligence. Perceptually, even if a frame is lost from a video, or some pixels or even some blocks are lost from an image, the content can be recovered through compensation methods.
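The talk stays at the conceptual level here; for reference, the sketch below shows the classical, non-semantic baseline for loss compensation (frame freezing/blending and copying co-located blocks from a reference frame), the kind of method that semantic, learning-based compensation aims to improve on. The function names and block layout are illustrative assumptions.

```python
import numpy as np

def conceal_frame(prev_frame, next_frame=None):
    """Temporal concealment for a fully lost frame:
    average the neighbors if both are available, else freeze the last frame."""
    if next_frame is None:
        return prev_frame.copy()
    return ((prev_frame.astype(np.float32)
             + next_frame.astype(np.float32)) / 2.0).astype(prev_frame.dtype)

def conceal_blocks(frame, lost_mask, reference):
    """Concealment for lost blocks: copy co-located pixels from a
    reference frame wherever the mask marks a loss."""
    repaired = frame.copy()
    repaired[lost_mask] = reference[lost_mask]
    return repaired

# Toy usage: a 64x64 grayscale frame with one lost 16x16 block.
prev = np.full((64, 64), 120, dtype=np.uint8)
curr = np.full((64, 64), 125, dtype=np.uint8)
mask = np.zeros((64, 64), dtype=bool)
mask[16:32, 16:32] = True            # pretend this block was dropped
curr = conceal_blocks(curr, mask, prev)
```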
IV. Intelligent video coding
In terms of video signal processing, how do we perform video compression and coding at lower bit rates with neural networks inspired by the brain's visual system?
Video compression is actually very similar to the pipelined structure described earlier: on the encoding side, from pixels to the binary stream; on the decoding side, from the binary stream back to pixels. It is essentially an information process, and for this information process there are new theories and methods that should be, and will continue to be, explored.
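The talk keeps this at the conceptual level; one standard way to formalize the pixels-to-bitstream-to-pixels trade-off (not written out in the talk, but shared by conventional and learned codecs) is the rate-distortion objective:

```latex
% Minimize coded bits R plus lambda-weighted distortion D between
% the original and the reconstructed pixels; lambda sets the trade-off.
\min_{\theta} \; J(\theta) = R(\theta) + \lambda\, D(\theta)
```

Here theta stands for the encoder's decisions (or a learned network's parameters), R for the size of the binary stream, and D for the pixel-domain distortion.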
Two systems are mentioned. From the human point of view, visual information travels from the retina, through the optic nerve, to the lateral geniculate nucleus, and finally to the brain's primary visual cortex. This is a gradual process of information extraction and perceptual understanding.
From another perspective, it is proposed to draw inspiration from biological or brain vision, using the most basic information flow to model how the human eye images and perceives the 3D world. Along this pathway, signals from the retina pass through the lateral geniculate nucleus and different kinds of cells to the primary visual cortex, and each stage carries many such functions. Besides theoretical exploration and what we call stimulation experiments, there have been many anatomical experiments in primates, which also confirm from another angle how such information is transmitted.
Technical challenges – complexity
For traditional video and image processing, one major concern is complexity, which is also a very important factor when implementing the algorithms in chip designs.
The solution
A new approach is proposed: combining such brain-vision-based models with current, traditional video compression. There are two main reasons, usually viewed from the performance side. In terms of performance, current learned image compression has already exceeded the latest international standards, but for video there is still some way to go. Moreover, billions of devices already exist; with such a large installed base, the most effective path is to apply data-driven but relatively simple transformations on existing equipment, so that they can actually be used in video processing.
V. Network adaptive transmission
Video bitrate adaptation based on reinforcement learning
Problem description and difficulties
Network delay and jitter cause the available bandwidth to change in real time. Existing algorithms are mainly optimized or heuristically designed for the VoD (video-on-demand) setting. In real-time scenarios, future video information cannot be obtained, and large buffers cannot be tolerated.
Solution
1. Design an efficient and robust bitrate adaptation algorithm to predict bandwidth and dynamically adjust the video encoding and transmission bitrate
2. Build a system framework for real-time bitrate adaptation strategies that automatically learns the adaptation algorithm from historical video streaming experience (a simple baseline sketch follows below)
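The learned algorithm itself is not shown in the talk; as a simple point of reference, here is a sketch of the heuristic baseline it competes with: a harmonic-mean bandwidth predictor plus a safety-margin bitrate pick. The class and function names, the window size, and the bitrate ladder are all illustrative assumptions.

```python
from collections import deque

class BandwidthPredictor:
    """Harmonic-mean bandwidth predictor over a sliding window,
    a common heuristic baseline for adaptive bitrate (ABR) control."""
    def __init__(self, window=8):
        self.samples = deque(maxlen=window)  # recent throughput samples (kbps)

    def add(self, throughput_kbps):
        self.samples.append(throughput_kbps)

    def predict(self):
        if not self.samples:
            return 0.0
        # Harmonic mean is conservative: it is dominated by the slowest samples.
        return len(self.samples) / sum(1.0 / s for s in self.samples)

def pick_bitrate(predicted_kbps, ladder=(100, 300, 500, 1000), safety=0.8):
    """Choose the highest encoder bitrate below a safety margin
    of the predicted bandwidth."""
    target = predicted_kbps * safety
    feasible = [b for b in ladder if b <= target]
    return feasible[-1] if feasible else ladder[0]

# Usage: feed per-interval throughput measurements, then set the encoder.
predictor = BandwidthPredictor()
for t in (420, 380, 510, 460):
    predictor.add(t)
print(pick_bitrate(predictor.predict()))   # prints 300 for these samples
```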
Later, drawing on advanced international experience, this was applied in a real real-time system, and distributed learning was then carried out with Any Game over the Internet. An offline-trained real-time adaptive streaming scheme was therefore proposed: a large number of network traces were collected, including traces released by other laboratories (for example in Europe), and a network feedback signal standard was proposed and evaluated.
Evolution of video bitrate adaptation based on reinforcement learning
Problems
1. Offline training samples are limited
2. The simulated environment may be inconsistent with the actual environment
3. The performance loss caused by limited model generalization must be considered
Solution
1. Clustering and classification of network conditions
2. Clustering and classification of video content
3. Train offline models according to network conditions and video categories
4. Tune the model online to further cover environmental conditions not considered offline (see the clustering sketch below)
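As an illustration of steps 1 and 3, the sketch below clusters network traces by simple summary statistics so that one offline model can be trained per cluster. The feature choices (mean/std throughput, mean RTT, loss rate) and the use of k-means are assumptions for illustration; the talk does not specify the clustering method.

```python
import numpy as np
from sklearn.cluster import KMeans

def trace_features(trace):
    """Summarize one network trace: dict with per-interval lists of
    'throughput' (kbps), 'rtt' (ms), and 'loss' (fraction)."""
    tp = np.asarray(trace["throughput"], dtype=float)
    return [tp.mean(), tp.std(),
            float(np.mean(trace["rtt"])), float(np.mean(trace["loss"]))]

# Toy traces standing in for a collected trace corpus.
traces = [
    {"throughput": [900, 850, 920], "rtt": [30, 32, 31],    "loss": [0.0, 0.0, 0.01]},
    {"throughput": [45, 60, 50],    "rtt": [180, 200, 190], "loss": [0.05, 0.08, 0.06]},
    {"throughput": [400, 100, 700], "rtt": [80, 120, 90],   "loss": [0.02, 0.04, 0.01]},
]
X = np.array([trace_features(t) for t in traces])

# Cluster the traces; one offline model would then be trained per cluster.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)   # cluster id per trace
```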
VI. End-to-end extreme video communication demonstration platform
Two demos were shown. The first, built with Any Game, demonstrates the whole game state, network awareness, and cloud gaming.
DEMO – cloud game
The other is what is called a cloud desktop: the remote desktop is likewise transmitted back in the form of video.
DEMO – Remote desktop
That is all for this shared note. You can click "Here" to watch the corresponding video of this talk.