On the afternoon of August 18, at the 30th session of Qiniu Cloud Architect Practice Day, Xu Jing, product R&D director for the education industry at Qiniu Cloud, gave a hands-on talk titled “Live Classroom Practice Based on WebRTC Architecture”.
This article is a record of the content of the speech.
About the author:
Xu Jing is the product R&D director for the education industry at Qiniu Cloud, with 12 years of experience in the Internet industry. Skilled at combining traditional industries with the Internet to build “Internet Plus” products. Previously in charge of the entire Youku live streaming business at Alibaba Entertainment; now building cloud computing models for Internet education, creating cloud solutions for the online education industry, and breaking down the barriers of traditional education.
Evolution of streaming media protocols and functions
An analysis by the International Telecommunication Union describes the functions and evolution of streaming media protocols. For education and live streaming, put simply, live streaming is no longer just about watching. Its most important function used to be letting people watch; now its most important function is interaction between people.

Latency for interactive live streaming should be within 200 milliseconds. At 400 milliseconds a dividing line is crossed: the experience degrades so much that the audience no longer wants to listen. This is also fundamental to the future of the education industry, which is about shortening the distance between teachers and students and easing the communication barriers between them.
Have you had this experience? A few days ago a typhoon hit Shanghai, and in many broadcasts the TV anchor said, “Let’s go to our reporter at the typhoon’s landfall point.” You would see the reporter, wearing an earpiece, respond only two seconds later: “Come, let’s take a look at the scene behind us.” That experience is very poor. If one day live broadcast technology achieves a delay of less than 150 milliseconds, such scenes will no longer happen and the experience will keep improving. Only with 150-millisecond technology can live streaming gradually move into education, health care, and other fields with a lot of interaction.
The conclusion, therefore, is that the demand for real-time delivery keeps rising, and a one-way transmission delay of more than 150 milliseconds already affects the user experience. User experience is a very important concept, and the best example is Apple. Why do so many people use iPhones? The hardware is not better than Android’s, but the user experience is good enough. I have asked many people why they chose an iPhone, and a lot of girls say that an iPhone simply does not lag. Apple has put great effort into user experience, low delay included, and its R&D center has a team dedicated to it.
Here are a few concepts worth introducing:
First, MMS is a very old Microsoft streaming protocol, the granddaddy of streaming. In the early days of Internet streaming media, everyone used it, built on the Microsoft Windows architecture. Later, RTMP became the most common protocol in the streaming media industry, but it has a problem: it runs over TCP, which brings high latency.
Next is WebRTC. From last year to this year, more and more organizations have wanted to offer real-time interactive streaming, so they have gradually moved toward WebRTC. It is a very low latency protocol that can keep the delay between you and me within 100 milliseconds. Looking at the evolution of streaming media, from on-demand cloud to live cloud and finally to real-time cloud, we think anything within 150 milliseconds can be called real-time cloud.
The protocol we use most today is RTMP, which is based on TCP. TCP itself is a reliable protocol, but that reliability takes a lot of work, which makes transmission less efficient:
First, the three-way handshake. Before point A and point B can communicate, they must complete a handshake.
Second, packet loss. If network jitter causes packets to be lost while I send data to point B, the lost data has to be retransmitted.
Third, flow control. Taken together, these mean TCP cannot fully serve as the basis for a real-time cloud.
By comparison, UDP is very efficient and fast, but it also has many problems:
First, reliability. As we all know, UDP is unreliable and does not retransmit automatically. If my picture is sent to point B and the packet is lost for network reasons, it is simply gone, which is totally unacceptable for video streaming.
Second, UDP is very demanding on network conditions. Those who have worked in networking know that TS streams are used only when the network environment is very good. Since TCP has its problems and UDP has its own, how can we achieve real-time communication while balancing quality, cost, and delay? Many people, including scientists at Google and senior solutions experts at Alibaba, are studying this triangle.
UDP brings the lowest latency but does not guarantee quality. TCP has the best quality but the worst latency. This leads to a concept called Reliable UDP, which makes UDP a reliable transport by adding scenario-based retransmission. Data I send from point A to point B may fail to arrive because of network jitter, and standard UDP does not resend it. Reliable UDP analyzes whether the lost data is important enough to deliver. If it is very important, it is retransmitted. If it is unimportant, such as a non-critical part of the information chain, it is not retransmitted, because losing it does not harm the main service. This is called a scenario-based retransmission strategy.
Third, bandwidth adaptation. UDP has high bandwidth requirements, and once adaptive bandwidth adjustment is added it becomes a very reliable transmission mode. There is still a lot of room for improvement, and people keep updating and iterating the algorithms to make them more efficient. The WebRTC-based approach discussed today is one of them.
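The scenario-based retransmission idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual logic of WebRTC or of any Qiniu Cloud product: the `Packet` class, the packet kinds, the set of "important" kinds, and the rule that a resend needs about one more round trip are all hypothetical. Only the 150-millisecond budget comes from the talk.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    kind: str       # hypothetical labels: "keyframe", "audio", "delta"
    age_ms: float   # time elapsed since the packet was first sent


# Assumption: which kinds count as important is a policy choice.
IMPORTANT_KINDS = {"keyframe", "audio"}


def should_retransmit(packet, rtt_ms, deadline_ms=150.0):
    """Decide whether a lost packet is worth resending."""
    # Unimportant data is simply dropped, as standard UDP would do.
    if packet.kind not in IMPORTANT_KINDS:
        return False
    # Assumption: a resend takes roughly one more round trip to arrive.
    # If it would miss the real-time deadline, resending is pointless.
    return packet.age_ms + rtt_ms <= deadline_ms


lost = [Packet(1, "keyframe", 40.0), Packet(2, "delta", 10.0), Packet(3, "audio", 120.0)]
for p in lost:
    print(p.seq, p.kind, should_retransmit(p, rtt_ms=50.0))
```

With a 50 ms round trip, only the keyframe is resent here: the delta frame is not important enough, and the audio packet is already too old to arrive in time.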
Live class application based on WebRTC
The following are some applications of WebRTC in education and teaching scenarios. What exactly does a live class do? How does WebRTC build the relationship between teachers and students? Chinese education in particular has been criticized at home and abroad, but to be honest, it also holds a lot of opportunity.
China’s education market as a whole was worth 180 billion yuan in 2017, yet the combined annual revenue of the top ten companies was 4.1 billion yuan. That means the market is huge, and people are using the Internet to connect students and teachers and open it up.
To deliver education online, you have to solve the problem of a low-latency online environment. In traditional live streaming, the loop from the teacher speaking, to the students listening, to the students replying becomes a very poor experience after two minutes. Interactive presentation during the live class raises another problem. Voice and language alone are not enough for teachers and students to communicate, which is where the whiteboard comes in: if you cannot explain something clearly in words, draw it on an online whiteboard and it can be understood in two minutes. In education, the whiteboard is extremely important for interactive presentation. Once teaching is under way, many students also hope to digest the material further and go over it slowly later. That is the purpose of recording the lesson.
Then there is finding effective ways to study online. A lot of kids think, “My parents make me learn, and I don’t want to.” Learning in that state is very inefficient. How do we turn them into productive learners? By building an environment that supervises the whole course of teaching. For example, a child’s parents spend a lot of money on online piano lessons and then find that the child is unwilling to learn. If someone accompanies the child throughout and tracks the learning results, that also improves the quality of teaching.
Finally, platform data accumulation. Review is very important. The education industry accumulates a great deal of data, and analyzing and improving on that data helps you prepare your business for the future.

The core of WebRTC is the architecture from Google. When we look at education, we do four things:
First, picture transmission. Many fields use this term; the camera at the back of the room, for example, transmits its picture in real time over a wireless microwave link. Here, picture transmission means carrying real-time network signals between teachers and students, and it involves two things: low delay and high precision. Online education is hot in China right now, especially in the capital market, and some of the leading companies have started to look into high precision. DaDa, for example, already has a user base and now very much wants to improve video quality. Most online teaching today is done with ordinary webcams, which do not look professional enough. DaDa regards them as utility-grade cameras rather than something with quality and brand. It is asking whether it can have a dedicated camera and whether it can adopt a 1080P standard, and is even studying whether a 4K standard can be added. These are all high-precision applications.
Second, interaction. This is not voice interaction between people but whiteboard interaction. The core problem is this: say 10 people need to see the changes on a whiteboard at the same time. When you draw on a blackboard, only the people present can see it; letting people in Beijing, New York, and elsewhere see it too, and keeping many endpoints in sync, is very difficult. As for courseware, not many products on the market do it well. If I am presenting a PPT alongside the whiteboard and remote users cannot see it, the PDF or PPT can be converted into whiteboard pages so that every user can see it. That is what we call interaction.
Third, service. I used to run my own business, and I kept reminding myself of one thing: in today’s society you cannot earn much by trading goods. Running a shop is always just selling things. The answer is services. Providing an online platform for others to use is a service; providing a platform like Taobao for others to use is a service, and that makes money. For education, the service is the ability to supervise teaching and analyze the classroom. How to make sure a student actually learns and how to let parents supervise are the problems we think about.
Fourth, accumulation, meaning big data accumulation. DaDa runs thousands of classes online concurrently every day; how much data does that deposit? Will this data help you improve the user experience in the future? That is the beauty of data. Many companies are buying this kind of data, and it is only available through long accumulation. The same goes for follow-up learning methods.
WebRTC based live class · low delay picture transmission
· Low delay bidirectional efficient connection
This is about building a reliable WebRTC connection, which means low latency and an efficient two-way link. To experience online education, I personally tried DaDa, VIPKID, and a dozen or so others, and I found a problem: there is a consultant whose job is to call the student and get the connection set up, and half an hour is gone. This is an inefficient connection. I am going to show you how to make connections with low latency and high efficiency. I hope online education can do away with unscientific practices like setting up the connection 30 minutes in advance.
· The protocol determines the latency level
I once asked experts who work on the lower layers of streaming media what, from their perspective, is the most reliable way to reduce latency. The answer: the protocol is everything, because it sets the general direction. A protocol like RTMP over TCP will not reach the millisecond level, whereas WebRTC will not stray far from it even at its worst. In fact, once the protocol is chosen, whether your overall delay is at the millisecond level or the second level has already been decided, and the next step of optimization is the physical network.
· Optimization and improvement of physical network
From Beijing to Shanghai, if you pay a high price for a very good node, the delay is about 10 milliseconds. On an ordinary network it may be 50 milliseconds. People complain every day that certain broadband providers in Beijing, Dr. Peng among them, have problems all the time, because their physical networks are not particularly good. So besides the protocol there are also physical-network solutions, but they are less effective than the protocol itself.
· Reduce processing time before transmission
Many video pipelines now do some processing before transmission, such as encryption or echo cancellation (if the sound I send has some howling, the howling is suppressed). Reducing the time this processing takes improves low-latency transmission between points A and B.
Live class based on WebRTC · High definition streaming media
To be a truly professional education company or institution, you have to lean toward high definition. High definition is affected by several factors:
The first is sampling at the source, which has several aspects. Today’s camera is a Sony 4K, which can capture at up to 4K. First, resolution: whether my capture is at 4K or at standard definition largely determines the precision. Second, color depth: this is a standard from the broadcast television field, and the higher the color depth, the closer the image quality is to lossless.
The second is encoding, which involves encoding efficiency and the encode profile. When transcoding a video, you may wonder why, on the same kind of computer, your transcode is very slow while others are very fast, and why video sites such as Youku and Tudou can finish in minutes. Transcoding runs deep. Even with the same bit rate, the same resolution, and the same color depth, encoding efficiency differs; with a different profile, the time taken and the image quality are completely different. To put it simply, JPG is an image file format, yet some JPG files on the Internet are tens of KB while others are a very different size. The difference is compression.
Then there is the codec. H.264 (AVC) is one video codec, and audio may use AAC. At the same image quality, different codecs save different amounts of bandwidth. H.265 and AV1 are two codecs that have drawn a lot of interest recently. The file size and bit rate they produce are 40% to 50% lower than H.264, which is a cost saving for enterprises. The choice of codec is also very important for overall image quality.
Bit rate also has to be chosen together with resolution and color depth. Many online 1080P videos have a bit rate of only 3 Mbps and look less clear than 720P: the resolution is high but the bit rate is low, so the image quality is a mess. This is why a picture can be unclear even at high resolution. Which is clearer, 1080P at 3 Mbps or 720P at 1.8 Mbps? The answer is 720P. The recommended values are 1.8 Mbps for 720P and 5.5 Mbps for 1080P. Although 1.8 Mbps is a low bit rate, the resolution is also low, so a 720P picture scaled up to a 1080P screen looks clearly better than 1080P at 3 Mbps. This is why online education uses a low bit rate together with a relatively low resolution to solve the problem of picture clarity.
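One rough way to see why 720P at 1.8 Mbps can look cleaner than 1080P at 3 Mbps is to compare how many encoded bits each pixel gets. The sketch below is a simplified proxy, not the speaker's method: the `bits_per_pixel` helper is hypothetical, and the frame rate of 25 fps is an assumption that the talk does not state. Real perceived quality also depends on the codec, profile, and content.

```python
def bits_per_pixel(bitrate_bps, width, height, fps=25):
    """Average encoded bits available per pixel per frame (fps is assumed)."""
    return bitrate_bps / (width * height * fps)


options = {
    "1080P at 3 Mbps": bits_per_pixel(3_000_000, 1920, 1080),
    "720P at 1.8 Mbps": bits_per_pixel(1_800_000, 1280, 720),
    "1080P at 5.5 Mbps": bits_per_pixel(5_500_000, 1920, 1080),
}

for label, bpp in options.items():
    print(f"{label}: {bpp:.3f} bits per pixel")
# 1080P at 3 Mbps: 0.058 bits per pixel
# 720P at 1.8 Mbps: 0.078 bits per pixel
# 1080P at 5.5 Mbps: 0.106 bits per pixel
```

Under this assumption, 720P at 1.8 Mbps gives each pixel about a third more bits than 1080P at 3 Mbps, while the recommended 5.5 Mbps brings 1080P above both.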
GOP is the most important parameter for slicing. It describes how much room between two keyframes is actually optimized by the algorithm. Many hardware codec manufacturers now make the GOP programmable, so you can define the GOP value freely to optimize picture quality.
As for the encode profile, the questions are how efficient transcoding is for different video files and how quality differs between profiles. The difference between CBR and VBR is that CBR is a constant bit rate and VBR is a variable bit rate, which adjusts the bit rate dynamically according to the picture content to reach a better image standard. If the camera is pointed at me while I stand here speaking, the picture is nearly still and the bit rate is very small. When I broadcast a moving scene, such as a person running, the picture keeps changing and the bit rate rises in real time. That is VBR.
So which scenarios call for CBR and which for VBR? CBR suits a fixed network bandwidth. I may have only 2 Mbps at home, so I set 1.8 Mbps. VBR is a poor fit there: if it suddenly rises to 2.2 Mbps, more than my 2 Mbps of bandwidth, my video starts to stutter. Where network conditions allow, VBR consumes far less bandwidth than CBR, because it analyzes image content in real time to set the bit rate.
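The rule of thumb above can be written as a small decision function. This is a minimal sketch, assuming that the only inputs are the link bandwidth and the highest bit rate a VBR stream might reach; the function name and parameters are hypothetical and do not correspond to any encoder's API.

```python
def choose_rate_control(link_kbps, vbr_peak_kbps):
    """Pick a rate-control mode for a link of known, fixed bandwidth.

    Assumption: VBR is only safe when its expected peak fits in the link.
    """
    if vbr_peak_kbps <= link_kbps:
        return "VBR"
    return "CBR"


# The home network from the talk: 2 Mbps link, VBR peaking at 2.2 Mbps.
print(choose_rate_control(link_kbps=2000, vbr_peak_kbps=2200))   # CBR
# A roomier link where the same peak fits comfortably.
print(choose_rate_control(link_kbps=10000, vbr_peak_kbps=2200))  # VBR
```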
WebRTC based live class · Education whiteboard
The whiteboard we are building works directly online. For a teacher with 4, 6, 8, or more students, it is a dynamic whiteboard on which I can draw with different pens, mark in different colors, use the basic shapes at the bottom, and circle the key points I want to talk about. PowerPoint and PDF are very popular, so when I have a PPT or PDF to share, I can make it part of the whiteboard and lecture over it with everyone, which makes teaching more efficient. I have always thought whiteboards are more useful than video in education. What video does is let me see the teacher, see what the teacher looks like, and get a face-to-face feeling. But the whiteboard is really at the heart of education, which is why every school has a blackboard.
Next is the stream mixing service. Many Internet organizations pay close attention to three modes. First, Mesh mode. Second, MCU mode, in which all streams go to a central server. Third, SFU mode, which most online education institutions in China use for its uplink and downlink bandwidth structure and its flexibility.
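The bandwidth structure of the three modes can be compared by counting the media streams each client sends and receives. The sketch below uses the usual textbook simplification (one stream per participant, no simulcast); the function name is hypothetical and the counts are not figures from the talk.

```python
def streams_per_client(mode, participants):
    """Return (uplink, downlink) stream counts for one client."""
    others = participants - 1
    if mode == "mesh":
        # Every client sends to and receives from every other client.
        return (others, others)
    if mode == "mcu":
        # The central server mixes everything into a single stream.
        return (1, 1)
    if mode == "sfu":
        # One upload; the server forwards each other participant's stream.
        return (1, others)
    raise ValueError(f"unknown mode: {mode}")


# A small class of one teacher and four students.
for mode in ("mesh", "mcu", "sfu"):
    up, down = streams_per_client(mode, participants=5)
    print(f"{mode}: {up} up, {down} down")
```

In this simplified view, SFU keeps the uplink as light as MCU while avoiding server-side mixing, which is one common reason small-class products choose it.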
The role of server-side stream mixing:
First, it lets parents see how their child is doing in class, how the teacher is, how the students are, and whether they are well matched. Second, it lets the academic administration staff of the online education platform evaluate the quality of teaching. Third, it lets more children in mountainous areas watch the live class. In the past they could not hear lectures by Fudan University teachers in Shanghai; now they can follow them this way. Fourth, the class can be recorded, and the recording cost is greatly reduced. Fifth, it provides a data source for analysis.
Application of intelligent AI in educational scenarios
· AI effectively assisted teaching
DaDa English once discussed a problem with me. They run K12 online training for children and found this: a child tells the teacher, “My father makes me take this class, but I don’t want to. You take the money and count the class as done; I’m going to play video games, and you can do whatever you want.” The result is an ineffective class, with no one at either end of the camera. This is where AI comes in. Have an AI watch the view of both cameras; if no person is there for more than a minute, it immediately raises an alarm and reports to the academic administration platform that this is an invalid lesson. That is the assisting role of AI.
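The one-minute absence rule can be sketched as a small state machine over detector output. This is only an illustration: it assumes some person detector already produces `(timestamp_s, person_visible)` samples for one camera, and the function name and sample format are hypothetical. In the scenario described, the same check would run on both the teacher's and the student's camera.

```python
def find_absence_alerts(samples, threshold_s=60.0):
    """Return the timestamps at which an absence alert would fire.

    samples: (timestamp_s, person_visible) pairs in time order,
    produced by a hypothetical person detector.
    """
    alerts = []
    absent_since = None   # when the current absence started
    alerted = False       # avoid repeating the alert for one absence
    for ts, visible in samples:
        if visible:
            absent_since = None
            alerted = False
            continue
        if absent_since is None:
            absent_since = ts
        if not alerted and ts - absent_since > threshold_s:
            alerts.append(ts)
            alerted = True
    return alerts


samples = [(0, True), (10, False), (40, False), (71, False),
           (80, False), (90, True), (100, False)]
print(find_absence_alerts(samples))  # [71]
```

Each alert would then be reported to the academic administration platform; how that report is sent is outside this sketch.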
· AI intelligent class check-in and tracking
This suits large classes. With 100 people in today’s class, nobody needs to sign in at the front desk; face recognition lets them go straight in. On the other hand, today’s college students hate this technology. Skipping class happens at university: after the teacher has been talking for ten minutes, some students slip out the back, as is often the case. Now a single camera watches the whole classroom the entire time. With 100 students in the room, if one slips out it automatically raises an alert, so skipping class becomes impossible. AI can even recognize that it is Joe who is skipping.
· AI teaching interaction rate monitoring
While teacher and student are communicating, the teacher may suddenly find that the student is not listening and has a blank look. An AI watching the student’s face would note that this student has made no gestures, has not raised a hand, and has had no interaction, and would record that something is wrong.
· Integration of AI and big data
When a problem like this is found, academic administration is informed that the class was invalid and is asked to work out how many of this person’s classes were effective, how many times the student interacted with the teacher, and how many times the student raised a hand.
· Reducing the heavy operations burden of online education
For example, as DaDa expands, more students means more consultants: 10,000 consultants for every 10,000 students, and when the number of students grows to 100,000, the number of consultants grows to 100,000. The cost is too high. This needs AI to solve, for instance by helping with pre-class preparation.
Online education big data center

Data accumulation. After every class, the data, including the AI data and the conventional data, goes into the academic administration big data center, which can then easily recommend learning interests. For example, suppose I find that a student is highly active in math class, raising a hand every day, but sleeps through Chinese class every day. Big data will flag this right away and recommend that the student pursue science in the future. Recommending learning interests is one function of big data.
It can also judge a student’s attitude and show whether the student is really serious. The quality of teaching, meaning whether the teacher teaches well, can also be reflected in the students’ interaction rate. Other uses include timely matching of students and teachers based on their interaction, and real-time data feedback: if someone leaves, the teacher is alerted.
Qiniu Cloud live class practice based on WebRTC architecture

This is the 1-on-1 part of the education solution architecture that Qiniu Cloud is about to release. Teachers and students connect from different devices through the SDK. Through WebRTC, the solution has to deliver high definition, low latency, interaction, and RTC with high efficiency and high performance, along with the whiteboard connection, which is the most important bridge between teachers and students. Beyond these, the AI part, the recognition part, the data storage part, and the bypass live streaming part are all integrated, making it a complete education solution. There is a 1-to-1 solution and a 1-to-many solution.
What scenarios are WebRTC-based live classes suited to?
First, 1-on-1 online classes. Second, small classes of 1 to 4. Third, the digital transformation of traditional education; colleges and universities should also move online. Fourth, dual-teacher classes, which are very important in K12. Fifth, internal enterprise training.
Q&A Will WebRTC bring the 3.0 era of live broadcasting?

Q WebRTC’s audio quality problems gleason
Q: Is education the only field suited to RTC?
A: Not really. Today we see medical care, education, live streaming, and hardware communication; wherever a scenario demands low delay, there is a place for RTC. Education itself divides into many categories: enterprise education, traditional education, and even medical education.
Learn more about Qiniu Cloud real-time audio and video cloud: www.qiniu.com/products/rt…