Volcano Engine general manager Tan

On June 10, Volcano Engine held a brand launch conference. In his speech at the event, Volcano Engine general manager Tan Dai said that ByteDance is opening its best technology to the public. Volcano Engine’s video cloud product, which has served hundreds of millions of users and been honed by products such as Douyin and Xigua Video, is one of those technologies. Which consumer-side (C-side) advantages can Volcano Engine Video Cloud carry over to the business (B) side? How can it use these advantages to give customers better service and experience? How does it break through in the face of fierce competition? And which new audio and video scenarios are worth looking forward to?

LiveVideoStack recently visited ByteDance and talked with Keith, product lead of Volcano Engine Video Cloud, about the product and his thoughts on future trends in audio and video technology.

Development and application of audio and video technology

Audio and video technology has developed rapidly in recent years and has been applied across all walks of life to shrink the distance between people. During the epidemic last year in particular, it brought great convenience to daily life through online classes, video conferences, live shopping, and video chat. These applications let people who could not meet face to face during the epidemic come together again.

“Audio and video technology is penetrating every industry now and will continue to do so for a long time to come,” Keith said. “I feel very lucky to be on this track, working with my colleagues to make Volcano Engine Video Cloud products more professional and more widely accessible, and, as society’s information keeps shifting to video, to serve customers from all walks of life and enterprises that need digital transformation, so that they can apply audio and video technology more easily and grow their businesses quickly on top of it.”

Keith says the future of audio and video is more interactive and more real-time. He believes that video conferencing, online classes, and live e-commerce will remain the hot areas for large-scale application of audio and video technology. For example, online shopping is no longer limited to the static product displays of the past: buyers get a better shopping experience by talking with sellers over real-time video. Online classes give students in remote areas access to excellent educational resources, reducing the inequality caused by distance.

“Virtually every explosion in audio and video technology has been about application scenarios. The scene is like a catalyst that keeps stimulating the development of audio and video.”

Volcano Engine Video Cloud has a built-in video playback experience

At present, Volcano Engine has launched a four-tier architecture of unified basic services, technology center, intelligent applications, and industry solutions. Its products include A/B testing, intelligent recommendation, veCompass, Feilian (“flying connection”), growth analysis, video on demand, cloud editing, and more. These products are productized versions of ByteDance’s best practices.

Video Cloud sits in Volcano Engine’s technology center tier, and Keith thinks its biggest advantage is its inherent video playback experience. This is the result of ByteDance continuously polishing Douyin’s playback technology and iterating on user experience.

Keith gave a specific example: over a period of eight months, the team ran more than 100 experiments on Douyin to optimize the player and its decoding capabilities. This is basically impossible for other cloud vendors. In a scenario as large as Douyin, however, these are exactly the details and experience problems the team has to solve, and every audio and video app will inevitably face them once it grows to a certain scale.

“We will continue to push the video playback experience to its limits in large scenarios like Douyin and Xigua Video, and solve large-scale problems in the process. We will distill the solutions to these problems into a methodology, integrate it into Volcano Engine Video Cloud products, and then bring it to the market. Our customers will be able to draw on off-the-shelf, field-tested solutions to their problems.”

Keith says that other cloud vendors may not invest as many people in this kind of R&D and operations, may not keep re-optimizing against the data they gather, and may not have a large business scenario in which to run A/B tests, so it is hard for them to create a playback experience like that of Volcano Engine Video Cloud.
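As a rough illustration of how one such A/B experiment on a player might be read out, the sketch below compares the share of playbacks that start within a target time in a control group and a treatment group, using a two-proportion z-test. The metric, the sample counts, and the function names are illustrative assumptions, not figures or tooling from Volcano Engine.

```python
import math


def two_proportion_z(success_a, total_a, success_b, total_b):
    """z statistic for the difference in success rate between groups A and B."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that both groups behave the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_b - p_a) / std_err


def two_sided_p_value(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical counts: playbacks whose first frame appeared within the target time.
z = two_proportion_z(success_a=9000, total_a=10000,   # control player
                     success_b=9100, total_b=10000)   # player with the change
p = two_sided_p_value(z)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A single readout like this says little on its own; the point made in the interview is that such experiments are repeated many times against real traffic.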

Take mobile as an example. Douyin placed many new requirements on the multimedia SDK, such as video preloading and prerendering, so the team continuously optimized the product through A/B testing. When ByteDance’s engineers had polished the experience as far as they could (pioneering the “zero first frame” technology), they found that although the video cloud market as a whole was a red ocean, this part of it was still a blank area that no one had staked out, because other video cloud vendors did not yet understand the new demand.
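To make the idea of preloading concrete, here is a minimal sketch of how a feed player might decide what to fetch ahead of time: while one video plays, it requests only the head segment of the next few videos so that their first frame can be shown quickly. The window size, the head size, and all names are assumptions for illustration; this is not the Volcano Engine SDK API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreloadTask:
    video_id: str
    byte_range: tuple  # inclusive (start, end) of the head segment to fetch


def plan_preload(feed, current_index, window=2, head_bytes=512 * 1024,
                 cached=frozenset()):
    """Return preload tasks for the next `window` videos after the current one."""
    tasks = []
    upcoming = feed[current_index + 1: current_index + 1 + window]
    for video_id in upcoming:
        if video_id in cached:
            continue  # head segment is already on the device
        tasks.append(PreloadTask(video_id, (0, head_bytes - 1)))
    return tasks


feed = ["v1", "v2", "v3", "v4"]
# The user is watching v1, and v2 was already preloaded earlier.
tasks = plan_preload(feed, current_index=0, window=2, cached={"v2"})
for task in tasks:
    print(task)
```

Fetching only a head segment is one possible trade-off between startup speed and wasted bandwidth when the user never reaches the next video; the right values would themselves be a matter for experiment.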

Atomization of product design

Of course, turning Volcano Engine Video Cloud into a product is not all smooth sailing, and there are challenges.

Keith explains that this is mainly because they serve customers at different stages of development and at different levels, so they had to think about how to design a product that works for all of them.

“It’s really a test of how the product architects design the entire product. To solve this problem, our architects cut the product into the smallest possible pieces and then combine them into solutions within the same API and SDK architecture. Because different customers will use different parts of it, it’s important to keep each product particle independent and decoupled, and to connect all the functions in a workflow, so that different customers can meet their needs through one API.”
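The atomized design Keith describes can be sketched roughly as follows: each capability is a small, independent step, and a single entry point chains whichever steps a customer selects into a workflow. The step names, the dictionary-based job, and `run_workflow` are hypothetical and exist only to illustrate the pattern; they are not the actual Volcano Engine API.

```python
from typing import Callable, Dict, List

Step = Callable[[dict], dict]
REGISTRY: Dict[str, Step] = {}


def atom(name: str):
    """Register a function as an independent, reusable step."""
    def register(fn: Step) -> Step:
        REGISTRY[name] = fn
        return fn
    return register


@atom("transcode")
def transcode(job: dict) -> dict:
    return {**job, "codec": "h264"}


@atom("snapshot")
def snapshot(job: dict) -> dict:
    return {**job, "cover": job["source"] + ".jpg"}


@atom("publish")
def publish(job: dict) -> dict:
    return {**job, "published": True}


def run_workflow(source: str, steps: List[str]) -> dict:
    """Single entry point: run the selected steps in order on one job."""
    job = {"source": source}
    for name in steps:
        job = REGISTRY[name](job)
    return job


# Two customers pick different subsets of the same standard steps.
small_customer = run_workflow("clip.mp4", ["transcode", "publish"])
large_customer = run_workflow("clip.mp4", ["transcode", "snapshot", "publish"])
print(small_customer)
print(large_customer)
```

Because each step only reads and extends the job it is given, steps can be added or left out without touching the others, which is the property the quote is concerned with.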

At present, Volcano Engine Video Cloud is gradually launching SDKs packaged according to customer needs. This atomized abstraction will be opened to customers step by step as the products are brought to market. Customers will eventually see hundreds of these product features, all available through one set of APIs.

“But that doesn’t mean we’re going to do custom development. Our features are standard; customers simply take the features that suit their business. I want our products to be flexible enough to meet customers’ needs and to reduce their migration and usage costs,” Keith emphasized.

He explained that, across the industry as a whole, a B2B (toB) SDK business is currently hard to do well, for two main reasons. First, the cost of after-sales service is too high: the volume of customer feedback is very large every day, and if the workload grows too large, the whole team becomes too busy to keep up its own normal iteration. Second, customers’ willingness to pay is very low.

So why do it? Keith gives two reasons. First, the team already serves a large number of internal products with very high business complexity, and it has made enough business adaptations on that basis to form a general capability layer, whether or not it is offered toB. Second, the culture of excellence across ByteDance lets the team go really deep in technology.

From C to B

Within ByteDance, the Volcano Engine Video Cloud team is also positioned as a technology center, so it deals with many C-side business issues. When supporting C-side business requirements, the team therefore actively thinks about how to replicate, promote, and distill its work horizontally, paying particular attention to the technologies and methods that solve specific business problems, such as cost saving and experience optimization.

In fact, Douyin, Toutiao, and Xigua Video all face many challenges of scale and innovation. Keith says valuable technical experience is gained in solving these challenges; it is first validated and distilled, and then made available to customers.

LiveVideoStack learned that the B side of Volcano Engine Video Cloud and C-side products like Douyin are fully connected: the capabilities and the technical team are basically the same, so when the team builds B-side services it approaches problems more directly from the C-side perspective. This may differ from many cloud vendors, whose B and C sides are supported by different teams.

Another advantage of having the same team work on both toC and toB, Keith points out, is that it has many ideas for interactive formats that can help customers innovate in their businesses.

“The C side cares most about bringing users in and keeping them, and that involves many interactive formats such as live streaming, co-hosting (link-mic), and PK. When we build B-side services, we naturally turn these into a complete solution and then offer it to our customers.”

Technologies expected in the next year or two

In the interview, LiveVideoStack also spoke with Keith about the future of audio and video technology.

In his opinion, the audio and video experience will move up a level within one to two years. For example, live streaming technology will improve the efficiency and synchronization of information transmission in e-commerce and education, including support for scenarios such as large classes and large-scale shopping events. In addition, Volcano Engine has been exploring the H.266 ultra-light compression technology, which can also deliver a better experience. On the technology side, RTC will become a standard feature of Internet apps and a foundation of the next generation of Internet communication.

Digging further at the technical level, improving picture quality across the whole link may be a trend. Volcano Engine Video Cloud has begun small-scale trials of H.266, which is expected to be available within two years.

In addition, Volcano Engine continues to optimize its full-link image quality assessment techniques, so that quality can be reduced at one link and enhanced at another, which allows richer combinations. Keith gives a specific example: “On the production side of Douyin, we can’t give a video too high a bit rate because of the contribution rate. As a result, details such as hair may become blurred, but they can be recovered later during transcoding and mobile playback. Where to recover them and how to recover them more sensibly are areas where the technology can be explored in depth. Through a series of such combinations, what finally reaches the consumer is still a clear picture.”

New scenarios of the future

Keith also saw some new things in his exchanges with labs and overseas companies. One example is 3D portrait projection: a person far away is projected into another space and can chat with you face to face. Another is 3D environment modeling, which virtualizes a real environment and projects it in front of a person so that they see a more realistic environment. Keith believes the idea is to build a more immersive world along the path of video, bringing new ways of interacting that reduce distance and cost between people.

While these technologies are still at the research stage, Keith says the team is looking for the right scenarios and fertile ground in which they can mature, and will then open them up to various industries.

If you want to know more about Volcano Engine Video Cloud technology, look out for the Volcano Engine special session at LiveVideoStackCon 2021 in Beijing on September 3, where you can talk face to face with Volcano Engine’s senior engineers about audio and video technology.

Guest: Keith, product lead of Volcano Engine Video Cloud

Interviewer: Bao Yan, chief editor of LiveVideoStack

Editor: Alex


The audio and video technology behind ByteDance is revealed

In recent years, audio and video technology has developed by leaps and bounds. On the one hand, it meets enterprises’ demand for rapid business growth; on the other, it creates more possibilities for business development. This special session will present the audio and video technologies behind ByteDance and show how they can be used to support business growth and meet partners’ needs. It will start with audio and video codecs, reviewing codec technology and its outlook and introducing the optimization and evaluation of video codecs. It will then cover the application of audio and video in live streaming and how audio and video can support business growth. Finally, it will take Douyin as an example to show how RTC technology pursues the ultimate experience.

