Tan Dai, general manager of Volcano Engine

On June 10, Volcano Engine held its brand launch event. In his keynote, Tan Dai, general manager of Volcano Engine, said that ByteDance's best technology should be opened up to the public. The Volcano Engine video cloud, refined through Douyin and Watermelon Video and serving hundreds of millions of users, is one of those technologies. Which consumer-side (C-side) advantages can the video cloud carry over as it moves from To C to To B? How can it use them to give customers better service and a better experience? How can it stand out in a fiercely competitive market? And what lies ahead?

LiveVideoStack caught up with Keith, product lead of the Volcano Engine video cloud, at ByteDance to talk about the video cloud and his views on the future of audio and video technology.

Development and application of audio and video technology

Audio and video technology has developed rapidly in recent years and is now used across virtually every industry to shorten the distance between people. During last year's pandemic in particular, it brought great convenience to daily life through online classes, video conferencing, live-stream shopping, and video chat. These applications let people who could not meet face to face come together again.

"Now and for a long time to come, audio and video technology will keep spreading into every industry," Keith said. "I feel very lucky to be on this track with my colleagues, making the Volcano Engine video cloud more professional and more widely accessible. As more and more of society's information moves to video, we can serve customers across industries, including enterprises going through digital transformation, so that they can adopt audio and video technology more easily and grow their businesses quickly."

Keith says the future of audio and video will be more interactive and more real-time. He believes video conferencing, online classes, and e-commerce live shopping will remain the hot spots for large-scale adoption. Online shopping, for example, is no longer limited to static product displays: buyers get a better experience by talking to sellers over real-time video. Online classes give students in remote areas access to excellent educational resources, reducing the inequality caused by distance.

"Virtually every breakthrough in audio and video technology is driven by application scenarios. Scenarios act as a catalyst for the development of the audiovisual field."

Volcano Engine Video Cloud: a playback experience it was "born with"

Volcano Engine has launched a four-tier framework of unified basic services, a technology middle platform, intelligent applications, and industry solutions. Its products include A/B testing, intelligent recommendation, Vecompass, Feilian, growth analytics, video on demand, and cloud editing. All of them are productized versions of ByteDance's best practices.

The video cloud sits at the technology middle-platform layer of Volcano Engine, and Keith believes one of its biggest advantages is the video experience it was born with. Behind this is ByteDance's continuous polishing of Douyin's playback technology and its constant iteration on user experience.

Keith gives a specific example. Over eight months, the team ran more than 100 tests on Douyin to optimize its in-house player and decoding capabilities. Other cloud vendors would consider this almost impossible, but at Douyin's scale these details and experience problems had to be solved, and they are problems that every audio and video app eventually runs into once it grows large enough.

"We are constantly exploring the best video playback experience in large-scale scenarios like Douyin and Watermelon Video, and solving the problems that arise at scale along the way. We distill those solutions into a methodology, build it into the Volcano Engine video cloud products, and bring it to market. When our customers face the same problems, they can draw on ready-made solutions that have already been proven in production."

Keith says other cloud vendors may not have as many people dedicated solely to this kind of development, operations, and data optimization, nor business scenarios large enough for this kind of A/B testing, so it is hard for them to hone a playback experience like that of the Volcano Engine video cloud.

Take mobile as an example. Douyin places many new requirements on the multimedia SDK, such as video preloading and pre-rendering, and uses A/B tests to keep optimizing the product. When ByteDance's engineers had polished the experience as far as it would go (they pioneered "zero first-frame" technology), they realized that although the video cloud market as a whole was a red ocean, this area was a blank space no one had claimed, because other video cloud vendors found it hard to understand this new requirement.

Atomized product design

Of course, commercializing the Volcano Engine video cloud has not been without its challenges.

Keith explains that because customers are at different stages of development and operate at different scales, the team had to work out how to design a product that serves all of them.

"Designing the whole product is genuinely challenging for a product architect. To solve this, our architects break the product down into its smallest pieces and recombine them into solutions within a single API and SDK architecture. Because different customers use different parts of it, it is important to keep each piece independent and loosely coupled, and to use workflows to chain the functions together, so that different customers can meet their needs through one API."

The Volcano Engine video cloud is now gradually releasing SDKs packaged to customer requirements, and this atomized abstraction will be opened to customers step by step as the product is brought to market. Customers may see hundreds of product feature points, but all of them are available through one set of APIs.
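To make the idea of atomized features chained by a workflow more concrete, here is a minimal Python sketch. It is not the Volcano Engine API: the feature functions (`transcode`, `add_watermark`, `make_cover`), the `FEATURES` registry, and `run_workflow` are all hypothetical, chosen only to show independent, loosely coupled units combined behind a single entry point.

```python
from typing import Callable, Dict, List

# A "job" describes one media file; each feature returns an updated copy.
Job = Dict[str, object]
Feature = Callable[[Job], Job]

# Hypothetical atomic features. Each is independent and touches only its own keys.
def transcode(job: Job) -> Job:
    return {**job, "codec": job.get("target_codec", "h264")}

def add_watermark(job: Job) -> Job:
    return {**job, "watermark": True}

def make_cover(job: Job) -> Job:
    return {**job, "cover": f"{job['name']}_cover.jpg"}

# One standard catalogue shared by every customer (hypothetical names).
FEATURES: Dict[str, Feature] = {
    "transcode": transcode,
    "watermark": add_watermark,
    "cover": make_cover,
}

def run_workflow(job: Job, steps: List[str]) -> Job:
    """Single entry point: apply the chosen standard features in order."""
    for step in steps:
        if step not in FEATURES:
            raise KeyError(f"unknown feature: {step}")
        job = FEATURES[step](job)
    return job
```

One customer might call `run_workflow(job, ["transcode"])` while another calls `run_workflow(job, ["transcode", "watermark", "cover"])`. Both draw on the same standard features, and adding a capability means registering one more function rather than changing the entry point.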

"But that doesn't mean we do customization. Our product features are standard; customers simply take the features that fit their business. I want our products to be flexible enough to match customer needs and to lower our customers' migration and usage costs," Keith emphasized.

He explained that, looking across the industry, To B SDKs are currently hard to do well, for two main reasons. First, after-sales service costs are very high: customers send a large volume of feedback every day, and if the workload grows too large the whole team becomes too busy to keep up with normal iteration. Second, customers' willingness to pay is very low.

So why can ByteDance do it? Keith again points to two reasons. First, the team serves a large number of ByteDance's internal products, whose business complexity is very high. Having adapted to all kinds of business needs, To B or not, it has built a common capability layer on that foundation. Second, ByteDance's company-wide culture of always striving for the best lets the team go really deep on technology.

From C to B

Within ByteDance, the Volcano Engine video cloud team is also positioned as a technology middle platform, so it deals with many C-side business problems. When supporting C-side business needs, the team actively considers how to replicate, promote, and consolidate solutions horizontally, paying special attention to distilling the technologies and methods that solve specific business problems, such as cost savings and experience optimization.

Whether it is Douyin, Toutiao, or Watermelon Video, each faces many challenges of scale and innovation. Keith says that in addressing them, the team has built up a great deal of valuable, proven technical experience, which is distilled and then opened up to customers.

According to LiveVideoStack, the B side of the Volcano Engine video cloud and the C side of Douyin are fully connected: they share essentially the same capabilities and the same technical team, which lets the team approach B-side services directly from a C-side perspective. This may set it apart from many cloud vendors, whose B side and C side are supported by different teams.

Another advantage of having the same team work on To C and To B, says Keith, is that it has many ideas about product gameplay, which can help customers innovate in their businesses.

"The C side cares most about attracting and retaining users, which involves a lot of gameplay built on live streaming, such as co-hosting (lianmai) and PK battles. When we provide B-side services, this gameplay naturally becomes complete solutions that we then deliver to customers."

Technology applications in the next year or two

In the interview, LiveVideoStack also talked with Keith about the future of audio and video technology.

He believes the audio and video experience as a whole will move up a level within one to two years. In live streaming, for example, the efficiency and synchronization of information delivery in e-commerce and education will improve, including support for large-scale classes and large-scale shopping events. In addition, the H.266 high-efficiency compression technology that Volcano Engine has been exploring can deliver better visual quality. Technically, RTC will become a standard feature of every Internet app and the foundation of the next generation of Internet communication.

Digging further into the technology, improving picture quality across the full link is likely to become a trend. The Volcano Engine video cloud has begun small-scale trials of H.266, which is expected to be available within two years.

Volcano Engine also continues to improve its evaluation of full-link picture quality: quality may be reduced at one stage of the link and restored at a later one, and such combinations will become richer. Keith gives a specific example: "On Douyin's production side, we can't use too high a bitrate because of the effect on the posting rate, so the produced video may lose fine detail, such as hair, but that detail can be recovered later during transcoding and playback on the phone. Where to recover it and how to recover it more sensibly are areas where technology can dig deep, and through this series of combinations the final picture the consumer sees will still be clear."

New scenarios for the future

In conversations with research labs and overseas companies, Keith has also seen some new directions. For example, 3D portrait projection can project a distant person into another space to chat with you face to face. There is also 3D environment modeling, which virtualizes a real environment and projects it in front of a person to make it feel more real. In Keith's view, these extend video as a carrier of information toward a more immersive world, bringing new ways of interacting and reducing the distance and cost between people.

While these technologies are still at the research stage, Keith says the team is watching for areas where they mature, so that they can be opened up for use across many industries.

To learn more about Volcano Engine video cloud technology, join the Volcano Engine special session at LiveVideoStackCon 2021 in Beijing on September 3 and talk with Volcano Engine's senior developers about audio and video technology.

Guest: Keith, Volcano Engine Video Cloud Product Lead

Interviewer: Yan Bao, editor-in-chief of LiveVideoStack

Editor: Alex


The Audio and Video Technology Behind ByteDance Revealed

Audio and video technology has developed rapidly in recent years. On the one hand, it meets enterprises' need for rapid business growth; on the other, it opens up more possibilities for business development. In this session, we will present the audio and video technologies behind ByteDance and show how they can be used to support business growth and meet our partners' needs. The session starts with audio and video codecs, reviewing codec technology and its outlook and introducing video codec optimization and evaluation. It then covers audio and video in live streaming and how they support business growth. Finally, using Douyin as an example, it shows how RTC technology pursues the best possible experience.
