While exploring how to make video calling more efficient and easier to scale in the future, Facebook realized that the best approach was to redesign and rewrite its calling library from scratch. The result is known as RSYS.
By Ishan Khot
Original link: https://engineering.fb.com/20…
- We are launching a new video calling library to all relevant products across our apps and services, including Instagram, Messenger, Portal, Workplace Chat, and more.
- To create a library generic enough to support all of these use cases, we needed to rewrite our existing library from scratch using the latest version of the open source WebRTC library. It was a massive undertaking that involved engineers from across the company.
- Compared with the previous library, RSYS supports multiple platforms, including Android, iOS, MacOS, Windows, and Linux. It is about 20 percent smaller, which makes it easier to integrate into size-constrained platforms such as Messenger Lite. RSYS has about 90 percent unit test coverage and a thorough integration test framework that covers all of our major calling scenarios.
- We did this by optimizing the libraries and architecture for binary size, separating the pieces needed for calling into isolated, standalone modules, and leveraging cross-platform solutions that do not depend on the operating system or environment.
The initial version of Facebook’s video calling was written on a seven-year-old fork of WebRTC that was originally set up to enable native audio calling in Messenger. At the time, our goal was to give our users the most feature-rich experience possible. Since then, we’ve added video calling, group calling, a video chat engine and interactive AR effects. Millions of people use video calling each month, and this full-featured library, seemingly simple on the surface, had become far more complex behind the scenes. We had a lot of Messenger-specific code, which made it difficult to support applications like Portal and Instagram. We had separate signaling protocols for group calls and peer-to-peer calls, which required us to write features twice and created large inconsistencies in the code base. We were also spending more and more time keeping our WebRTC fork up to date with the latest open source improvements. In the end, we found that we were falling behind in providing reliable service on low-power devices and in low-bandwidth scenarios.
While exploring how to make video calling more efficient and easier to scale in the future, we realized that the best approach was to redesign and rewrite the entire library from scratch. The result is RSYS, a video calling library that lets us take advantage of some of the major advances we’ve made in video calling since we wrote the original library in 2014. RSYS is about 20 percent smaller than the previous version and can be used on all the platforms we develop for. With this new iteration, we reimagined how we look at our video calling platform and started from the ground up with a new client core and extensibility framework. This helps us advance the state of the art, and the new code base is designed to be sustainable and extensible for the next decade, laying the foundation for remote presence and interoperability across applications.
Smaller and faster
A smaller library loads, updates, and starts faster for the people who use it, regardless of device type or network conditions. Smaller libraries are also easier to manage, update, test, and optimize. By the time we started planning the new version, our binary size had peaked at about 20 MB. Although we could have trimmed some of it by cutting pieces of code here and there, we realized that to get the result we wanted we needed to rewrite the entire code base from scratch.
The easiest way to get a smaller library would have been to drop many of the features we’ve added over the years, but it was important for us to keep all of the most commonly used ones (such as AR effects). So we took a step back and looked at how we could apply what we’ve learned over the last decade, along with what we know about the needs of the people who use our products today. After exploring our options, we decided that we needed to go beyond the interface and dig into the infrastructure of the library itself.
We made several architectural choices to optimize for size: we introduced a plug-and-play framework that selectively compiles features into only the applications that need them, and we introduced a generic framework, based on the Flux architecture, for writing new features. We also moved away from heavily templated general-purpose libraries like Folly toward more size-efficient libraries like Boost.SML, which yields size gains in every application.
In the end, we reduced the size of the core binary by about 20 percent, from about 9 MB to about 7 MB. We did this by restructuring our features to fit a simplified architecture and design. While we have retained most of the features, we will continue to introduce more pluggable features over time. Fewer lines of code make a code base lighter, faster, and more reliable, and a leaner code base means engineers can innovate faster.
One of the main goals of this effort was to minimize code complexity and eliminate redundancy. We knew that a unified architecture would allow global optimization (rather than having each feature focus on local optimizations) and allow code reuse. To build this unified architecture, we made some major changes:
- Signaling: We came up with a state machine architecture for the signaling stack that unifies the protocol semantics of peer-to-peer calls and group calls. We were able to abstract any protocol-specific details away from the rest of the library and provide a signaling component that is solely responsible for negotiating shared state between call participants. By reducing duplicate code, we can write features once, change protocols easily, and provide a unified user experience for peer-to-peer and group calls.
- Media: We decided to reuse our state machine architecture and apply it to the media stack, this time capturing the semantics of the open source WebRTC API. At the same time, we worked to replace our forked version of WebRTC with the latest version, while retaining all product-specific optimizations. This lets us change the WebRTC version underneath the state machine, and set up regular pulls from the open source code base, as long as the semantics of the API itself do not change significantly. As a result, we can easily pick up the latest features without downtime or delay.
- SDK: To hold feature-specific state, we use a Flux architecture to manage data and to provide calling products with APIs that work much like the React JS-based applications web developers are familiar with. Each API call results in a specific action that is routed through a central dispatcher. These actions are then handled by specific reducer classes, which emit model objects based on the type of the action. The model objects are sent to bridges that contain all the feature-specific business logic and that can trigger subsequent actions to change the model. Finally, all model updates are sent to the UI, where they are converted into platform-specific view objects for rendering. This lets us cleanly define a feature as the combination of a reducer, bridge, actions, and model, which in turn lets us configure features for different applications at run time.
- OS: To make our platform generic and extensible, we decided to abstract away all functionality that depends directly on the OS. We know that some functions (such as creating hardware encoders and decoders, thread abstractions, and so on) require platform-specific code for Android, iOS, and others, but we tried to create generic interfaces around them so that platforms like MacOS and Windows can easily plug in by providing different implementations via proxy objects. We also make heavy use of the cxx_library rule in Buck to configure platform-specific libraries, compiler flags, linker parameters, and so on in a straightforward way.
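The data flow described in the SDK item can be sketched as follows. This is a minimal illustration in Python, not the RSYS API: the "mute" feature, the action names `toggle_mute` and `show_banner`, and the classes `Action`, `MuteModel`, and `Dispatcher` are all invented for the example. It assumes a single feature and a synchronous dispatcher.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    type: str
    payload: Optional[str] = None


@dataclass(frozen=True)
class MuteModel:
    """Immutable model for a hypothetical 'mute' feature."""
    muted: bool = False
    banner: str = ""


def mute_reducer(model: MuteModel, action: Action) -> MuteModel:
    """Reducer: a pure function that emits a new model for an action."""
    if action.type == "toggle_mute":
        return replace(model, muted=not model.muted)
    if action.type == "show_banner":
        return replace(model, banner=action.payload or "")
    return model


def mute_bridge(model: MuteModel, action: Action) -> List[Action]:
    """Bridge: feature-specific logic that may request follow-up actions."""
    if action.type == "toggle_mute":
        text = "You are muted" if model.muted else ""
        return [Action("show_banner", text)]
    return []


class Dispatcher:
    """Central dispatcher: every action passes through here, in order."""

    def __init__(self, reducer, bridge, render: Callable[[MuteModel], None]):
        self.model = MuteModel()
        self._reducer = reducer
        self._bridge = bridge
        self._render = render

    def dispatch(self, action: Action) -> None:
        queue = [action]
        while queue:
            current = queue.pop(0)
            self.model = self._reducer(self.model, current)
            queue.extend(self._bridge(self.model, current))
            self._render(self.model)  # UI turns the model into view objects


rendered: List[str] = []


def render(model: MuteModel) -> None:
    # Stand-in for a platform-specific view layer.
    rendered.append(f"muted={model.muted} banner={model.banner!r}")


dispatcher = Dispatcher(mute_reducer, mute_bridge, render)


def toggle_mute() -> None:
    """Public API call: it only creates an action and dispatches it."""
    dispatcher.dispatch(Action("toggle_mute"))


toggle_mute()
print("\n".join(rendered))
```

Because the reducer is pure and the model is immutable, the feature can be unit tested without any UI or network, which fits the emphasis on test coverage earlier in the article.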
Figure: The RSYS architecture for calling.
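To make the unified signaling idea from the list above more concrete, here is a rough Python sketch of one state machine shared by two protocols. It is an assumption-laden illustration, not the RSYS design: the states, events, and wire message names (`offer`, `answer`, `bye`, `join`, `joined`, `leave`) are hypothetical, and a real signaling stack would negotiate far more shared state than a single call status.

```python
from enum import Enum, auto
from typing import Dict, Tuple


class State(Enum):
    IDLE = auto()
    CONNECTING = auto()
    IN_CALL = auto()
    ENDED = auto()


class Event(Enum):
    START = auto()
    ACCEPTED = auto()
    HANGUP = auto()


# One transition table shared by peer-to-peer and group calls.
TRANSITIONS: Dict[Tuple[State, Event], State] = {
    (State.IDLE, Event.START): State.CONNECTING,
    (State.CONNECTING, Event.ACCEPTED): State.IN_CALL,
    (State.CONNECTING, Event.HANGUP): State.ENDED,
    (State.IN_CALL, Event.HANGUP): State.ENDED,
}

# Hypothetical wire vocabularies. Protocol-specific details stop here:
# each protocol only maps its own messages onto the common events.
PEER_MESSAGES = {"offer": Event.START, "answer": Event.ACCEPTED, "bye": Event.HANGUP}
GROUP_MESSAGES = {"join": Event.START, "joined": Event.ACCEPTED, "leave": Event.HANGUP}


class Signaling:
    """Protocol-agnostic signaling component driven by the shared table."""

    def __init__(self, messages: Dict[str, Event]):
        self._messages = messages
        self.state = State.IDLE

    def on_message(self, name: str) -> State:
        event = self._messages.get(name)
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is not None:  # unknown or invalid messages are ignored
            self.state = next_state
        return self.state


peer = Signaling(PEER_MESSAGES)
group = Signaling(GROUP_MESSAGES)
for name in ("offer", "answer"):
    peer.on_message(name)
for name in ("join", "joined"):
    group.on_message(name)
print(peer.state.name, group.state.name)
```

A feature written against `State` and `Event` works for both call types, which is the "write features once" property the article describes; swapping a protocol only means supplying a different message mapping.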
The next step
Today, our calling platform is significantly smaller and able to scale across many different use cases and platforms. We support calls that millions of people make every day. Our library is part of all our major calling apps, including Messenger, Instagram, Portal and Workplace Chat. Building RSYS was a long process, but to the people who use these applications it doesn’t feel much different. It will continue to provide people with a great calling experience. But this is just the beginning.
The work we’ve done on RSYS will allow us to continue to innovate and extend our calling experience as we move into the future. In addition to producing a library that will be sustainable for the next decade or more, this work has laid the foundation for cross-app calling across all of our applications. It also lays the foundation for us to build an environment centered on remote presence.
This work was made possible through collaboration with the client platform team. We are grateful to everyone who contributed to RSYS, especially Ed Munoz, Hani Atassi, Alice Mengg, Shelly Willard, Val Kozina, Adam Hill, Matt Hammerly, Ondrej Lehecka, Eugene Agafonov, Michael Stella, Cameron Pickett, Ian Petersen and Mudit Goel, for their help with implementation and their continued guidance and support.