Recently, the “GOTC Global Open Source Technology Summit”, co-hosted by the OpenAtom Foundation, Linux Foundation Asia Pacific, and Open Source China, concluded at the Shanghai World Expo Center. Leaders of the world’s top open source foundations, open source community participants, and open source enthusiasts from around the world attended. With open source as the topic and the conference as the medium, industry experts held in-depth discussions of industry trends.

As a representative of outstanding open source projects in China, Yang Bing, CEO of Ant Group’s OceanBase, shared his thinking on open source infrastructure over the past few years, reviewed OceanBase’s development history, and summarized the importance of open source and of building open infrastructure.

Staying true to the original faith in open source

Yang Bing said his connection with open source technology goes back a long way. In 2009, he joined Ant Group and took part in the research and development of SOFAStack, the Group’s largest distributed architecture for financial systems. He became the leader of the middleware team and led it in exploring the commercialization of Ant Group’s technology and in opening up infrastructure such as OceanBase and SOFA to the outside world. He believes in “open source infrastructure” and has actively promoted the open sourcing of several distributed architecture components within Ant Group. In 2020, Yang Bing officially joined the OceanBase team and, together with Professor Yang Zhenkun, founder of OceanBase, actively pushed to open source the project.

OceanBase is a native distributed database. The word “native” is indeed borrowed from the “native” in “cloud native”, which has been popular in recent years, and the idea is the same. “We want to leave the simplicity to application development, keep the complexity for ourselves, and solve the distributed problem inside the database,” Yang Bing said.

A good database is forged through use

When asked how OceanBase differs from other databases, Yang Bing named three big differences.

First, OceanBase has been 100% self-developed from the first line of code, and no database on the market at that time treated distributed design as a first-class citizen. “We saw that this was the future, so we chose to stick with native distributed relational databases from day one. By opening more than 3 million lines of code on June 1, our first anniversary, we want to share the technology we’ve developed over the last decade to help people explore the distributed world.”

Second, good infrastructure software is not designed but iterated through practice, and that is how OceanBase was built. Yang Bing said OceanBase is currently the only database in the world that has been tested and operated over a long period in large-scale financial scenarios, which are the most critical and demanding scenarios for basic software. Yang Bing and his team have taken part in nearly 10 Singles’ Day “wars” in 11 years, which makes them Singles’ Day veterans, and OceanBase has withstood all kinds of severe tests during those “wars”. Over the past three years, the OceanBase team has also been working with external partners and customers to explore more external usage scenarios.

Finally, OceanBase has repeatedly taken part in the TPC-C international benchmark organized by the authoritative organization TPC, and broke the world record to take first place. OceanBase has withstood the test of international standards, proving that distributed technology can be combined very well with databases and can keep breaking through and developing.

Yang Bing expressed his hope that OceanBase will devote itself to becoming a global, enterprise-class database technology service provider. “Databases are a very high-barrier basic field, and we want to be able to raise Chinese voices and build Chinese brands in this field.”

The right place at the right time created OceanBase

In the 1960s, database technology pioneer Charles Bachman proposed the network database model and led the development of the DBTG report, which advanced the revolutionary idea that modern information systems should be “database-centric”. In 1970, E. F. Codd, the father of the relational database, published the paper “A Relational Model of Data for Large Shared Data Banks”, which for the first time clearly proposed a new model for database systems: the relational model. It described the physical world in terms of two-dimensional relational tables, making database technology more widely accessible, and it laid a solid mathematical foundation for later database technology.

In the 1980s, amid the relational database boom, James Gray defined the standards for the transaction model, making the database truly basic software capable of handling mission-critical applications.

Michael Stonebraker is a Turing Award winner who has made significant contributions to database technology. He has been involved in the design of many databases, such as Ingres and PostgreSQL. He has also founded a number of database companies and created many different types of database products.

Looking back at the history of databases, driven by generations of great Turing Award winners, the cycle from theoretical breakthrough to practice has never stopped.

Yang Bing believes that database development has run into two huge challenges.

The first is that, with the rapid development of science and technology in recent years, the digitization of human society has accelerated, so today’s databases are bound to face massive volumes of stored data and requests. The second is that the shift from the PC Internet era to the mobile Internet lets people use their phones to enjoy all kinds of services anytime and anywhere, which places higher demands on the high availability of business systems.

Over eleven years of development, Alibaba’s internal database technology has gone through three eras.

In the first era, faced with massive data and request volumes, Alibaba moved firmly toward distributed architecture, applying and innovating on it in large-scale scenarios such as Taobao, Tmall and Alipay.

The second era is the native distributed stage. OceanBase gradually moved from Alibaba Group’s Taobao to Alipay, because only financial scenarios, with their higher demands on a database, can truly temper and improve its performance. At the same time, OceanBase faced a mix of financial and Internet workloads, which demands the utmost of a database. OceanBase kept innovating and breaking through, from traditional primary-standby to multiple replicas, to two regions and three data centers, to three regions and five data centers… OceanBase has kept evolving in a distributed direction and now has city-level disaster recovery capabilities.
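One way to see why the step from a two-location, three-data-center layout to a three-location, five-data-center layout gives city-level disaster recovery is a simple majority count. The sketch below is illustrative only: it assumes one replica per data center, a replication protocol that stays available as long as a majority of replicas survive, and a hypothetical 2+1 and 2+2+1 split of data centers across locations. None of these details are stated in the talk, and the function and variable names are made up for the example.

```python
from collections import Counter


def survives_any_city_loss(replica_cities):
    """Return True if a majority of replicas remains after losing any one city.

    replica_cities lists the city hosting each replica, one entry per data center
    (an assumption made for this sketch).
    """
    total = len(replica_cities)
    majority = total // 2 + 1
    replicas_per_city = Counter(replica_cities)
    # Losing a city removes every replica hosted there.
    return all(total - lost >= majority for lost in replicas_per_city.values())


# Hypothetical layouts: two data centers in city_a, one in city_b ...
two_locations_three_centers = ["city_a", "city_a", "city_b"]
# ... versus a 2 + 2 + 1 split across three cities.
three_locations_five_centers = ["city_a", "city_a", "city_b", "city_b", "city_c"]

print(survives_any_city_loss(two_locations_three_centers))
print(survives_any_city_loss(three_locations_five_centers))
```

Under these assumptions, losing city_a in the first layout leaves one replica out of three, which is not a majority, while in the second layout any single city can fail and at least three of five replicas remain.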

In the third era, OceanBase gradually extended from extreme OLTP scenarios toward data analysis, entering the HTAP era in which AP and TP converge. The deployment architecture is also moving from private cloud to hybrid cloud to multi-cloud. More importantly, OceanBase gradually moved from internal use at Alibaba to broader, general-purpose external scenarios, becoming more open. That is why the team decided to open source OceanBase.

Yang Bing said that if the technology ecosystem inside Ant is regarded as a small community, it too accelerated the development of its entire internal infrastructure through open source. In essence, OceanBase was born from the specific application scenarios of Ant and Alibaba and developed iteratively to meet the challenges of different periods, including unlimited scaling, traffic peaks and high availability. By solving the problems of distributed scalability, high availability, and disaster recovery in those periods, OceanBase standardized the underlying infrastructure so that effort could be invested in innovation at the upper layers, which in turn advanced the infrastructure as a whole.

Today, OceanBase is increasingly used in general-purpose scenarios, including communications, transportation and more. As OceanBase expands outward, open source can attract more partners to help build it and bring it to maturity.

Open source, the best way to build infrastructure

At the summit, Yang Bing also shared his thoughts and views on building infrastructure.

For infrastructure software, whether open source or closed source, to survive and thrive, its evolution must inevitably move toward standardization and scale. From Linus Torvalds, who made Linux open source infrastructure shared by developers all over the world, to IaaS-layer infrastructure services, technology as a whole can only flourish once the underlying infrastructure has reached a certain level of standardization and scale. Open source lets such standards form more quickly.

There are plenty of examples of standards forming in PaaS, big data and other technical fields, as well as in the development of OceanBase itself. For example, in microservice communication, gRPC, established through open source, has become a de facto standard, and Kubernetes (K8s) has become the de facto standard for building cloud native infrastructure. Because this open source software has become standard, developers do not have to spend their energy on compatibility, and the entire infrastructure can keep iterating forward. “Open source is the best way for infrastructure to achieve standardization and scale,” Yang Bing said.

In addition, open source increases the speed of infrastructure iteration. The core competitiveness of a piece of software is not how powerful it is at the moment, but how fast it can iterate. Taking OceanBase as an example, keeping pace through eleven years of rapid internal development, and sustaining that iteration speed while moving into a broader space, comes down to three elements:

First of all, openness is very important. OceanBase has moved from proprietary internal scenarios to general-purpose external ones, and it needs to interact with more upstream and downstream software. If it were developed as closed source, integration could only be pushed outward from OceanBase, and others could not integrate with it from the outside. As a result, OceanBase would connect with the wider ecosystem only slowly.

Then there are scenarios. Basic software that has grown to this level of complexity needs to be polished across many scenarios. To become more general-purpose, it needs a broader range of scenarios, and open source lets it enter different fields more quickly and keep polishing the product.

Finally, academia. Over the years, the OceanBase team has had many academic exchanges, discussing how to combine distributed technology with database technology and the challenges met in research and development. Although OceanBase has accumulated a great deal of engineering experience through the spiral of “theoretical breakthrough + engineering practice”, the team has also run into many theoretical bottlenecks since 2010. Only through open source can a model that combines industry, universities and research take shape more quickly and promote the healthy development of the whole field.

Over the past year, the pandemic has made people’s lives very inconvenient, but it has also rapidly accelerated digitization in industries around the world. More scenarios have moved online and been digitized, and none of this can be separated from infrastructure software. Open source is the best way for infrastructure to maintain its core competitiveness and iterate quickly, and it is a necessary means for infrastructure software to improve its competitiveness and adaptability.

Yang Bing said that OceanBase will firmly continue on the road of open source, hoping to make China’s voice heard in the open source world through everyone’s efforts.