Introduction: This article summarizes the talk given by Jia Yangqing, Vice President of Alibaba Group and President of the Alibaba Computing Platform Business Division, at the open source big data session. It covers the outlook for the open source big data and artificial intelligence ecosystem, Alibaba Cloud's attitude towards open source, and its future plans for big data and artificial intelligence.

The 2019 Alibaba Cloud Summit · Shanghai Developer Conference opened on July 24. The summit shared practical technical content with developers in fields including open source big data, IT infrastructure in the cloud, databases, cloud native and the Internet of Things, and discussed cutting-edge technology trends.

Expert profile: Jia Yangqing is Vice President of Alibaba Group and President of the Alibaba Computing Platform Business Division. He previously served as Director of AI Architecture at Facebook, where he led edge AI platform development, AI platform support across Facebook's product groups, and work on leading-edge machine learning systems. He has also worked as a research scientist at Google Brain and has made many contributions to deep learning frameworks. He is the author of Caffe, a co-author of TensorFlow, a co-leader of PyTorch 1.0 and a founder of ONNX.


The content of this article is compiled from the speech video and PPT.

First, the current state of open source

AI open source projects have by now taken firm root. The path runs from Caffe (2013), to TensorFlow (2015) from Google, widely regarded as the most popular AI framework for large-scale commercial applications, to PyTorch (2017) from Facebook, which offers more flexibility for research and deployment environments. The development of artificial intelligence in recent years is inseparable from the open source community's spirit of sharing.

From a global perspective, the entire open source community is thriving. According to GitHub's 2018 annual open source report, 31 million users are actively developing software on GitHub, and about 2.1 million organizations around the world have built open source projects, with about 96 million repositories in total. In 2018, GitHub added more active users than in the previous six years combined, along with 40% more organizations and 30% more code repositories than in 2017. Developers in China also take part in many of these open source projects, from low-level systems to upper-layer applications. Seen globally, open source is clearly the prevailing direction of software development.



The following figure comes from a survey by the China Academy of Information and Communications Technology (CAICT) on enterprise procurement of big data software. Among the enterprises surveyed, 53.9% chose a commercial version of open source software and 32.7% chose a community version, so a total of 86.6% build their big data processing on open source software. Open source development in China is clearly consistent with the global trend.
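The combined share quoted above can be checked with a few lines of Python. This is only a minimal sketch of the arithmetic: the variable names are illustrative, and treating the remainder as "enterprises not building on open source" is an assumption, since the survey categories beyond the two quoted ones are not given in the text.

```python
# Survey shares quoted in the text (percent of enterprises surveyed).
commercial_open_source = 53.9  # commercial version of open source software
community_open_source = 32.7   # community version of open source software

# Combined share building on open source software.
# Rounded to one decimal place to avoid floating-point noise.
total_open_source = round(commercial_open_source + community_open_source, 1)

# Assumption: everything else is treated as one "other" bucket;
# the text does not break this remainder down.
other = round(100 - total_open_source, 1)

print(f"built on open source: {total_open_source}%")
print(f"other: {other}%")
```

Under these figures, roughly one enterprise in seven falls outside the two open source categories.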



Second, Alibaba Cloud's attitude towards open source: embrace, contribute and win-win

Alibaba Cloud is an independently developed and controllable cloud, and also an open source-compatible one. Over the past ten years, what Alibaba Cloud is most proud of is having built a complete set of software stacks, from large-scale unified management and control, resource optimization and big data solutions at the bottom to the business platforms on top. At the same time, Alibaba Cloud uses many open source runtimes and projects throughout this independently controllable system. They serve as the building blocks that help Alibaba Cloud construct the whole edifice.

1. Embrace open source

Many of Alibaba's businesses use open source software. This applies to Alibaba's internal businesses such as Taobao, Tmall, Alipay, AliExpress, Cainiao, Juhuasuan and Alibaba Cloud, and the IaaS and PaaS services that Alibaba Cloud provides to users also draw on and use open source projects such as Linux, Hadoop and Flink, as well as newer artificial intelligence frameworks such as Caffe and TensorFlow. Alibaba is very grateful to the open source community and is eager to embrace it.



2. Contribute to open source

While embracing the open source community, Alibaba also keeps contributing back to it. More and more front-line engineers at Alibaba are taking part in open source projects. When open source software is combined with enterprise business processes, many problems surface that were not considered in the original open source setting. Open source projects often start out as one developer's idea, and a great deal of careful thought and clever design goes into the architecture and development of the resulting system. Real business practice then puts that design to the test and feeds lessons back into the project. At present, Alibaba is the most prominent corporate contributor to open source in China, and it has created a large number of open source projects on GitHub. According to the Alibaba economy's GitHub open source ecosystem report, six of the top 10 open source projects in China come from Alibaba.



In the field of big data and artificial intelligence, Alibaba has so far contributed more than 1 million lines of code to the open source community. More and more Alibaba engineers are being recognized by the open source community, which has also invited them to take part in discussions on the direction of open source projects. To date, Alibaba has produced more than 50 community committers and PMC members, on projects ranging from the low-level ORC project to Spark, Flink and others. In addition, more than 10 Alibaba products and projects built on open source have been optimized with very good results, improving on the open source versions in speed, availability and stability.



3. Win-win with open source

Open source software is still subject to economic laws; in other words, open source needs to provide value. Today, many enterprises are choosing to migrate their infrastructure to the cloud. The cloud is a very good vehicle for helping open source software realize its business value and fit enterprise business scenarios, and Alibaba Cloud's huge business volume can serve as a testing ground for technologies contributed to the open source community.

Flink, a stream computing framework and a mainstay of Alibaba's Double 11, is one example. Ten years ago, when Alibaba started Double 11, the business volume was very small and put little pressure on the system. Since 2016, the number of Double 11 users has reached hundreds of millions. Their large-scale purchasing, browsing and query operations cause backend metrics to spike suddenly at midnight. Alibaba came to realize that most open source projects were not designed for load at this scale, so Alibaba Cloud has implemented many optimizations on top of open source projects to meet its massive business requirements.

Alibaba has found something similar in artificial intelligence. The previous generation of AI frameworks, such as Caffe, often carried the imprint of academia in their design. After years of interaction between industry and academia, the new generation of AI frameworks, such as TensorFlow and PyTorch, has gradually begun to take into account industrial scale, flexibility, high performance and deployment in multiple environments (on device, in the cloud, on mobile phones, etc.). This practical experience provides valuable feedback and contributions to the open source community.



Alibaba Cloud's open source-based cloud products for big data and artificial intelligence

In big data and artificial intelligence, Alibaba Cloud's main contributions to the open source community are as follows:

Real-time computing: Flink supported Alibaba's real-time computing tasks during Double 11.

PAI component: PAI is a deeply optimized platform based on the open source PyTorch and TensorFlow frameworks, and is fully compatible with TensorFlow and PyTorch syntax. For distributed training and model deployment, PAI achieves faster training and larger-scale deployment by optimizing the underlying layers, the communication library, GPU usage and the architecture.

EMR: Big data involves not only Flink stream computing but also established products such as Hadoop and Spark. Alibaba Cloud's E-MapReduce (EMR) platform, a big data service built on a collection of open source components, connects smoothly with the open source computing models found in big data scenarios. It also helps users of on-premises open source deployments move seamlessly to the cloud.

Elasticsearch: Alibaba Cloud is a platform for empowering users and realizing business value, and it has a strong partnership with Elasticsearch. With services provided by the founding team of Elasticsearch, Alibaba Cloud offers an Elasticsearch product that addresses a range of user issues including platform, management and deployment. Partnerships like this make open source software easier to deploy and help the ecosystem continue to grow.



Coexistence, symbiosis and win-win between Alibaba Cloud and the open source community

How can open source projects be deployed in real applications? Enterprises and developers are increasingly moving their projects to the cloud. For individuals, developing on the cloud is very convenient; for enterprises, the cloud makes it easier to deploy across regions and internationally. At present, Alibaba Cloud serves 2.3 million customers across 18 regions and 49 availability zones, indirectly providing cloud computing, big data and artificial intelligence computing power to billions of users and helping customers deploy their applications and products. Through the cloud computing capability it provides, Alibaba has long since formed a strong bond of coexistence, symbiosis and win-win with the open source community.



Future plans for big data and artificial intelligence

Supporting the group, serving the cloud: As the foundational team of Alibaba Group, Alibaba Cloud supports both the group's internal applications and applications on the cloud. Alibaba Group is Alibaba Cloud's largest user. We believe that technologies and products able to support such a large number of users are proven and reliable.

Giving back to the community and building an ecosystem: After in-depth cooperation with the Flink community, Alibaba merged its Blink project with Flink and gave the combined version back to the community. Alibaba has also gained a lot of experience from interacting with the open source community.

Community building, win-win business: More and more developers in China are enthusiastic about open source. Alibaba Cloud hopes to provide more services for these developers, such as helping them with CI, better testing and better code hosting. Alibaba Cloud also hopes to use open source conferences and developer events to help enterprises communicate more effectively with the open source developer community, and to promote the development of the open source community and open source technology.



This article is original content of the Yunqi Community (Alibaba Cloud developer community) and may not be reproduced without permission.