Disclaimer: This report is from the Open Source Society and is reprinted with the Society’s authorization. Due to limited space, only selected excerpts appear here. Please download the PDF for the full report.
Preface
As we write this preface, our feelings are complicated and hard to sum up in a single word. It is the best of times; it is the worst of times.
Years from now, we may look back on 2020 and call it the beginning of a global upheaval. Last year many friends even exclaimed, “Every day we witness history; live long enough and you will see everything!” The COVID-19 pandemic and the trade war between China and the United States both intensified in the same year. Because of the pandemic and the trade war, the world is changing at an accelerating pace.
Against this historical backdrop, we also see three trends in the open source community:
1. Open source development and the trend from real to virtual
Both the data and our own perceptions point to a strong worldwide trend towards open source. GitHub’s active repositories and active users are growing at a high rate (35.3% / 21.2%). Gitee’s repositories and users are growing at an even more impressive rate (192% / 162%).
On the one hand, open source has been on the rise for years. On the other hand, we speculate that since the outbreak more and more people have begun telecommuting, which has given more people the chance to “slash”, that is, to switch identities easily in front of a computer and handle various affairs asynchronously. This gives developers more time and opportunity to participate in open source.
And, of course, thanks to telecommuting, the virtual world has become a much bigger part of our lives. Is that for the better, and what problems will it bring? Standing at the mouth of the tunnel, we cannot yet guess.
2. The rise of open source in China and the fragmentation of the open source world
With the growing influence of more and more open source projects from China, the launch of the Mulan Permissive Software License, the establishment of the OpenAtom Foundation, the rapid growth of Gitee, and the new release of CODE China, we are convinced that 2020 is the year of the rise of open source in China. Readers of this report will see plenty of hard evidence.
However, another remarkable phenomenon can also be seen in the following data. There is no overlap between the most active Chinese open source projects on GitHub and the most active Chinese open source projects on Gitee.
As Gitee continues to grow rapidly, it is safe to predict that more and more high-quality Chinese open source projects will choose to open source on Gitee. “One World, Two Systems,” once mentioned by a friend, will gradually become a reality.
If China’s open source “rises” in a way that is isolated from the rest of the world, this is not the future we want to see.
3. Open source for good and the fact that we’re not ready
Since the outbreak, many open source practitioners have dedicated their time, energy, technology and wisdom to open source projects related to the fight against COVID-19. As a result, many open source organizations and projects for medicine, epidemic prevention, public welfare and relief have been born. Wuhan2020 is a typical representative.
Therefore, we chose “Open Source for Good” as the theme of the 2020 China Open Source Conference without hesitation, which was unanimously approved by many lecturers, participants and sponsors.
But what’s next? How do we organize, encapsulate and modularize the people, projects, experiences and lessons gathered in the fight against the epidemic, and prepare for future contingencies? These are areas that deserve long-term thinking and improvement.
Of course, a complete, objective, comprehensive and rich report offers far more to discuss than the points above. You are welcome to read the report below and communicate with us at any time.
Zhuang Biaowei, president of the Open Source Society
January 16, 2021
Guest review experts
- Wu Sheng, Apache Software Foundation Member, co-founder of Apache Local Community
- Huang Dongxu, co-founder and CTO of PingCAP
- Ma Yanjun, senior researcher at Baidu’s Natural Language Processing department
- Jiang Tao, founder and chairman of CSDN and founding partner of Geek Bang Venture Capital
- Gao Yang, founder and CEO of SegmentFault
- Sweet Potato, founder and CTO of Open Source China
- Zhuang Biaowei, president of the Open Source Society
- Chen Yang, vice president of the Open Source Society
- Wang Wei, CEO of the Open Source Society
- Liu Tiandong, director of the Open Source Society
I. Questionnaire
1. Background of report
At the beginning of 2016, the Open Source Society released the “2015 China Open Source Community Participation Survey Report”. In the years since, the Society has continued to release developer survey reports, aiming to present the development of open source in China from various dimensions. This year, we set out to map the open source world in China in 2020 by combining data analysis with survey reports.
This questionnaire is an important part of the annual China Open Source Report; an analysis report without survey research is just empty talk. Building on previous years, we looked at other major developer questionnaires and added some new perspectives. In the context of COVID-19 in 2020, the survey takes a closer look at online collaboration in the open source world and at what open source can bring to the world.
Through the statistical investigation and analysis of nearly 60 questions, we hope to reflect the real situation of the current open source community in China, so as to provide an authoritative reference for those who come to open source after us.
- Target audience: developers, community members, contributors, students, government and business managers
- Survey content: mainly covers personal information, work status, open source community and developer skills
- Survey methods: Samples and data were collected by online questionnaire, and data were analyzed by cross-comparison method
- Promotion methods: online social media, blog, open source community, open Source Chinese website
- Number of questions: 59
- Type of question: single choice, multiple choice, open
- Sample size: 236
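The cross-comparison analysis named in the survey methods can be pictured as a cross-tabulation of two questionnaire answers. The following is only a minimal sketch: the function name `cross_tabulate`, the field names and the sample responses are hypothetical and are not taken from the survey data.

```python
from collections import Counter


def cross_tabulate(responses, row_key, col_key):
    """Count how often each (row answer, column answer) pair occurs."""
    return Counter((r[row_key], r[col_key]) for r in responses)


# Hypothetical responses for illustration only (not real survey data).
sample_responses = [
    {"occupation": "student", "weekly_hours": "<5"},
    {"occupation": "student", "weekly_hours": "5-20"},
    {"occupation": "developer", "weekly_hours": "<5"},
    {"occupation": "student", "weekly_hours": "<5"},
]

table = cross_tabulate(sample_responses, "occupation", "weekly_hours")
for (occupation, hours), count in sorted(table.items()):
    print(f"{occupation:10s} {hours:5s} {count}")
```

Each printed row is one cell of the cross table, which makes it easy to compare, for example, how time commitment differs between students and working developers.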
2. Important findings
Due to the epidemic, this year’s questionnaire was distributed entirely online, so the sample collected was smaller than in previous years, but it still offers a glimpse of the quiet changes in China’s open source world. By analyzing the 2020 data and comparing it with previous years and other publicly published statistical reports, we arrived at some key findings:
- The majority of open source participants are still 20-39 years old, and their share has increased compared with last year. Those aged 20-30 account for 64% of the open source community, indicating that developers are still getting younger.
- Compared with 2019, people who have not yet started working make up the largest group of developers taking part in this year’s survey. Judging from the distribution of participants’ fields and positions, most of these “not yet working” people are students. This may be related to the widespread pursuit of higher degrees today and to the way the questionnaire was distributed, but it also shows that open source is making good progress in education;
- Developers’ technical direction is still dominated by the back end: back-end development accounts for about 31%, followed by Web front-end, AI and big data analysis at 10%, 9% and 9% respectively. Non-technical personnel and practitioners from other industries account for 7%, indicating that open source is attracting more and more attention from all walks of life.
- Participation in open source has shifted from code and testing in 2019 to code and documentation. Communities and projects are realizing the importance of documentation, and more open source contributors are investing in documentation;
- Participation in open source activities is more frequent than in previous years, which is related to the increasingly prosperous open source atmosphere and the growing range of open source activities in China. 81% of developers believe open source activities are critical to promoting and advancing the open source community. On whether they prefer online or offline meetings, the results were surprisingly evenly split;
- In the context of COVID-19 in 2020, every industry has inevitably been affected. 82% think the epidemic has affected participation in open source community activities but has also promoted it, 11% think it has had no impact, and 7% think it has had more negative than positive impact.
- Among online document collaboration tools, Shimo Docs has risen from third place to first, overtaking Wiki and Google Docs. A number of excellent collaboration tools are emerging in China and gaining wider recognition.
- On September 9, 2020, the OpenAtom Foundation, the first open source software foundation in China, was officially announced. It aims to promote the concept of open source, open source culture education and community building, establish an open source ecosystem, incubate and support open source projects at an early stage, help the Chinese open source community integrate and optimize resources, and connect with the international open source community.
3. Developer community characteristics
3.1 Participants are generally young and highly educated, and 80% are male
About 84 percent of the participants were men and 16 percent were women, unchanged from last year.
[Expert comments]
Liu Tiandong: The percentage of women participating in open source is similar to the survey data of the past 2-3 years, but higher than the international average (about 10-12%). We hope that in the future, more women will participate in open source in China and form a beautiful open source landscape.
Chen Yang: Diversity has always been a watchword of the open source community. The global open source community started calling for “encouraging women to participate in open source” more than a decade ago. Every year at the China Open Source Annual Conference (COSCon), the Open Source Society holds a women’s forum as a tradition. At COSCon 2020, we invited women leaders in the open source community who connect the community with their unique abilities and perspectives: GitHub COO Erica Brescia, who oversees a community of more than 50 million programmers; Stormy Peters, who was executive director of the GNOME Foundation and is now director of Microsoft’s Open Source office; and Hong Phuc Dang from Vietnam, who has been involved in open source since founding FOSSASIA and currently serves as board vice president of OSI. The breadth and depth of women’s participation in open source is moving in the direction we hope for, making the world more interesting and diverse.
3.2 Working time and industry
Among the participants, those who have not yet started working form the largest group at about 36%, followed by those who have worked for 3-5 years (19%) and 6-10 years (15%). About 20% have worked for more than 10 years.
Among the participants, nearly 70% are engaged in the field of Internet development/software development, followed by those in the field of education/academic/scientific research, accounting for about 15%. Finance, banking, media, advertising and entertainment are also actively participating in open source.
3.3 Job Distribution
Compared with 2019, students and developers make up the majority of participants this year, at 37 percent and 36 percent respectively.
3.4 Technical direction
Participants are mainly engaged in back-end development, which accounts for about 31%, followed by Web front-end, AI and big data analysis at 10%, 9% and 9% respectively. Non-technical personnel and practitioners in other industries account for 7%, indicating that open source is attracting more and more attention from all walks of life.
[Expert comments]
Chen Yang: In the early days of open source, open source = Linux. The main battlefields of open source projects at that time, the Linux operating system, desktop office software (GNOME, OpenOffice) and the browser (Mozilla), set off the first wave of open source. In recent years, with the rise of the Internet, more and more open source projects and technologies have blossomed. From databases to middleware, from front end to back end, from programming languages to compilers, from the Internet of Things to microservices, from big data to artificial intelligence, open source technologies and projects are increasingly rich and diverse.
4. Open source work status
4.1 Length of exposure to open source
Nearly 30% of participants have been exposed to open source for 1-2 years, and nearly 80% have been exposed to open source for more than one year.
[Expert comments]
Chen Yang: The open source community is very stable and has the cohesion of a big family. The open source community of a dozen years ago was active among a small group of early open source people. I’ve observed that most of the open source people around me have been involved for more than 10 years and will continue to be involved in the future. The 2020 data shows 30% new participants, which confirms from another angle that, after years of development, the open source movement has broken out of its original circle.
4.2 Time commitment to open source
About 44% of participants spend less than 5 hours per week on open source, and about 40% of participants spend 5-20 hours per week on open source.
[Expert comments]
Wu Sheng: The low proportion of professional open source developers (contributors who work more than 20 hours a week) shows that Chinese enterprises still engage with open source projects mainly as users. Most open source contributors still make limited contributions in their spare time or during work breaks.
4.3 Open Source Activities
The vast majority of participants only participate in online/offline open source activities a few times a year, and about 20% participate in online/offline open source activities once or twice a month. Offline open source activities are mainly salons and lectures, while online open source activities are mainly online meetings, mailing list discussions and PR activities.
Eighty-one percent of participants considered open source activities to be essential to promote and advance the open source community.
[Expert comments]
Gao Yang: We encourage developers to actively participate in open source activities, especially offline activities. Offline meetings and communication help build trust between people, which is of great significance to the prosperity of communities and more efficient collaboration between people.
As for online versus offline open source activities and meetings, participants felt that offline activities allow more efficient face-to-face communication and a better atmosphere, and offer a chance to get out, look around and make new friends. Online activities are safer, more convenient, low-cost and free of time and geographical constraints, and their content can be recorded for later review. Of course, recording offline events is also increasingly common.
The year 2020 is a very special year, with the COVID-19 pandemic affecting all sectors of life to varying degrees. 82% of participants believe that the pandemic has affected, but also promoted, participation in the open source community.
[Expert comments]
Liu Tiandong: The impact of COVID-19 has accelerated the participation of more telecommuting communities in open source activities. Take the “5th China Open Source Annual Conference + Apache China Road Show” held by Open Source Society on October 24-25, 2020 as an example. More than one million people participated online, which is a thousand-fold increase of the number of people participating in offline activities in the past. At the same time, more international and domestic open source celebrities shared online than in previous years. While the conference also hosted offline gatherings in five cities, we can expect online meetings to become the norm in the future.
4.4 Open source Revenue
It can be seen that many people do not pursue material returns in participating in open source. 30% of participants have no income from open source, but they are still keen on open source work. At the same time, we can also note that 12% of participants say that their enterprises will pay their employees to participate in open source full-time or partially. 23% of the participants said that school scientific research projects or associations support participation in open source, which shows that enterprises and schools are gradually paying attention to and attaching importance to open source.
[Expert comments]
Wu Sheng: The proportion of people who earn income from open source is highly consistent with the proportion who spend more than 20 hours a week on open source, which clearly shows how much commercial value affects the intensity of open source contribution. While contribution time and revenue do not indicate the quality of open source contributions and projects, high-quality projects need a certain amount of commercial support to sustain a virtuous cycle.
Chen Yang: 12% of participants earn part-time or full-time income from open source. Open source evangelist, open source operation manager and open source developer are all popular occupations in 2020. This shows that the open source ecosystem is being further improved, enterprises are beginning to reserve open source talents, and the business logic of open source is becoming clear.
4.5 Remote Working
Telecommuting is a very important way of working today. Eighty percent of the participants consider telecommuting important, and nearly nine percent of the participants have telecommuting experience.
[Expert comments]
Gao Yang: Telecommuting will become a normal mode of work and collaboration in our work and lives. The open source movement and distributed, remote collaboration are naturally integrated.
4.6 First open source products encountered
For 32% of participants, the first open source products they encountered were Internet products, followed by operating system-related products and development tools. The results are basically consistent with previous years, indicating that Internet products and operating system-related products are still the first window through which people come to know open source.
4.7 Products participants most want to see open sourced
Compared with 2019, development tools topped the list of products participants most want to see open sourced this year, followed by operating systems, databases and middleware.
4.8 Enterprise contribution to Open Source
In the eyes of the participants, GitHub made the largest contribution to open source software, followed by Google, while Alibaba, Huawei and Baidu took the top three positions among domestic enterprises.
4.9 Robotic process automation
As to whether open source projects integrate RPA (robotic process automation) tools, 40% of participants indicated that some projects integrate RPA, 16% indicated that almost all projects integrate RPA, and 24% indicated that they never integrate RPA in open source projects.
5. Open source community participation status
5.1 What attracts you most about open source
Open and transparent code and knowledge sharing, as well as the spirit of open source, are the most attractive factors for participants, while the cost of software purchase is not the main factor.
5.2 What are your favorite open source products
Linux was the most popular open source product by a wide margin, followed by MySQL, Apache and container dark horse Docker.
5.3 Specific community work involved
The vast majority of participants contribute code or documentation in the community; testing, localization, and event organizing are also common.
5.4 Most promising open source products
In the outlook for open source products, Internet products ranked as the most promising direction, chosen by 34% of participants. Artificial intelligence and development tools have also emerged as hot spots among open source products.
5.5 Communication methods in the Open Source community
WeChat and mailing lists are the most popular means of communication in the community. QQ is also an important instant messaging tool in China. The rise of newer communication tools such as Slack and Zoom is also noteworthy.
[Expert comments]
Wu Sheng: The heavy use of WeChat and QQ groups shows that open source circles in China are still not aligned with international social networking and collaboration practices. It also reflects that Chinese is still the only language in which most open source participants like to communicate.
5.6 Relationship between community and code
Nearly 90% of participants believe that the value of the community in the open source community is greater than or equal to the value of the code, because a community built around the code makes the code better.
[Expert comments]
Gao Yang: I’m glad to see the recognition of community value. In our opinion, a healthy community is more important than good code. Only a healthy, diverse and friendly community can promote sustainable development of the project.
5.7 Age distribution of the open source community
Contributors in their 20s make up the majority of the open source community, accounting for more than 60%.
[Expert comments]
Wu Sheng: Many open source participants are young developers. This reflects how recently China began participating in open source: students, as a highly flexible group, are more likely to follow the recent popularity of open source and choose to take part. The lack of developers over the age of 35-40 directly reflects the shortage of experienced engineers in China. The large-scale rise of open source projects in China will depend heavily on the proportion of this group.
5.8 Open Source Software Security
While nearly 70 percent of participants said they had no feelings of insecurity, 25 percent said they had such concerns, indicating that the security of open source software is still a concern.
[Expert comments]
Sweet Potato: Because of how open source software works, with problems continually found and fixed by the open source community, its security may not be developers’ main concern. In fact, beyond technical security, developers also need to pay attention to open source license compliance. The legal risks caused by license conflicts are significant and hard to detect. For companies in particular, checking the compliance of their open source software use is as important as technical security.
5.9 The role of open source foundations in China
Participants agreed that having an open source foundation in China is meaningful: it can promote the concept of open source, open source culture education and community building, establish an open source ecosystem, incubate and support open source projects at an early stage, help the Chinese open source community integrate and optimize resources, and help it connect with the international open source community.
[Expert comments]
Jiang Tao: In the context of global competition in science and technology and the development of China’s core open source technologies, building and developing China’s open source foundation from the vantage point of the open source ecosystem is of great significance. As it develops and expands, the foundation can encourage China’s technology giants and other social forces to jointly build an environment for open source innovation. At the same time, by learning from the more mature operating models, organizational mechanisms and legal systems of established foundations, we can build a sustainable open source innovation force in China. In addition, as open source business models mature worldwide, an open source foundation can also guide funds, investment institutions and incubators at home and abroad to support open source and cultivate Chinese open source “unicorns”, eventually forming a shared HRSC open source commercial ecosystem in China with capital, companies and developers as its main participants.
Gao Yang: 2020 saw the launch of China’s first open source foundation, OpenAtom, which is of great significance for popularizing open source education and for the overall prosperity of the open source ecosystem in China. We look forward to seeing OpenAtom act as a connector for open source, connecting the international open source community, linking quality resources at home and abroad, and helping open source projects grow and succeed.
6. Status quo of developer technology
6.1 Development Language
Among development languages, several strong contenders stand side by side. The top three are Java, JavaScript and Python, and the ranking is basically the same as last year.
[Expert comments]
Mr. Wang: Programming languages are a popular topic, and the rankings have not changed much. SQL is worth mentioning: it usually keeps a low profile, but it is highly practical. In fact, the underlying data in the GitHub data section of this report was all produced with SQL, an important foundational skill for developers.
6.2 Online document collaborative editing tool
Among online document editing tools, Shimo Docs is currently used frequently by the open source community in China, and Wiki is still a collaborative tool used by many participants.
6.3 Editor
VS Code remains the most popular editor this year, followed by Vim and Notepad++.
6.4 Version Control Tool
There is no doubt that Git stands alone with an absolute advantage. SVN, TFS, and CVS are still used by many participants.
6.5 Database
In database usage, not surprisingly, MySQL leads by a wide margin, followed by MongoDB and PostgreSQL.
6.6 Operating System
In operating system usage, Windows, Linux and Mac OS X split the field three ways.
6.7 Meeting Tools
Tencent Meeting is currently the meeting tool most used by participants, followed by Zoom and DingTalk.
6.8 ChatOps Tools
Currently, the majority of participants do not use ChatOps tools to automate project management; among those who do, Hubot is the most used.
6.9 Open Source Platforms
If they were to open source their own projects, 87% of participants said they would do so on GitHub, followed by Gitee and GitLab.
[Expert comments]
Jiang Tao: With the rapid growth and commercialization of open source, open source platforms, as an important foundation and support system for open source projects and the developer ecosystem, will carry more services and application scenarios in the future and will each develop their own characteristics. CODE China, newly released in 2020 as an independent third-party open source platform, will focus on AIoT and provide operational support and ecosystem services for more open source projects and developers.
6.10 Technical Forum
Zhihu and CSDN are currently the two most popular technology forums among participants, followed by StackOverflow. Open Source China and Blog Park are also favored by many participants.
7. Summary & Acknowledgements
The distribution and collection of this questionnaire met with many difficulties, and in the end it presents only a small corner of the open source world, but it is still of great significance. It is not hard to see that China’s open source landscape is changing, and the spark of open source is starting a prairie fire. We hope this report will lead more communities and developers to participate.
The questionnaire questions and report documents are published on the code hosting platform, the official website of the Open Source Society and the websites of our partners, and are shared under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) License. If you have any suggestions or ideas about the question design or the report content, you are welcome to submit a patch on the code hosting platform to supplement and contribute to the report. One small step for you, one big step for the whole Chinese open source community.
Members of the China Open Source Annual Report team who contributed to this report: Xia Xiaoya, Ning Zexin. Community partners: X-Lab, Gitee, Microsoft Reactor. Thanks to the invited experts in the field of open source, and especially to everyone who actively participated in our research.
GitHub Data
1. Overview
1.1 Background
This part is a statistical analysis of GitHub’s global event logs: 874 million events in 2020, an increase of about 60% compared with 546 million in 2019. Some of the analyzed projects were manually annotated to pick out Chinese individual developers and enterprise organizations, and a scientifically grounded mathematical model was constructed for the analysis.
Based on the definition of developer activity and project activity, the total number of active projects in 2020 is about 53.73 million, an increase of about 35.3% compared with 39.72 million in 2019. GitHub has more than 56 million total developers and 14.46 million active developers in 2020, up 21.2% from around 11.19 million in 2019.
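The growth figures in this background section are year-over-year percentage changes. As a minimal sketch, the arithmetic can be reproduced from the counts quoted in the text; the helper name `growth_rate` is an illustrative choice, not something taken from the report.

```python
def growth_rate(previous, current):
    """Return year-over-year growth as a percentage of the previous value."""
    return (current - previous) / previous * 100


# Counts quoted in the text, in millions (2019 value first, then 2020).
events = growth_rate(546, 874)
projects = growth_rate(39.72, 53.73)
developers = growth_rate(11.19, 14.46)

print(f"event logs:        {events:.1f}%")
print(f"active projects:   {projects:.1f}%")
print(f"active developers: {developers:.1f}%")
```

With the counts as printed, the first two results agree with the quoted "about 60%" and "35.3%". The developer counts as printed work out to roughly 29% rather than the quoted 21.2%, so one of those published figures may have been rounded or defined differently.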
1.2 Indicator Interpretation
Indicator | Meaning |
---|---|
language | The main language used for project development |
activity | Project activity (calculated by weighting) |
developer_count | The number of developers active on the project under the definition of activity |
issue_comment | Total number of comments on all issues and PRs of the project in 2020 |
open_issue | Number of new issues opened in the project in 2020 |
open_pull | Number of new pull requests (PRs) opened in the project in 2020 |
pull_review_comment | Number of review comments on all PRs of the project in 2020 |
merge_pull | Number of the project’s PRs merged in 2020 |
pull_commits | Number of commits that entered the project through PRs |
pull_additions | Number of lines of code added to the project through PRs |
pull_deletions | Number of lines of code removed from the project through PRs |
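The table describes activity as "calculated by weighting", but this excerpt does not give the formula. The sketch below shows one way such a score could be built, as a weighted sum of the event-count indicators. The weights, the function name `activity_score` and the sample projects are all assumptions made for illustration; they are not the report's actual model.

```python
# Hypothetical weights: the excerpt does not state the real ones.
ASSUMED_WEIGHTS = {
    "issue_comment": 1.0,
    "open_issue": 2.0,
    "open_pull": 3.0,
    "pull_review_comment": 4.0,
    "merge_pull": 5.0,
}


def activity_score(counts, weights=ASSUMED_WEIGHTS):
    """Weighted sum of a project's event counts; missing indicators count as 0."""
    return sum(w * counts.get(name, 0) for name, w in weights.items())


# Made-up counts for two imaginary projects.
projects = {
    "example/alpha": {"issue_comment": 100, "open_issue": 20, "open_pull": 10,
                      "pull_review_comment": 5, "merge_pull": 8},
    "example/beta": {"issue_comment": 40, "open_issue": 5, "merge_pull": 30},
}

# Rank projects from most to least active under the assumed weights.
ranking = sorted(projects, key=lambda p: activity_score(projects[p]), reverse=True)
for name in ranking:
    print(name, activity_score(projects[name]))
```

Under these assumed weights, a merged PR counts for more than a comment, so a project with fewer but more substantial events can still rank close to a chattier one. Different weights would give a different ranking.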
2. Main contents and findings
2.1 World Top 10 open source projects
The most active project is flutter/flutter, Google’s cross-platform front-end development framework. Google’s deep learning framework tensorflow/tensorflow and the container orchestration system kubernetes/kubernetes rank 5th and 6th respectively, indicating that Google’s efforts and influence in open source have gained recognition in the industry.
Microsoft’s cross-platform code editor, microsoft/vscode, and MicrosoftDocs/azure-docs, its open source documentation project for the Azure cloud platform, came in second and third respectively, indicating that Microsoft’s open source efforts have won programmers’ approval.
Both microsoft/vscode and DefinitelyTyped/DefinitelyTyped use TypeScript as their main programming language, which may be related to the sharp rise in TypeScript’s popularity.
2.2 China’s Top 50 open source projects
In the Top 50 list, the activity of the 1st-place project is more than 10 times that of the 50th, indicating a large gap among China’s open source projects.
From this list, we can see that Alibaba has a remarkable record in the open source field. It has four projects in the Top 10: ant-design/ant-design, Ant Financial’s React component library; ant-design/ant-design-pro, a scaffold for mid- and back-office management consoles based on the Ant Design framework; alibaba/nacos, a feature set dedicated to configuring and managing microservices; and ElemeFE/element, the front-end component library open sourced by Ele.me (which has been acquired by Alibaba).
PingCAP is also doing very well in the open source space. Six of its projects are listed in the Top 50, including pingcap/tidb, an open source distributed relational database; tikv/tikv, a distributed transactional key-value database; and the documentation repositories pingcap/docs-cn and pingcap/docs, which shows that PingCAP takes project documentation seriously. The most notable project is pingcap/tidb, whose issue_comment, open_issue, pull_review_comment, merge_pull and other attributes are the highest among the 50 projects. The figures are a staggering 52,871 and 10,981; under the issue_comment attribute, the second-ranked ant-design/ant-design compares at about 61.5%. Its 480 participating developers are far from the numbers seen in front-end projects, yet its activity is this high, which shows how active the community is.
Baidu has done very well in the field of ARTIFICIAL intelligence. Its deep learning platform PaddlePaddle has occupied six projects, namely the core framework Paddle and related tool library, extended version and model library. Open autonomous driving platform ApolloAuto/ Apollo also made the list.
The list of China’s Top50 projects includes alibaba’s ant-design component library, jd.com’s development framework taro based on React front-end framework, and Vue UI component library Element open source by ele. me (acquired by alibaba) front-end team, etc., which shows that in China, The front-end community is more active in the open source community; Front-end code is also generally less confidential, so companies are more open-minded. There is one caveat: there are more front-end component libraries than core projects.
PaddlePaddle/Paddle, apache/incubator-tvm, Tencent/ NCNN and alibaba/MNN, all of which belong to the field of artificial intelligence, occupy a place in the list of Top50 projects in China. This shows that major Chinese companies are vigorously developing the FIELD of ARTIFICIAL intelligence.
2.3 Analysis of Chinese open source enterprises
Behind almost every major open source project there is the support of technology companies. We calculated the activity of open source projects maintained by technology companies in 2020, and the results are shown in the following table:
In the open source data of domestic enterprises, Alibaba ranks first on every indicator except pull_review_comment. On individual indicators, its repo_count and developer_count equal the sum of those of the other companies. Its number of open issues is also an order of magnitude higher than that of the other companies.
AI is Baidu’s most visible strength in open source, as shown by PaddlePaddle, a domestic open source deep learning platform, and Apollo, an autonomous driving platform. Tencent released 192 repositories on GitHub, mainly focusing on five technical fields: cloud native, big data, AI, mobile development and Web development. Huawei’s investment in open source is well known, but the GitHub data fails to fully demonstrate its strength, which gives a glimpse of the impact of the trade war between China and the US. As an emerging Internet finance company, WeBank has open-sourced 27 projects in 44 repositories, covering artificial intelligence, blockchain, cloud computing, big data and other fields. It is also a financial company that established an open source office early in its history. Didi shows strong open source momentum: it has established an open source committee, actively participates in the industry’s open source projects, and has opened 38 repositories on GitHub. Deepin’s desktop environment DDE is loved by users at home and abroad. Beyond its own development, its software ecosystem includes thousands of desktop applications in its app store. For a domestic, community-built Linux operating system, this is commendable.
In the second half of 2020, PingCAP announced the completion of a $270 million Series D financing round, setting a new milestone in the history of the global database industry. PingCAP’s open source performance is equally bright, and it has surpassed Baidu to rank second. Its pull_review_comment count exceeds Alibaba’s, while its number of developers is less than 1/10 of Alibaba’s, which shows that PingCAP has a very active open source community.
Youzan’s ranking has risen very quickly, probably due to the excellent performance of its open source project youzan/vant, a lightweight mobile UI component library.
It is worth noting that new social media companies such as Bilibili, Douban and Juejin, which focus on user-generated content, are also actively using open source technology.
It can be seen that in recent years China’s leading open source enterprises have continuously increased their investment in building the open source community ecosystem. Enterprises in fields such as the Internet, operating systems, social networking, finance, cloud computing and e-commerce have all actively participated, and open source is blooming across the board.
2.4 Apache Software Foundation open source projects from China
Founded in 1999, the Apache Software Foundation (ASF) is dedicated to helping individuals and organizations understand how open source can provide an advantage in a competitive marketplace. Its focus is not on producing software, but on guiding the communities that produce it. The Apache Way has significant advantages for the sustainability of the open source community: everything it maintains is open source, and all users can benefit from it. Apache currently has 14 top-level projects originating in China and 7 incubator projects striving to become top-level projects.
In 2020, the Apache Software Foundation had 21 active open source projects originating in China, among which 9 projects were listed in the Top50 list of open source projects in China.
Among the Apache Software Foundation’s open source projects from China, the most active is apache/shardingsphere, an ecosystem of open source distributed database middleware solutions. It consists of three products (JDBC, Proxy, and Sidecar, which is still in planning) and became an Apache top-level project on April 16, 2020.
apache/incubator-echarts and apache/skywalking also performed very well, ranking 10th and 12th respectively in the China open source ranking. Apache ECharts is a free, powerful charting and visualization library. SkyWalking is an observability platform and APM tool that can optionally work with a service mesh to provide automated metrics for microservices, cloud native and container-based applications. It currently provides monitoring services for major Chinese companies such as Alibaba, Huawei and Tencent. ECharts and SkyWalking are also top-level projects of the Apache Software Foundation.
From these data, we can see that Apache, as one of the most active foundations in the world, contributes greatly to the Chinese open source community. On the one hand, Apache, as a top foundation, attracts more and more Chinese open source projects with its excellent projects and harmonious community atmosphere. On the other hand, more and more Chinese open source projects are participating in the Apache community, giving it an increasingly Chinese flavor. We hope that Chinese open source projects will soon enter the world’s Top10 open source projects!
[Expert comments]
Chen Yang: Looking across time, we can see that open source in China has gradually changed from an early follower and participant into an influencer and creator. The power of open source in China is rising. A large number of open source project creators have emerged in China; Chinese enterprises have begun donating their open source projects to foundations for incubation; China has defined its own open source license (the Mulan Permissive Software License), which has been accepted by the OSI; and China has set up its own open source foundation (the OpenAtom Foundation). These are all important signs that open source in China is beginning to mature.
Liu Tiandong: In 2015, Open Source Society and the ASF jointly held the Apache China Roadshow. At that time, there were only three open source projects from China in the ASF (Kylin, Eagle and Griffin, from eBay’s China research institute), but within six years the number grew to 21 (14 of which have graduated from the ASF Incubator to become top-level projects). All the hard work, bitter and sweet, stays in my heart. Rooted in China and contributing to the world, we are on our way!
2.5 Interviews with China’s top open source projects
Ma Yanjun (Senior Director, Baidu Deep Learning Technology Platform Department, PaddlePaddle/Paddle)
It’s great to see PaddlePaddle carry its high position on China’s activity ranking from 2019 into 2020.
A deep learning framework is the core of the artificial intelligence open source ecosystem. It is technically complex and requires patient, continuous polishing in combination with real applications. As the earliest open source deep learning platform in China, and the one with the most complete functionality, PaddlePaddle has always developed in the open under the principles of openness and transparency. Guided by the development needs of the AI industry, PaddlePaddle has kept the overall design of its framework and full-process development tools forward-looking, pursued engineering quality to the utmost, used community mechanisms to safeguard that quality, and built a good reputation among AI developers on the strength of project quality.
PaddlePaddle has always cared about developers’ contributions to, and sense of identity with, the community. In addition to the more than 5,000 open source developers who contribute through PRs or issues, PaddlePaddle encourages more developers to contribute through walkthroughs, community exchanges and other channels. Problems that developers run into receive a fast response in the community, and the fixes are incorporated into new releases. Through this continuous closed loop of refinement, project quality is safeguarded and keeps improving, the needs of AI developers are better met, and developers form a stronger sense of community identity.
A deep learning framework occupies a connecting position in the AI technology stack: it interfaces with chips below and supports applications above, so broad chip adaptation and deep integration and optimization are very important. PaddlePaddle has therefore established close cooperation with hardware manufacturers. Many chip manufacturers contribute code directly to the PaddlePaddle community, making great contributions to the development of its ecosystem. PaddlePaddle has also cooperated deeply with major open source organizations and AI communities, and with the support of the OpenI community it has become an important member. Through PaddlePaddle Developer Experts (PPDE), SIGs and other organizational forms, it promotes in-depth communication and interaction with developers and grows together with the community.
Wu Sheng (apache/skywalking)
Apache SkyWalking has grown explosively this year, with language probes now covering all major programming languages, including Java, .NET Core, Golang, PHP, Node.js, Python, C++, and Lua for Nginx. Common metrics of community activity (number of stars, number of contributors, number of PRs) have doubled since 2019.
SkyWalking’s users include almost all the major technology companies in China, and the project is developing its own standards. The SkyWalking transport protocol is fully supported by the cloud APM services of the major cloud vendors Alibaba Cloud and Tencent Cloud. At the same time, SkyWalking has achieved seamless integration with major monitoring ecosystems such as OpenTelemetry, Prometheus, and OpenCensus.
This year’s collaboration between the SkyWalking community and the Summer 2020 program was very successful: two currently enrolled graduate students became committers. Their performance demonstrated the potential of students in top-level projects. This is new compared with 2019, and it shows the value and significance of a systematic student incubation program. Hopefully, we will see more and better student-oriented incubation and collaboration programs in the future, including Summer 2021, which has already been announced.
In the process of globalization, the SkyWalking project is progressing smoothly through asynchronous, diversified collaboration across regions and time zones, within Apache’s vendor-neutral framework. SkyWalking has become a core component of commercial products from several domestic and international companies in Asia, Europe and North America, and more professional developers have joined the project, which brings great activity and iteration speed to the project ecosystem. SkyWalking is rapidly maturing and growing at its own pace, together with commercial vendors, individual developers, and corporate secondary-development teams at home and abroad.
PingCAP CTO (pingcap/tidb, tikv/tikv)
This year the focus of TiDB community operations falls on two groups: users and developers. From a business perspective, the work can be divided into polishing and improving the project, cultivating talent and building the ecosystem, and discovering user scenarios and promoting business success. So the most important thing about running a good open source community is to focus on people and grow with them.
For the TiDB community, the past year has been one of rapid growth. The launch of the TiFlash columnar engine in version 4.0 last year was a solid step on the road to real-time HTAP. According to our data, close to 1/3 of 4.0 clusters are using TiFlash, which indicates that real-time insight directly on TP data is a common requirement.
Along the way, we can feel that the open source atmosphere in China and the activity of developers are gradually improving. Our number of contributors grew from 500+ at the end of 2019 to 1,200+ by the end of 2020, and people’s participation in and understanding of open source projects are getting deeper. More and more developers are gathering in the TiDB community, connecting the upstream and downstream of the industry through TiDB and sharing the results of innovation. From the perspective of TiDB’s community operations, an open source community thrives on a few basic principles: transparency, openness and sharing. First, all our discussion documents, development directions, polls and elections are open and transparent, and every contributor to the community can participate. Second, we have developed basic community governance rules and structures. In terms of infrastructure, we have also built automated bot services to give developers a better experience in the community. Finally, open and transparent incentives and feedback attract more developers to participate actively.
3. Case Study — ASF
3.1 Introduction
Founded in 1999, the Apache Software Foundation (ASF) is a 501(c)(3) non-profit public charity organization in the United States. The Foundation is committed to:
- Provide infrastructure: provide hardware, communication and project governance infrastructure for open source projects;
- Provide a legal entity for donations: establish a separate legal entity to which companies and individuals can donate resources and ensure that those resources will be used for the public good;
- Provide legal shelter for individual volunteers: protect them from legal action directed at the Foundation’s projects;
- Provide Apache trademark protection: Protect the “Apache” trademark of its software products from abuse by other organizations.
The Apache Software Foundation’s mission is to provide software for the public good. The foundation helps individuals and organizations understand how open source can provide an advantage in a competitive marketplace. Its focus is not on producing software, but on guiding the communities that produce it. Through the meritocratic process known as the Apache Way, more than 800 individual members and 7,000 committers have successfully collaborated to develop free enterprise-grade software that has benefited millions of users around the world. Apache is everyone’s Apache.
3.2 The Apache Way
The Apache Way is a set of behaviors and practices developed by the ASF to foster long-term successful projects by focusing on stable governance and encouraging new contributors. All Apache projects must follow these basic principles:
- Community over Code: a healthy, diverse and inclusive community promotes the growth and sustainability of projects. The ASF has always believed that good software is built by a strong community.
- Earned Authority: everyone has the opportunity to participate, and their influence is based on publicly earned merit, that is, the contribution they make to the community. Merit belongs to the individual, does not expire, is not affected by employment status or employer, and is not transferable.
- ASF’s flat structure: there is mutual respect in the Apache community, roles are equal, everyone’s vote carries equal weight, and contributors take part as volunteers with the same rights as everyone else (even if an organization pays them for their work on Apache code).
- Open Communications: most Apache mailing lists are archived and publicly accessible to ensure asynchronous collaboration, a requirement for a globally distributed community.
- Consensus Decision Making: since complete consensus cannot be reached at all times, a binding vote or other means of coordination may be needed to help remove barriers to decision-making.
- Responsible Oversight: the ASF governance model is based on trust and delegated oversight; projects are self-governing and report directly to the board. Apache committers help each other by reviewing commits, implementing mandatory security measures, ensuring license compliance, and protecting the Apache brand and the entire community from harm.
3.3 Data Analysis
We calculated the activity of all 21 ASF project repositories originating from China, and the data are as follows.
We compiled statistics on the working hours of all ASF project repositories from China and drew a working-time distribution chart for each repository. Here, we select three projects whose working-time distributions have distinctive characteristics for a brief analysis.
- apache/carbondata: Apache CarbonData is a new converged storage solution that uses advanced columnar storage, indexing, compression, and encoding technologies to improve computational efficiency, speeding up queries over petabytes of data by an order of magnitude.
- apache/incubator-teaclave: an open source universal secure computing platform that makes computing on privacy-sensitive data secure and simple.
- apache/hadoop-ozone: Ozone is a scalable, redundant, and distributed object store for Hadoop.
Of the three projects, CarbonData’s activity is clearly concentrated not in UTC+8 but around UTC+4 and UTC+5, which matches the time zone of its main maintainer, Ravindra Pesala, who is Indian. Teaclave operates entirely on U.S. time and is mostly silent during China’s daytime: Mingshen Sun, the founder of Teaclave, is based on the West Coast of the United States, not in China, even though the project was donated by Baidu. By contrast, Hadoop Ozone is clearly the most globalized of the three.
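A working-time distribution of this kind can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the report’s actual pipeline: it assumes event timestamps are available as UTC strings in the form `2020-06-01T02:30:00Z`, it assumes contributors mostly work between 9:00 and 18:00 local time, and the function names `hourly_histogram` and `guess_utc_offset` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_histogram(timestamps):
    """Count events per UTC hour (0-23) from 'YYYY-MM-DDTHH:MM:SSZ' strings."""
    counts = Counter()
    for ts in timestamps:
        # The trailing 'Z' marks UTC, so the parsed hour is the UTC hour.
        counts[datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").hour] += 1
    return [counts.get(hour, 0) for hour in range(24)]


def guess_utc_offset(hist, work_start=9, work_end=18):
    """Return the whole-hour UTC offset (-12..+14) that places the most
    events inside the assumed local working hours [work_start, work_end)."""
    best_offset, best_score = 0, -1
    for offset in range(-12, 15):
        score = sum(
            count
            for utc_hour, count in enumerate(hist)
            if work_start <= (utc_hour + offset) % 24 < work_end
        )
        # Ties keep the first (lowest) offset found.
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset
```

A single whole-hour offset is a coarse summary: it cannot represent half-hour zones or a community spread evenly across the globe, where no offset stands out.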
4. Case Study — CNCF
4.1 Introduction
CNCF is the Cloud Native Computing Foundation and is part of the Linux Foundation. CNCF hosts key components of the global technology infrastructure, bringing together the world’s top developers, end users and suppliers.
4.2 Proposal Process
CNCF has developed governance policies for the project proposal process, which applies both to existing projects that want to join CNCF and to new projects to be formed within CNCF.
- Sandbox: All exceptions (including rejections) are handled by the TOC (Technical Oversight Committee, which provides technical leadership to the cloud native community). When a project is rejected, the reason may be that it is “not currently suitable”, and the project may be encouraged to reapply once the problem is resolved. The time required for the whole process is not fixed. There are currently 44 projects in this phase, including Artifact Hub and Backstage.
- Incubating: As in the sandbox process, all exceptions, including rejections, are handled by the TOC. Projects in this phase include Argo, Buildpacks, CloudEvents, CNI, Contour, Cortex, CRI-O, Dragonfly, Falco, gRPC, KubeEdge, Linkerd, NATS, Notary, Open Policy Agent, OpenTracing, Operator Framework, SPIFFE, SPIRE, Thanos, etc.
- Graduation: Consists of three steps: submitting a graduation proposal template, a two-week public comment period opened by TOC members on the TOC mailing list, and a TOC vote. Projects currently in this phase include containerd, CoreDNS, Envoy, etcd, Fluentd, Harbor, Helm, Jaeger, Kubernetes, Prometheus, Rook, TiKV, TUF, Vitess, and others.
CNCF hosts graduated, incubating and sandbox projects. Although CNCF provides a shared set of services for all projects, it does not provide substantial marketing services for sandbox projects, because they are early-stage projects that need a lightweight, neutral environment in which to grow naturally. Sandbox projects have a lower priority for project services than incubating and graduated projects.
4.3 Data Analysis
We have calculated the activity of all graduated and incubating CNCF project repositories, and the data are as follows.
We have compiled statistics on the working hours of the CNCF project repositories, and the chart drawn for each repository is as follows.
- kubernetes/kubernetes: Kubernetes is an open source system for automatically deploying, scaling, and managing containerized applications. It groups the containers that make up an application into logical units for easy management and service discovery.
- thanos-io/thanos: Thanos is a set of components that can be composed into a highly available Prometheus setup with long-term storage. Its main goals are to simplify operations and preserve the reliability of Prometheus.
- kubeedge/kubeedge
The working hours of the developers on the three projects above cluster in distinct time zones. Kubernetes’ developers are mostly based in the Americas, around UTC-5, while Thanos’ developers are mostly European and KubeEdge’s developers are mostly from Asia Pacific. In the distribution of KubeEdge’s working hours, we can see that developers have a habit of taking a midday nap: around 4:00 to 5:00 UTC, which is 12:00 to 13:00 local time, developers’ activity drops sharply.
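The conversion behind the midday observation is simple modular arithmetic, sketched below. The hourly counts are made up for illustration and are not the report’s data; the UTC+8 offset is an assumption implied by the statement that 4:00 to 5:00 UTC corresponds to 12:00 to 13:00 local time, and the helper names are hypothetical.

```python
def to_local_hour(utc_hour, utc_offset):
    """Convert an hour of day in UTC to local time for a whole-hour offset."""
    return (utc_hour + utc_offset) % 24


def sharpest_drop(hist):
    """Return the UTC hour (1-23) whose count falls the most
    compared with the hour immediately before it."""
    drops = [(hist[hour - 1] - hist[hour], hour) for hour in range(1, 24)]
    return max(drops)[1]


# Made-up hourly event counts (index = UTC hour), for illustration only.
example = [0, 40, 50, 50, 10, 15, 45, 45, 45, 30, 20, 10, 5] + [0] * 11

dip_utc = sharpest_drop(example)
dip_local = to_local_hour(dip_utc, 8)  # assumes contributors are in UTC+8
print(f"sharpest drop at {dip_utc}:00 UTC = {dip_local}:00 local time")
```

With these illustrative counts, the steepest fall lands at 4:00 UTC, which maps to noon in UTC+8.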
5. Case Analysis — LF AI & Data
5.1 Introduction
LF AI & Data is an umbrella foundation under the Linux Foundation that supports open source innovation in artificial intelligence, machine learning, deep learning and data. It was created to build a sustainable open source AI ecosystem that makes it easy to create AI and data products and services using open source technologies. Its supporting services include membership and funding management, ecosystem development, legal support, PR/marketing/communications, event support and compliance scanning, as well as support for the open development of projects in diverse and thriving communities.
At present, the projects that have graduated from the LF AI & Data Foundation include Acumos, Angel-ML, Egeria, Horovod and ONNX.
Incubating projects include Adlik, Adversarial Robustness Toolkit, AI Explainability 360 Toolkit, AI Fairness 360 Toolkit, Amundsen, DataPractices, DELTA, Elastic Deep Learning (EDL), Feast, ForestFlow, JanusGraph, Ludwig, Marquez, Milvus, NNStreamer, OpenDS4All, Pyro, SOAJS, Sparklyr, etc.
5.2 Proposal Process
Admission to the LF AI & Data Foundation requires a proposal. The proposal process is the same for existing projects that seek to move into the LF AI & Data Foundation and for new projects to be formed within it.
Projects must submit the proposal via GitHub and notify LF AI & Data members by sending an email with the subject line “PROPOSAL [Project name]” to [email protected].
After the project proposal is submitted, the approval process is divided into four steps.
- Step 1:
  - The project prepares a proposal based on the provided template and submits it via GitHub, with a short email to [email protected] with the subject line “PROPOSAL [Project name]”.
  - Projects represented by a company must sign and submit a copy of the trademark and account transfer agreement. The goal of this agreement is to transfer ownership and management of the project’s trademarks and accounts to the Linux Foundation.
  - LF AI & Data members will inform the project whether the proposal is ready for submission to the TAC and will work with the project through proposal submission, review and voting.
- Step 2:
  - The project is presented to the TAC by teleconference at one of its fortnightly meetings.
  - The speaker has 45 minutes to present the relevant information.
  - Presentation documents must be sent to LF AI & Data via [email protected] at least 3 business days before the scheduled conference call, to give TAC members adequate review time.
  - The TAC has 15 minutes for discussion with the project representatives before voting on whether to accept the project and at which stage it will be hosted in LF AI & Data.
- Step 3 (only if the TAC approves the project at the graduation level):
  - The proposal is forwarded to the Governing Board (GB) for approval.
  - Project representatives are invited to the next GB teleconference to give a short presentation (10 minutes), after which the GB votes on accepting the project at the graduation level.
- Step 4: Announce and join
  - The hosting of the project in LF AI & Data is announced via a blog post or press release.
  - Project entry: after a new project is approved, LF AI & Data members help the project join the Foundation and begin providing the needed support in the focus areas. LF AI & Data members will provide the project with the details and timing of each activity.
5.3 Data Analysis
We have calculated the activity of all of LF AI & Data’s project repositories, and the data are as follows:
We compiled statistics on the working hours of the LF AI & Data project repositories and drew a chart for each repository. Here, we select three projects for analysis.
- milvus-io/milvus: Milvus is designed for approximate nearest neighbor search (ANNS) over massive feature vectors. Compared with operator libraries like Faiss and SPTAG, Milvus provides a complete framework for updating, indexing and querying vector data. Milvus uses GPUs (Nvidia) to accelerate indexing and querying, which can greatly improve single-machine performance.
- odpi/egeria
- nnstreamer/nnstreamer: NNStreamer is a set of GStreamer plug-ins that provide convenient and efficient support for GStreamer developers who adopt neural network models and for neural network developers who manage neural network pipelines and their filters.
The working-hour distributions of the three projects differ. milvus-io/milvus developers are active from Monday to Saturday, working mainly in the UTC+8 time zone, indicating that the majority of developers on the project are Asian. odpi/egeria developers are mostly in UTC+0, that is, mostly European, but a large number of events occur at 0:00 on weekends as well as weekdays, so it is almost certain that the project uses automation bots and that many scheduled tasks run at 0:00. nnstreamer/nnstreamer developers work mainly from Monday to Friday in the UTC+8 time zone, indicating that the project is dominated by Asian developers who take weekends off.
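The weekday and weekend reasoning above can be sketched with a weekday-by-hour grid. This is a minimal sketch, assuming UTC timestamps in the form `2020-06-01T00:00:00Z`; the function names and the idea of using the weekend share of a single hour as a rough bot signal are illustrative assumptions, not the report’s stated method.

```python
from datetime import datetime


def weekday_hour_grid(timestamps):
    """Build a 7x24 grid of event counts: grid[weekday][hour],
    with Monday = 0 and Sunday = 6, both taken in UTC."""
    grid = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        grid[dt.weekday()][dt.hour] += 1
    return grid


def weekend_share_at_hour(grid, hour):
    """Fraction of the events in the given hour that fall on
    Saturday or Sunday (0.0 if the hour has no events)."""
    total = sum(row[hour] for row in grid)
    if total == 0:
        return 0.0
    return (grid[5][hour] + grid[6][hour]) / total
```

A task scheduled every day contributes equally to all seven days, so its hour shows a weekend share near 2/7, whereas an hour driven by people who rest on weekends shows a share near zero. Comparing the two is a heuristic hint of automation, not proof.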
6. Case study — Wuhan2020
6.1 Introduction
Wuhan2020 is a representative example of large-scale self-organization during the COVID-19 pandemic, as well as a typical technology-based self-organized community. In the less than three months after its establishment, Wuhan2020 carried out volunteer services and cooperation over the Internet in an open source way. Online volunteers collaborated to create five pages/websites, 23 code repositories on GitHub, and 4,394 items of primary and secondary information related to the fight against COVID-19.
According to incomplete statistics compiled by the community itself, as of May 20, 2020, Wuhan2020’s five pages/websites had received more than 300,000 hits. Its openly readable and usable data set facilitated the data exchange with hospitals and communities in Wuhan and neighboring cities provided by Oxbridge Alumni Assistance, as well as the medical supply aid provided by the Silicon Valley team and by a global medical information service platform in the United States. Wuhan2020 has become one of the few self-organized groups in the fight against COVID-19 to declare that it would continue as a regular operation and to complete a permanent organizational structure.
6.2 Member Analysis
As of May 20, 2020, the number of participants confirmed by the Wuhan2020 open source community, counted by participants’ email addresses, was 4,095. Volunteer certificates were subsequently issued to volunteers’ mailboxes, and 1,942 people received a Wuhan2020 volunteer certificate.
[All types of mailboxes in the community, total 4,095]
[Types of mailboxes that have received a volunteer certificate, total 1,942]
The occupation and age of community members can be roughly estimated from the type of mailbox, combined with other data. For example, educational mailboxes are used by teachers participating in Wuhan2020, and most QQ mailbox users were born after 1995.
In addition, 16 Alibaba mailboxes, 5 Fluent mailboxes, 4 JD.com mailboxes, and mailboxes from Graphite, PricewaterhouseCoopers and others were identified as belonging to enterprise staff.
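Grouping participants by mailbox type can be sketched as below. This is only an illustration: the report does not list the domains or rules it used, so the `KNOWN_PROVIDERS` mapping, the rule that any domain containing an `edu` label counts as educational, and the function names are all assumptions.

```python
from collections import Counter

# Illustrative mapping only; the report does not list the domains it used.
KNOWN_PROVIDERS = {
    "qq.com": "qq",
    "gmail.com": "gmail",
}


def mailbox_type(address):
    """Classify an email address by its domain (case-insensitive)."""
    domain = address.rsplit("@", 1)[-1].lower()
    # Assumption: a domain label equal to 'edu' (e.g. x.edu or x.edu.cn)
    # marks an educational mailbox.
    if "edu" in domain.split("."):
        return "education"
    return KNOWN_PROVIDERS.get(domain, "other")


def count_mailbox_types(addresses):
    """Return a Counter mapping mailbox type to number of addresses."""
    return Counter(mailbox_type(address) for address in addresses)
```

For example, `count_mailbox_types(["a@example.edu.cn", "b@qq.com"])` counts one educational and one QQ mailbox. Enterprise mailboxes would need an additional, manually curated list of company domains.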
In the early stage of Wuhan2020 project, a total of 1,606 pieces of volunteer occupation and educational background information were collected. According to the collected volunteer information, members were analyzed with these volunteers as samples.
[Occupational Analysis of community members]
The two graphs above differ only in the order of occupations on the horizontal axis, which is designed to distinguish community members with multiple occupational identities. The number of members with multiple identities is obtained by subtracting the value of an occupational category in the left graph from the value of that category in the right graph.
[Education Breakdown of current students, total 853]
Because many people entered “student” as their occupation, only the specific education level of the students can be analyzed here; this is only a sample and not necessarily representative.
The data can be further broken down by the degree held by community members, including those who have graduated.
[Degree of community member, total 1,606]
It can be seen from the figure above that most of the community members have a bachelor’s degree. Combining this with the occupational and email information above, it can be inferred that most of the participants in the Wuhan2020 open source community are students born between 1995 and 2005. A large number of Gmail users are from overseas, and the proportion of these members who received volunteer certificates is relatively small.
[Pie chart of community members’ expertise in fields, total 1,606]
The breakdown of community members’ fields of expertise shows that, when choosing their strongest field, members tended to pick publicity or design. This also explains why the publicity group of Wuhan2020 had a serious surplus of members, and why many publicity group members stayed in the community after the event. It is also related to the fact that most of the data come from WeChat rather than Slack: the people on GitHub and Slack are mostly programmers or engineers who took part in projects directly rather than filling out forms.
7. Summary & Acknowledgements
All the data, analysis methods and analysis results in this part are supported by X-lab. The team members who contributed to the writing include Wang Haoyue and Zhu Xiangning. The Wuhan2020 case study was written by Li Yang, CEO of the Wuhan2020 open source community. Thanks to the invited experts in the field of open source, and especially to everyone who actively participated in our research.