First, a brief introduction. I am Li Benchao, a stream computing engineer in ByteDance Infrastructure, where I mainly work on Flink SQL. I was recently honored to be invited to become an Apache Flink committer.
I started participating in the community in the second half of 2019. At first, I mainly reported bugs I ran into while using Flink and did my best to fix them. At the same time, I followed the user and dev mailing lists to keep up with the latest developments and future directions in the community, and to learn from other people's questions and answers. Later, as my understanding deepened, I began to help answer users' questions, take part in design discussions, and join discussions on issues that interested me.
Every form of participation and contribution to the community is recorded and recognized: contributing code, contributing documentation (including translation), taking part in discussions, helping to answer users' questions, and so on. Personally, I have been involved to some degree in all of these, but my most visible contribution has been staying active on the user mailing list.
This article describes how I came to participate in the community and what I learned along the way. It covers the following topics:
- Getting to know the Flink community
- How to get involved in the community
- What I gained from the community
- Contribution to the community
- Suggestions for participating in the community
Getting to know the Flink community
I first came into contact with Flink quite early. When I graduated from graduate school in 2017, my mentor pointed me toward stream computing, specifically Flink. At that time I knew next to nothing about Flink or about my job. After reading the Flink documentation for a few days, I reached a very superficial conclusion: Spark Streaming should be enough for our use cases (I had worked on Spark in the lab, and I had used Spark Streaming in depth during my internship). That shallow conclusion delayed my serious involvement with Flink by two years.
My second encounter with Flink was at a Flink Meetup in the summer of 2018, and I still remember the scene vividly. Dasha's talk in particular had a great impact on me. Dasha's clear, accessible explanation of Flink left me feeling that the Flink community was full of talented people and that Flink itself was very interesting.
At that time, I thought how happy I would be if I were lucky enough to work with these people in the community. It is worth mentioning that Teacher Guanghui also presented on the use of Flink at ByteDance at that meetup, and I am now a member of his team. After that, we explored some Flink applications at my company (my previous company). Generally speaking, Flink met our needs very well. However, because of the nature of that company's data, we did not face many challenges from heavy traffic, and we only did some simple work at the usage level.
My third encounter with Flink was in the summer of 2019, when I had just changed jobs and joined ByteDance, right as ByteDance was preparing to adopt and promote Flink SQL aggressively. My initial focus was not Flink SQL but the runtime. Later, because there was more work on the SQL side and time was tight, and because I had worked on OLAP before and had some basic knowledge of SQL, I switched to the SQL direction. The planner we chose was the Blink Planner, which Alibaba had just open-sourced and merged into the master branch.
Getting involved: first users of the Blink Planner
I was lucky enough to join just as our company was starting to use Flink SQL, which was also when the community shipped its first release of the Blink Planner. Blink has many excellent features, but the community has a very strict process for introducing features, so some features available in Blink on Aliyun were missing from the first community version of the Blink Planner. (I have since come to see this as the better approach. It slows down the introduction of some features, and some may never be merged at all, but it ensures that whatever the community releases has been rigorously designed and discussed, which makes later maintenance and evolution much easier.)
We referred to the Blink implementation in many places and filled in a number of features internally, such as CREATE TABLE/VIEW, CREATE FUNCTION, computed columns, and WATERMARK. Along the way, we began asking questions and filing feature requests in the community. Most of these features were implemented in the community by version 1.10.
A typical example is computed columns, which are an important feature for us. At that time, I built an internal version by following the logic in the Blink branch directly. We then raised the requirement with the community and got a fairly quick response. Through that interaction we came to understand the community implementation well, and we gradually took on some related work, such as fixing bugs in computed columns.
ByteDance is a great platform, with a wide range of use cases and very heavy traffic. In practice we ran into many challenges in both functionality and performance, and we actively explored ways to solve them. When we hit a bug, we reported it to the community promptly and helped fix it. When I faced a problem I could not figure out, I asked for help on the mailing list or in JIRA, and community members usually responded very quickly (typically within a few hours).
At first we came to the community simply as users, looking for help and experience. When we hit problems we could not solve, we took them to the community, which could usually help us resolve them quickly. This let us bring Flink SQL into production inside ByteDance quickly, and we in turn contributed the problems we found and fixed internally back to the community.
Gradually, we found that problems raised by users from other companies were ones we had already met and solved, so we began actively helping them with answers. Before we knew it, we had moved from being users to being contributors.
What I gained this year
It takes a lot of time and energy to get involved in the community. But relative to what you get out of the community, it’s worth it.
First, the biggest advantage of participating is staying in step with the community. We can see what the community is working on now and what it plans for the future, so we avoid producing designs that will soon be outdated when we design and plan internal features. That keeps us moving at the same pace as the community and lets us enjoy the latest features.
The second benefit is that participation greatly broadens our use cases and our perspective. Internal use cases, however rich, are limited. In the community, we can exchange experience with Flink users around the world and pick up ideas and inspiration from them. As a result, we approach many internal problems with a broader view and solve them faster.
The third benefit is that we can find important bugs early and fix them before they cause problems in our own production environment. A typical example is a small bug in which the COUNT DISTINCT state in a window was not cleaned up automatically. This problem was raised entirely by community users, and more than once. After I noticed it, I quickly verified that the problem was real and fixed it. At the time I thought: good thing I caught this early and avoided a potential stability problem.
Finally, there is a hidden benefit: it raises the profile of both the company and the individual. While participating in the community, I got to know many warm-hearted friends. I have even received several resumes from people in the community.
Contribution to the community
As one of the first teams to run the community Blink Planner in production at scale, our biggest contribution so far has been fixing many bugs that were not easy to spot and making many improvements. Given our extensive use of the Blink Planner in production, we also voted +1 to make the Blink Planner the default planner in 1.11.
Some of the more interesting issues include, but are not limited to:
- FLINK-15430 & FLINK-16589: Generated code exceeds the 64 KB limit
- FLINK-15428: CEP has state problems when parallelism is greater than 1
- FLINK-16181: Generated code for CASE WHEN throws an NPE
- FLINK-14546: Support MAP handling in JSON
- FLINK-15494: Fix incorrect time column calculation when windows are cascaded
- FLINK-17942: WindowOperator automatically clears the COUNT DISTINCT state
- FLINK-16068: Fix the problem with keywords in computed columns
- FLINK-17025: Support the new Avro format
We are also actively planning features to contribute in the future, such as:
- FLINK-18202: Support the ProtoBuf format in Flink
- FLINK-18379: Support asynchronous UDFs and UDTFs in Flink
- FLINK-17137: Support mini-batch in windows
This is only a brief summary from our SQL side. Our colleagues working on state and runtime also participate actively in the community and have made many contributions.
Some tips from me
The Flink community is very friendly to newcomers, and I have experienced this myself. Even after we had been participating for a while, the community still made a point of leaving enough simple issues for people who were just getting started. So if you want to participate, go ahead and get involved. Everyone is very welcome!
First, getting involved in the community is an ongoing commitment, not a whim. It is not enough to look in on the community once in a while and pitch in. It takes patience and persistence, and it means keeping up with what is happening in the community as it happens. At first it may feel as though the community moves too fast and that reading the dev and user mailing lists every day takes too long, but after a while you find that it does not take much time, and sometimes it is even enjoyable.
Second, be bold when you participate, not timid. When you see a question you can answer, join in and answer it. The community is a platform for communication and sharing, not just a place where questions get asked and answered. Beyond the user list, you can also give your opinion on development and design topics you know well. Every honest thought is valuable to the community. Some people may worry about their English, but I do not think this needs much worry. Communication in the community favors practical English: as long as the meaning gets across, there is no need for rhetorical polish or for agonizing over voice and tense in every sentence (though of course that is not an encouragement to write ungrammatical sentences).
Then there is development. If you have a specific bug or feature in mind, you can create an issue directly. Before long, someone will notice it and discuss and confirm the problem with you. Once you reach agreement, you can start writing code and open a PR. Besides raising issues yourself, you can also follow issues raised by others. For issues that interest you, click Watch early so that you can follow the subsequent discussion and progress.
When it comes to code, the community is very demanding. When you write code for the community, you need to pay attention to every detail, from a single space or blank line to the design patterns and architecture of the code, and community members will review it carefully. This process is also excellent training. As you gradually get more involved, you will find that the quality of your code keeps improving.
Finally, I hope everyone who is interested in the community enjoys contributing!