Preface: this year I want to keep pushing forward. The road is long and the nights are long; may the lights guide your way.
As you can see, I did try to get involved in a few open source projects this year, reading source code and hoping to land some valuable PRs. Unfortunately there was no great result; I accumulated experience, but far less than what I expect of myself.
Originally, I had hoped to do the following:
- Improve my understanding of LevelDB by supporting read/write separation of hot and cold storage for multithreaded consumer offset writes in RocketMQ, and refactor some of those ideas into a solution of my own
- Follow Roman's work in Flink on monitoring for generalized incremental checkpoints, to understand how changelog delay metrics are tracked from the checkpoint down to the state backend, and add related monitoring on top
- Work through the etcd-based election in the Pulsar community's metadata module, and get a clear picture of the whole Pulsar workflow from client to broker, to ledger storage, to bookie storage
What held me back and left me no time to follow up:
- At work I needed to support a generic protobuf format (it did not have to stay consistent with the community version). My unfamiliarity with Protobuf code generation and with how it bridges to the Flink format abstraction meant I spent a long time tracing source code
- Work also required migrating legacy jobs. Their schemas were never managed by the data synchronization platform, yet no data could be lost, and Kafka-to-HDFS had to keep working even though the jobs could not be touched and the upstream/downstream table schemas could not be obtained. For this, I made GenericMemoryCatalog support HiveTableSink
- Getting familiar with the company's new platform, plus on-call for legacy business
As you can see from the above, this year was mainly about Flink, so I had no time or energy for tasks in the community. On top of that, serious gaps in my low-level fundamentals cost me a lot of time to fill, and unfamiliarity with the environment setup of open source projects wasted many weekends stepping on pitfalls.
Overall, no matter what I achieved this year, I still feel discouraged
Something still in progress:
- Flink Hackathon 2021
Github.com/flink-china…
Currently, I plan to support:
In Flink SQL, when a materialized view joins two source tables (one batch, one stream) and one side is delayed, the checkpoint should still produce a consistent snapshot; epoch-commit support for consistent commits of materialized views needs to be researched. Preparation:
1. Simulate the data sources (MySQL, Pulsar) locally, with the DockerClient pulling images directly from Docker Hub, and run a join program against them.
2. Add a detection/proxy layer at the schema deserialization layer, referring to the Blink PlannerContext, where the different format types (specified when writing the SQL) are resolved. Since ScanRuntimeProvider sits at the runtime layer and each connector under flink-connectors supplies its own stream runtime provider, supporting a unified batch/stream data source means re-implementing a connector that wraps the batch and stream runtime providers (wrapper design pattern) and stays compatible with different data source types.
3. Improve unit testing and integration testing.
   - Unit testing: what is currently missing is recording the global watermark at different points in time, e.g. when the current table is two hours behind. The existing unit test of WatermarkAlignSupport only exercises the expected interval between local time and watermark time; the arrival of the batch source and the stream source (epoch-2) can be measured by referring to Flink's unit tests that start a mini-cluster from a local Docker image.
   - Integration testing: create a materialized view over a batch-stream join and cover, at the SQL execution level: (1) a MySQL table joined with a Pulsar table, batch-stream join, with no delay on either side; (2) a Pulsar table joined with a Pulsar table, batch-stream join, with no delay on the Pulsar side.
   - E2E testing: end-to-end tests can run on the Azure Pipelines provided by Microsoft, following the Flink contribution docs. Flink doc: https://cwiki.apache.org/confluence/display/FLINK/Azure+Pipelines; my pipeline: https://dev.azure.com/1817802738/_git/Flink
4. The Kafka/Pulsar demos currently pass locally. After communication with @first d, the plan is:
   1. Set up a Docker environment that deploys Pulsar standalone and Flink standalone separately, so they can be debugged remotely while developing locally (application -> local containers visible via docker ps: Flink and Pulsar). Local tests are based on the DockerClient provided by the Testcontainers project (a minimal sketch of this setup follows after this list).
   2. Introduce a new UnifiedTableSource at the runtime layer for scanning the data rows of both sides.
   3. At the format layer, for the intermediate results produced by the join, provide a brand-new changelog format based on the Flink 1.13 documentation, e.g. introduce a TestChangelogDecodingFormatFactory into the flink-formats module.
   4. Write a Flink SQL job and make sure the SQL can be translated into a DataStream. Since this is for a competition, I need to show everyone what the value of this work is, so the best way is to finish a demo on a SQL streaming platform. Try a Flink-based SQL streaming platform (some people in the community are using https://github.com/zhp8341/flink-streaming-platform-web) and introduce the packaged source jar described above into that project.
// TODO: incremental synchronization component support for Debezium and flink-cdc-connector
// TODO: research epoch-commit support for consistent commits of materialized views
- Pulsar consumer side partition consumption rate support
Github.com/apache/puls…
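To make the local test setup mentioned above a bit more concrete, here is a minimal sketch, not the actual hackathon code: Testcontainers starts a Pulsar standalone in Docker, and a Flink StreamTableEnvironment joins a bounded (datagen) table with an unbounded Pulsar table in SQL. The class name, table names, fields, image tag, and the Pulsar connector options are illustrative assumptions; the exact DDL options depend on which flink-pulsar connector jar and version is on the classpath.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.testcontainers.containers.PulsarContainer;
import org.testcontainers.utility.DockerImageName;

public class BatchStreamJoinSketch {
    public static void main(String[] args) throws Exception {
        // 1. Pulsar standalone in Docker, image pulled straight from Docker Hub.
        //    (Image tag is an assumption; pick whatever version matches your connector.)
        try (PulsarContainer pulsar =
                     new PulsarContainer(DockerImageName.parse("apachepulsar/pulsar:2.8.1"))) {
            pulsar.start();
            String serviceUrl = pulsar.getPulsarBrokerUrl(); // e.g. pulsar://localhost:<mapped-port>
            String adminUrl   = pulsar.getHttpServiceUrl();  // e.g. http://localhost:<mapped-port>

            // 2. Flink SQL environment.
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

            // Bounded "batch" side, generated locally so the sketch has no MySQL dependency.
            tEnv.executeSql(
                "CREATE TABLE dim_user (" +
                "  user_id BIGINT," +
                "  user_name STRING" +
                ") WITH (" +
                "  'connector' = 'datagen'," +
                "  'number-of-rows' = '100'" +
                ")");

            // Unbounded "stream" side, read from the Pulsar standalone started above.
            // Requires a flink-pulsar connector jar on the classpath; option names are
            // assumptions based on one connector's DDL and may differ in your version.
            tEnv.executeSql(
                "CREATE TABLE orders (" +
                "  user_id BIGINT," +
                "  amount DOUBLE" +
                ") WITH (" +
                "  'connector' = 'pulsar'," +
                "  'topic' = 'persistent://public/default/orders'," +
                "  'service-url' = '" + serviceUrl + "'," +
                "  'admin-url' = '" + adminUrl + "'," +
                "  'format' = 'json'" +
                ")");

            // 3. The batch-stream join whose snapshot consistency the hackathon entry studies.
            //    print() keeps emitting results until the streaming job is cancelled.
            tEnv.executeSql(
                "SELECT o.user_id, u.user_name, SUM(o.amount) AS total " +
                "FROM orders AS o JOIN dim_user AS u ON o.user_id = u.user_id " +
                "GROUP BY o.user_id, u.user_name").print();
        }
    }
}
```

The try-with-resources block lets Testcontainers tear the broker down automatically when the run ends, which keeps local experiments reproducible without leaving stray containers behind.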
I am not someone who likes to write at length, because what comes from paper always feels shallow; to truly understand something you must practice it. What you say and do, after all, has to be proved by practice. So that leads to three very simple questions:
- Have a question
- Why are you writing code
I think everyone is born with something that makes them happy the moment they touch it: hearing cicadas chirp for the first time, touching piano keys with their fingers for the first time, seeing a beautiful cartoon for the first time. Code is no exception for me. The summer I failed the college entrance exam, the night my fingers typed hello world for the first time, I felt joy in my heart for the first time. And I thought: this is probably the happiest thing I have ever felt in my life. At last, something I find exhilarating even when it takes days and nights of slogging to get a result.
- What motivates you to keep writing code
- What do you find rewarding about writing code
- WenShi
- How far do you intend to go
- Ask yourself
To be continued… QwQ