Author: Sun Jincheng
Abstract: This article analyzes a common problem in Flink production environments: Flink does not appear to write to MySQL in real time. It is a frequent stumbling block for beginners, and was raised by Luo Pengcheng, a student in the community. The article is organized into four parts:
- Problem description
- Solution
- Root cause analysis
- Takeaways
For more discussion and feedback on production environment issues, please subscribe to the Flink Chinese mailing list!
Problem description
Flink 1.10 uses the flink-jdbc connector to interact with MySQL. Data can be read and written, but the written data only appears in MySQL after the Flink job has finished. In other words, even though this is a streaming computation, why can't it output results in real time?
Related code snippet:
JDBCAppendTableSink.builder()
    .setDrivername("com.mysql.jdbc.Driver")
    .setDBUrl("jdbc:mysql://localhost/flink")
    .setUsername("root")
    .setPassword("123456")
    .setParameterTypes(
        BasicTypeInfo.INT_TYPE_INFO,
        BasicTypeInfo.STRING_TYPE_INFO)
    .setQuery("insert into batch_size values (?, ?)")
    .build()
Solution
This is a classic case of knowing that something works without knowing why, and it is very easy to run into when first learning Flink. So is it true that Flink cannot write to MySQL in real time? Of course not. Adding a single line to the code above solves the problem:
    .setBatchSize(1) // flush to MySQL as soon as the write buffer holds 1 record
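To see why a batch size of 1 makes writes appear immediately, here is a minimal, self-contained sketch (plain Java, no Flink dependency; the `BufferedSink` class and its names are hypothetical stand-ins, not Flink's actual API) that mimics the sink's buffering behavior:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the JDBC sink's write buffer: rows only reach
// "MySQL" (here: the `written` list) once the buffer holds batchSize rows,
// or when the job ends and close() is called.
public class BufferedSinkDemo {
    static class BufferedSink {
        final int batchSize;
        final List<String> buffer = new ArrayList<>();
        final List<String> written = new ArrayList<>();

        BufferedSink(int batchSize) { this.batchSize = batchSize; }

        void write(String row) {
            buffer.add(row);
            if (buffer.size() >= batchSize) {
                flush(); // batch full: push everything to the database
            }
        }

        void flush() {
            written.addAll(buffer);
            buffer.clear();
        }

        void close() { flush(); } // job finished: one final flush
    }

    public static void main(String[] args) {
        // Default-like behavior: large batch size, small test data set.
        BufferedSink lazy = new BufferedSink(5000);
        for (int i = 0; i < 3; i++) lazy.write("row-" + i);
        System.out.println("batchSize=5000, before close: " + lazy.written.size()); // 0
        lazy.close();
        System.out.println("batchSize=5000, after close: " + lazy.written.size());  // 3

        // batchSize 1: every record is flushed as soon as it is written.
        BufferedSink eager = new BufferedSink(1);
        for (int i = 0; i < 3; i++) eager.write("row-" + i);
        System.out.println("batchSize=1, before close: " + eager.written.size());   // 3
    }
}
```

With a batch size of 5000 and only a handful of test rows, nothing reaches the database until `close()`; with a batch size of 1, every `write` flushes immediately, which is exactly the behavior `setBatchSize(1)` produces.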
Root cause analysis
The problem is solved, but what is the root cause? Looking at the fix, the answer is fairly obvious: for performance reasons, Flink's JDBC sink is designed with a default write buffer.
The default value of DEFAULT_FLUSH_MAX_SIZE is 5000. When you are testing while learning, the amount of test data is usually small (fewer than 5000 records), so all of it stays in the buffer until the data source is exhausted and the job finishes, at which point the buffered results are finally flushed to MySQL. That is why nothing appears in MySQL in real time.
The default value of DEFAULT_FLUSH_INTERVAL_MILLS is 0, which means there is no time-based flush at all: data sits in the buffer until the buffer is full or the job finishes.
That is why some beginners find that even if they deliberately set a breakpoint in debug mode so the job never ends, they can wait forever and the data still never shows up in MySQL.
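The two flush triggers described above can be summarized in one small predicate. This is an illustrative model in plain Java (the class and parameter names are mine, not Flink's actual API), showing why the default settings never flush a small test data set:

```java
// Illustrative model of the JDBC sink's two flush triggers.
// flushMaxSize    : flush once the buffer holds this many rows (default 5000)
// flushIntervalMs : flush every N milliseconds; 0 disables the timer (the default)
public class FlushPolicy {
    final int flushMaxSize;
    final long flushIntervalMs;

    public FlushPolicy(int flushMaxSize, long flushIntervalMs) {
        this.flushMaxSize = flushMaxSize;
        this.flushIntervalMs = flushIntervalMs;
    }

    /** Should the sink flush now, given buffer fill and time since the last flush? */
    public boolean shouldFlush(int buffered, long msSinceLastFlush) {
        boolean sizeTrigger = buffered >= flushMaxSize;
        // With the default interval of 0 the time trigger never fires, so a
        // small test data set waits in the buffer until the job ends.
        boolean timeTrigger = flushIntervalMs > 0 && msSinceLastFlush >= flushIntervalMs;
        return sizeTrigger || timeTrigger;
    }
}
```

Under the defaults (5000, 0), `shouldFlush(3, anyDuration)` is always false, no matter how long the job keeps running, which matches the breakpoint experiment above.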
In Flink 1.10, AbstractJDBCOutputFormat has two implementation classes, JDBCOutputFormat and JDBCUpsertOutputFormat, corresponding to the two sinks JDBCAppendTableSink and JDBCUpsertTableSink. Both sinks therefore share the same buffering problem in Flink 1.10. The difference is that JDBCUpsertTableSink lets the user configure the flush interval, while JDBCAppendTableSink offers no way to set it.
So, is this Flink's fault?
As far as this issue is concerned, I personally think it is not the user's fault; rather, the Flink 1.10 code design left room for improvement. In Flink 1.11 the community refactored this code and deprecated JDBCOutputFormat; see FLINK-17537 for the details of that change. However, that improvement did not change the default values of DEFAULT_FLUSH_MAX_SIZE and DEFAULT_FLUSH_INTERVAL_MILLS. The community is still actively discussing how to improve them; if you would like to contribute, or to see the final outcome of the discussion, follow FLINK-16497.
Takeaways
More generally, whenever any sink does not seem to write in real time while you are learning, check whether it buffers writes and whether a flush size or flush interval can be configured. Flink's Elasticsearch sink has a similar behavior, which needs to be controlled by calling setBulkFlushMaxActions.
The Flink Chinese mailing list is the place to give feedback on problems encountered while learning and using Flink; core Flink developers and first-line users in the community answer questions and exchange ideas there!
Subscribe to the Flink Chinese mailing list in 2 minutes
Apache Flink Chinese mailing list subscription process:
- Send any email to [email protected]
- You will receive an official confirmation email
- Reply to that email with "confirm" to complete the subscription
Once subscribed, you will receive mail from Flink's official Chinese mailing list. You can send your own questions to [email protected], or help answer other people's questions. Give it a try!
The above is the solution to this problem and the reasoning behind it. I hope it helps, and I look forward to you reporting the typical problems you encounter to the community mailing list in a timely manner.