Notes on negotiating a timeout policy for Flink writes to a file system

COSN: the SDK packaged by Tencent Cloud

Project Architecture

The data acquisition and distribution system parses MySQL data upstream and collects it into a relay Kafka. From this relay Kafka, the data is distributed to the downstream Kafka and to COSN. The distributor and the ingestion programs are all Flink tasks.

How to resolve the timeout exception

Architecture diagram for writing to COSN: www.processon.com/view/link/6…

Incident

If a COSN block fails to upload, the TaskManager does not throw an exception until the connection hold time expires.

What are the underlying principles of StreamingFileSink

Reference: Flink 1.12 documentation: ci.apache.org/projects/fl…

Flink streaming writes to the file system in two ways: BucketingSink and StreamingFileSink.

Iteration history: StreamingFileSink was introduced after BucketingSink. The main difference is that StreamingFileSink supports fault recovery with an exactly-once guarantee, but it requires Hadoop 2.7 or higher because it relies on the HDFS truncate method.

We write to COSN with one bucket per hour.
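As a minimal sketch of what hourly bucketing means, the helper below maps an event timestamp to a bucket id with one id per hour. The class name, the method name, and the `yyyy-MM-dd--HH` pattern are assumptions chosen for illustration; this is not the job's actual bucket assigner.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of Flink: derives an hourly bucket id from a timestamp.
final class HourlyBucketSketch {
    // One bucket (directory) per hour; the pattern itself is an assumption.
    private static final DateTimeFormatter HOUR_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd--HH");

    // Formats the timestamp in the given zone, truncated to the hour by the pattern.
    static String bucketIdFor(long epochMillis, ZoneId zone) {
        return HOUR_FORMAT.withZone(zone).format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        // All records written within the same hour share this id.
        System.out.println(bucketIdFor(System.currentTimeMillis(), ZoneId.of("UTC")));
    }
}
```

Every record whose timestamp falls in the same hour lands in the same bucket, so the number of files open at once stays small while the hourly directories stay easy to query.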

Data writing process

1. Data is written through an FSDataOutputStream into a buffer that is held in memory.
2. At each checkpoint, the buffered data is flushed and the pending files are recorded. A pending file is one that has been completed but not yet recorded by a checkpoint; its suffix can be set with sink.setPendingSuffix.
3. Every 60 seconds (configurable through inactiveBucketCheckInterval) a check for inactive buckets runs:
   - 1. If a file's FSDataOutputStream has received no data for 60 seconds (configurable with sink.setInactiveBucketThreshold(60 * 1000L)), the bucket is considered inactive; it is flushed and the file is closed.
   - 2. A timer is registered with the processingTimeService for the current time + inactiveBucketCheckInterval.
   - 3. The onProcessingTime method checks whether the 60 seconds without writes have elapsed.
4. Flink keeps a Map<String, BucketState<T>> bucketStates = new HashMap<>(); BucketState records the file that is currently in use. The key is the path of the file, and BucketState encapsulates all information about the file:
   - creation time
   - last write time (the time of the write to the cache, not the flush time)
   - file size
   - whether the file is currently open or closed
   - the method used to write to the buffer
5. Every time Flink operates on a file, it fetches the wrapper object from this map. When the job is cancelled, the file currently being operated on is flushed and closed, and its suffix is changed from in-progress to pending. The prefix and suffix can be configured, but the defaults are fine unless there is a special need. The files come from the bucketStates map in step 4: the close method iterates over the map to do this.
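The inactive-bucket check in steps 3 to 5 can be sketched as a small stand-alone model. This is a simplified illustration, not Flink's implementation: the class `InactiveBucketSketch`, its methods, and the suffix strings are hypothetical, and time is passed in explicitly instead of coming from a processing-time timer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the inactive-bucket check; not Flink's classes.
final class InactiveBucketSketch {
    static final class BucketState {
        long lastWrittenToTime;        // last write to the buffer, not the last flush
        boolean isWriterOpen = true;   // whether the file is still open
        String suffix = ".in-progress";
    }

    // Key: file path. Value: state of the file currently in use for that path.
    private final Map<String, BucketState> bucketStates = new HashMap<>();
    private final long inactiveBucketThresholdMs;

    InactiveBucketSketch(long inactiveBucketThresholdMs) {
        this.inactiveBucketThresholdMs = inactiveBucketThresholdMs;
    }

    // Records a write at time `now`; a real sink would also buffer the record.
    void write(String path, long now) {
        BucketState state = bucketStates.computeIfAbsent(path, k -> new BucketState());
        state.isWriterOpen = true;     // a real sink would roll a new part file here
        state.suffix = ".in-progress";
        state.lastWrittenToTime = now;
    }

    // What the timer callback would do: close every bucket idle for the threshold or longer.
    List<String> closeInactive(long now) {
        List<String> closed = new ArrayList<>();
        for (Map.Entry<String, BucketState> entry : bucketStates.entrySet()) {
            BucketState state = entry.getValue();
            if (state.isWriterOpen && now - state.lastWrittenToTime >= inactiveBucketThresholdMs) {
                state.isWriterOpen = false;  // flush and close would happen here
                state.suffix = ".pending";   // in-progress -> pending
                closed.add(entry.getKey());
            }
        }
        return closed;
    }

    String suffixOf(String path) {
        return bucketStates.get(path).suffix;
    }
}
```

With a 60-second threshold, a bucket last written at t = 0 is closed by a check at t = 70 s, while a bucket written at t = 50 s stays in progress until a later check.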

The Tencent Cloud COSN team suggested that we upgrade to the latest version and configure an empirical value that gives writes to COSN a longer wait time.

However, coming from backend development, we asked whether we should instead follow Sentinel and set a reasonable slow-call threshold RT (response time), so that we could estimate the traffic peak and configure RT accordingly.
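To make the slow-call idea concrete, here is a minimal sketch of a slow-call ratio check in the spirit of what we had in mind. It is a simplified model with hypothetical names (`SlowCallRatioSketch`, `record`, `shouldTrip`) and without a sliding time window; it does not use Sentinel's API.

```java
// Hypothetical, simplified slow-call ratio check; not Sentinel's API.
final class SlowCallRatioSketch {
    private final long maxRtMs;          // calls slower than this count as slow
    private final double ratioThreshold; // trip when the slow ratio exceeds this value
    private final int minRequests;       // ignore the ratio until enough calls were seen
    private int total;
    private int slow;

    SlowCallRatioSketch(long maxRtMs, double ratioThreshold, int minRequests) {
        this.maxRtMs = maxRtMs;
        this.ratioThreshold = ratioThreshold;
        this.minRequests = minRequests;
    }

    // Records the response time of one finished call.
    void record(long rtMs) {
        total++;
        if (rtMs > maxRtMs) {
            slow++;
        }
    }

    // True once enough calls were observed and too many of them were slow.
    boolean shouldTrip() {
        return total >= minRequests && (double) slow / total > ratioThreshold;
    }
}
```

The check depends on a single RT number per call, which is exactly the assumption that becomes questionable for streaming uploads.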

The discussion is as follows:

Cosn team feedback:

The COS SDK encapsulates the HTTP interfaces, and Hadoop COS (COSN) encapsulates the COS APIs to provide the Hadoop file system interfaces. Hadoop COS retries at the HTTP interface layer according to the HTTP return code. The COS SDK internally retries all HttpClient IO exceptions (network exceptions).

Symptom: if a COSN block upload (one API call) fails, the exception is not thrown until the retries are exhausted (retries use exponential backoff).

When the backend is healthy, signaling requests are guaranteed to complete within seconds, while the data-flow part of a request depends on network quality.

Conclusion: RT is ultimately just a retry-policy question. For network fluctuations lasting minutes it has the same effect as retrying, and for the data flow TCP already provides congestion control and slow start. If the network is unstable for a long time, neither retries nor RT solve the underlying problem, and in the short term the effect should be the same. Because both uploads and downloads are streaming, a request may respond quickly but transmit slowly, so RT is probably not suitable for driving retries on streams.
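The symptom described in the feedback, an exception that only surfaces after the retries run out, follows directly from how an exponential backoff loop is shaped. The sketch below is a generic illustration under assumed parameters; `BackoffRetrySketch`, `delayMs`, and `callWithRetry` are hypothetical names, not the COS SDK's or Hadoop COS's code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Generic exponential backoff retry; hypothetical, not the COS SDK implementation.
final class BackoffRetrySketch {
    // Delay before the retry that follows attempt `attempt` (0-based): base * 2^attempt, capped.
    static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // bound the shift to avoid overflow
        return Math.min(delay, maxMs);
    }

    // Retries IO failures; the caller sees the exception only after the last attempt.
    static <T> T callWithRetry(Callable<T> call, int maxRetries, long baseMs, long maxMs)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // remember the failure and back off before the next attempt
                if (attempt < maxRetries) {
                    Thread.sleep(delayMs(attempt, baseMs, maxMs));
                }
            }
        }
        throw last; // retries exhausted: only now does the failure reach the caller
    }
}
```

The time until the caller sees a failure is the sum of all backoff delays plus the time each attempt takes to fail, which is why a failed block upload can appear to hang for a long time before the TaskManager reports anything.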

1. A block file corresponds to one thread, and the block file's suffix ends with the thread name.
2. The next time the thread starts executing, it looks for the file with the corresponding suffix.
3. A checkpoint may involve switching to the checkpoint thread, so switching between task threads and checkpoint threads causes a transfer delay.

COSN team:

"Response" refers to establishing the connection or the first transmission of data; "transmission" refers to sending data after the connection is established (split across multiple sends). How many threads the upper layer uses to transmit, and under what strategy, is up to the upper layer; the goal, of course, is to maximize resource utilization and increase throughput.

To be continued… QwQ
