- Internal – Flink's checkpoint mechanism saves state to disk. When a fault occurs, state is restored from the latest checkpoint, guaranteeing internal state consistency.
- Source – the Kafka consumer, acting as the source, saves its offsets as state. If a failure occurs in a downstream task, the connector resets the offsets during recovery and re-consumes the data, guaranteeing consistency on the source side.
- Sink – the Kafka producer, acting as the sink, uses a two-phase-commit sink, which requires implementing TwoPhaseCommitSinkFunction (see the wiring sketch after this list).
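As a hedged wiring sketch (not from the reference article), the pipeline below enables checkpointing and pairs a Kafka source with an EXACTLY_ONCE Kafka sink using the pre-1.14 FlinkKafkaConsumer/FlinkKafkaProducer classes; the broker address, topic names, and transaction timeout are placeholder assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceKafkaJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoints drive the two-phase commit

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        props.setProperty("group.id", "demo-group");
        // Must stay below the broker's transaction.max.timeout.ms (15 min by default).
        props.setProperty("transaction.timeout.ms", "600000");

        // Source: partition offsets are checkpointed, so they can be reset on recovery.
        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props);

        // Sink: EXACTLY_ONCE is built on TwoPhaseCommitSinkFunction and Kafka transactions.
        FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
                "output-topic",
                (KafkaSerializationSchema<String>) (element, timestamp) ->
                        new ProducerRecord<>("output-topic",
                                element.getBytes(StandardCharsets.UTF_8)),
                props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        env.addSource(source).addSink(sink);
        env.execute("exactly-once kafka pipeline");
    }
}
```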
The JobManager coordinates each TaskManager to take checkpoints. Checkpoint data is stored in a StateBackend; by default this is a memory-level backend, but it can be changed to file-level persistence.
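For instance, switching from the default memory-level backend to file-level persistence might look like the pre-1.13-style sketch below; the HDFS path is a placeholder.

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendConfig {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Default: a memory-level backend that keeps snapshots on the JobManager heap.
        // FsStateBackend persists checkpoint data to a durable file system instead.
        env.setStateBackend(new FsStateBackend("hdfs://namenode:8020/flink/checkpoints"));
        env.enableCheckpointing(60_000); // trigger a checkpoint every 60 s
    }
}
```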
When a checkpoint starts, the JobManager injects barriers into the data stream at the sources; the barriers then flow downstream from operator to operator.
Each operator takes a snapshot of its current state and saves it to the StateBackend. For the source task, the current offsets are saved as state; the next time the source task recovers from a checkpoint, it can reset those offsets and re-consume data from the last saved position.
Every internal transform task that encounters a barrier likewise stores its state in the checkpoint.
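To make the snapshot step concrete, here is an illustrative sketch using Flink's CheckpointedFunction interface: snapshotState runs when the barrier reaches the operator, and initializeState restores the saved value on recovery. The counter logic and state name are invented for the example.

```java
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Illustrative operator: counts records and snapshots the count when a barrier arrives.
public class CountingMapper implements MapFunction<String, String>, CheckpointedFunction {
    private transient ListState<Long> checkpointedCount;
    private long count;

    @Override
    public String map(String value) {
        count++;
        return value;
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        // Called when the checkpoint barrier reaches this operator:
        // write the in-flight state to the StateBackend.
        checkpointedCount.clear();
        checkpointedCount.add(count);
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        checkpointedCount = ctx.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("count", Long.class));
        // On recovery, reload the last snapshot; a Kafka source does the same
        // with partition offsets so it can re-consume from the saved position.
        for (Long c : checkpointedCount.get()) {
            count = c;
        }
    }
}
```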
The sink task first writes the data out to external Kafka inside a pre-committed transaction; that data is not yet visible to consumers. When a barrier arrives, the sink saves its state to the StateBackend and opens a new pre-commit transaction for the next checkpoint period.
When the checkpoint completes, the JobManager notifies all tasks that it has finished. When the sink task receives this acknowledgement, it formally commits the earlier transaction; the pending data in Kafka is then marked committed and can actually be consumed.
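On the consuming side this only holds if readers use the read_committed isolation level; a plain Kafka consumer configured as below (an assumed setup, not from the article) will not see data from open or aborted transactions.

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "downstream-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // read_committed hides records from open or aborted transactions, so
        // pre-committed data only becomes readable after the final commit.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe/poll as usual; only committed transactional data is returned
        }
    }
}
```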
The whole process is thus a two-phase commit: each operator pre-commits as it completes its snapshot, and only when the sink finishes is the final commit initiated. If anything fails along the way, the pre-committed transaction is aborted.
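For orientation, here is a skeletal sketch of the hooks TwoPhaseCommitSinkFunction exposes; the transaction handle MyTxn and the method bodies are placeholders, while the five overridden methods are the extension points Flink invokes during the protocol.

```java
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Skeleton only: MyTxn and the method bodies stand in for real transactional-client calls.
public class SketchTwoPhaseSink
        extends TwoPhaseCommitSinkFunction<String, SketchTwoPhaseSink.MyTxn, Void> {

    public static class MyTxn { /* handle to an open external transaction */ }

    public SketchTwoPhaseSink() {
        super(new KryoSerializer<>(MyTxn.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected MyTxn beginTransaction() {
        // Open a new transaction in the external system (pre-commit scope).
        return new MyTxn();
    }

    @Override
    protected void invoke(MyTxn txn, String value, Context ctx) {
        // Write the record inside the open transaction; it is not yet visible.
    }

    @Override
    protected void preCommit(MyTxn txn) {
        // Called on the checkpoint barrier: flush, then leave the transaction open.
    }

    @Override
    protected void commit(MyTxn txn) {
        // Called after the JobManager's "checkpoint complete" notification:
        // make the pre-committed data visible.
    }

    @Override
    protected void abort(MyTxn txn) {
        // Called on failure: discard the pre-committed transaction.
    }
}
```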
Reference article:
www.ververica.com/blog/end-to…