Concepts

Flink supports different notions of time in streaming programs.

  • Processing time: Processing time refers to the system time of the machine that executes the respective operation. When a streaming program runs on processing time, all time-based operations (such as time windows) use the system clock of the machine running the respective operator. An hourly processing-time window includes all records that arrived at a particular operator between the times when the system clock indicated the full hour. For example, if an application starts running at 9:15 a.m., the first hourly processing-time window includes events processed between 9:15 a.m. and 10:00 a.m., the next window includes events processed between 10:00 a.m. and 11:00 a.m., and so on. Processing time is the simplest notion of time and requires no coordination between streams and machines. It provides the best performance and the lowest latency. However, in distributed and asynchronous environments, processing time does not provide determinism, because it is susceptible to the speed at which records arrive in the system (for example, from a message queue), to the speed at which records flow between operators inside the system, and to outages (scheduled or otherwise).

  • Event time: Event time is the time at which each individual event occurred on its producing device. This time is typically embedded in the records before they enter Flink, and the event timestamp can be extracted from each record. In event time, the progress of time depends on the data, not on any wall clocks. Event-time programs must specify how to generate event-time watermarks, which is the mechanism that signals progress in event time. This watermarking mechanism is described in a later section below; a minimal sketch of assigning timestamps and generating watermarks follows this list. In a perfect world, event-time processing would yield completely consistent and deterministic results, regardless of when events arrive or in what order. However, unless the events are known to arrive in order (by timestamp), event-time processing incurs some latency while waiting for out-of-order events. Because it is only possible to wait for a finite amount of time, this limits how deterministic event-time applications can be. Assuming all of the data has arrived, event-time operations behave as expected and produce correct and consistent results, even when working with out-of-order or late events, or when reprocessing historical data. For example, an hourly event-time window contains all records whose event timestamps fall into that hour, regardless of the order in which they arrive or when they are processed. (For more information, see the later section on late elements.) Note that sometimes event-time programs that process live data in real time use some processing-time operations in order to guarantee that they progress in a timely fashion.

  • Ingestion time: Ingestion time is the time at which an event enters Flink. At the source operator, each record gets the source's current time as a timestamp, and time-based operations (such as time windows) refer to that timestamp. Ingestion time sits conceptually between event time and processing time. Compared to processing time it is slightly more expensive, but gives more predictable results: because ingestion time uses stable timestamps (assigned once at the source), different window operations on a record refer to the same timestamp, whereas with processing time each window operator may assign the record to a different window (based on the local system clock and any transport delay). Compared to event time, ingestion-time programs cannot handle out-of-order events or late data, but the program does not have to specify how to generate watermarks. Internally, ingestion time is treated much like event time, but with automatic timestamp assignment and automatic watermark generation.

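To make the watermark mechanism mentioned above concrete, the following is a minimal sketch of one common way to assign event timestamps and generate watermarks in Flink 1.9: the built-in `BoundedOutOfOrdernessTimestampExtractor` emits watermarks that trail the largest timestamp seen so far by a fixed bound (10 seconds here), so moderately out-of-order events are still assigned to the correct event-time window. `MyEvent`, its `getCreationTime()` accessor, and the Kafka source details are assumed placeholders in the spirit of the example further below, not a definitive implementation.

```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<MyEvent> events = env.addSource(new FlinkKafkaConsumer09<MyEvent>(topic, schema, props));

DataStream<MyEvent> withTimestampsAndWatermarks = events
    // Extract the timestamp embedded in each record and emit watermarks that
    // lag 10 seconds behind the highest timestamp seen so far.
    .assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
            @Override
            public long extractTimestamp(MyEvent event) {
                return event.getCreationTime(); // assumed epoch-millisecond timestamp
            }
        });

// Hourly event-time windows now group records by their embedded timestamps,
// not by when the records happen to reach the operator.
withTimestampsAndWatermarks
    .keyBy( (event) -> event.getUser() )
    .timeWindow(Time.hours(1))
    .reduce( (a, b) -> a.add(b) );
```

The 10-second bound is a trade-off: a larger bound tolerates more disorder but delays window results, which mirrors the latency-versus-determinism trade-off described above.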
Setting a Time Characteristic

The first part of a Flink DataStream program usually sets the base time characteristic. This setting defines how data stream sources behave (for example, whether they assign timestamps) and which notion of time is used by window operations such as `KeyedStream.timeWindow(Time.seconds(30))`.

The following example shows a Flink program that aggregates events in hourly time windows. The behavior of the windows adapts to the time characteristic.

```java
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
// alternatively:
// env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);
// env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<MyEvent> stream = env.addSource(new FlinkKafkaConsumer09<MyEvent>(topic, schema, props));

stream
    .keyBy( (event) -> event.getUser() )
    .timeWindow(Time.hours(1))
    .reduce( (a, b) -> a.add(b) )
    .addSink(...);
```

If the time characteristic is not set, StreamExecutionEnvironment defaults to ProcessingTime.

Flink 1.9.2
