1. Time
In Flink’s streaming processing, different concepts of time are involved, as shown in the following figure:
Figure Flink time concept
Event Time: indicates the Time when an Event is created. It is usually described by the timestamp in the event. For example, in the log data collected, each log records its own generation time, and Flink accesses the event timestamp through the timestamp dispatcher.
Ingestion Time: Indicates the Time when data enters Flink.
Processing Time: the local system Time of each operator that performs time-based operations. It is machine-specific. The default Time attribute is Processing Time.
For example, a log enters Flink at the time of 2019-08-12 10:00:00.123 and reaches Windows at the time of 2019-08-12 10:00:01.234. The content of the log is as follows:
2019-08-02 18:37:15.624 INFO Fail over to rm2
For services, when is the most meaningful time to collect the number of fault logs within 1 minute? EventTime, because we are counting logs based on when they were generated.
2. Window
2.1 summary of the Window
Streaming computing is a data processing engine designed to handle infinite datasets, an ever-growing set of essentially infinite data, whereas Window is a means of cutting infinite data into finite chunks for processing.
Window is at the heart of infinite stream processing. It splits an infinite stream into buckets of finite size, on which you can perform calculations.
2.2. The Window type
Before we go through the many operations of Windows, we need to explain one concept:
An endless stream of data is impossible to count because it has no boundaries and it is impossible to count how much data passes through the stream. The maximum, minimum, average, cumulative and other information in the data stream cannot be counted.
If you intercept a fixed size portion of the data stream, it can be counted. Windows can be divided into two categories:
Timewindows can be divided into three categories based on their implementation principles: rolling Window, Sliding Window and Session Window.
Tumbling Windows
Slice the data according to the fixed window length.
Features: Time aligned, fixed window length, no overlap.
The scroll window allocator assigns each element to a window of a specified window size, which has a fixed size and does not overlap. For example, if you specify a 5 minute scroll window, the window will be created like this:
Figure scroll window
Application scenario: Suitable for BI statistics (aggregate calculation for each time period).
2) Sliding Windows
Sliding window is a more generalized form of fixed window. Sliding window consists of fixed window length and sliding interval.
Features: time – aligned, fixed window length, overlap.
The sliding window allocator assigns elements to a fixed-length window. Similar to a scroll window, the size of the window is configured by the window size parameter. Another window slider parameter controls how often the sliding window starts. Therefore, sliding Windows can be overlapped if the sliding parameter is smaller than the window size, in which case elements are distributed among multiple Windows.
For example, if you have a 10-minute window and a 5-minute swipe, then each 5-minute window contains the data generated in the last 10 minutes, as shown below:
Graph sliding window
Application scenario: Collects statistics within the latest period (calculate the failure rate of an interface within the latest 5 minutes to determine whether to report an alarm).
3) Session Window (Session Windows)
It consists of a series of events combined with a timeout interval of a specified length, similar to a Web application session, that is, a new window will be generated if no new data has been received for a period of time.
** Features: ** time is not aligned.
The Session window allocator groups elements through session activities. The session window does not overlap and has fixed start and end times as compared to scrolling and sliding Windows. Instead, when it no longer receives elements for a fixed period of time, the inactivity interval is generated. That this window will close. A session window is configured with a session interval, which defines the length of the inactive period. When this inactive period occurs, the current session is closed and subsequent elements are allocated to the new session window.