This is Peng Wenhua's 100th original article

Flink is steadily making its way into production at more and more companies. Although its batch-processing capabilities are comparatively weak, it has become almost the standard for stream processing. Besides its excellent Checkpoint mechanism (snapshots of streaming state) and Watermark mechanism (handling out-of-order data), Flink also ships a powerful rules engine: CEP (Complex Event Processing). Much like a regular expression over a string, CEP extracts the data that matches a set of rules from a continuous stream and hands it off for processing.

CEP application scenarios

CEP is essentially a rules engine: it pulls out all the events in a stream that match a set of rules. Combined with Flink's high performance in real-time processing, it is ideal for real-time logical judgments such as risk control.

As shown in the figure above, a huge volume of events flows through the raw data stream. Suppose we define a CEP rule such as "one IP address grabs five red packets in a row". As soon as this pattern appears in the stream, Flink locks onto it immediately for follow-up processing.
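As a sketch, such a rule could be expressed in Flink CEP roughly like this (the event type `RedPacketEvent` and its `getIp()`/`getType()` accessors are illustrative assumptions, not from the original):

```java
// Hypothetical sketch: flag an IP that grabs 5 red packets in a row.
Pattern<RedPacketEvent, RedPacketEvent> grabPattern =
        Pattern.<RedPacketEvent>begin("grab")
                .where(new SimpleCondition<RedPacketEvent>() {
                    @Override
                    public boolean filter(RedPacketEvent e) {
                        return "GRAB".equals(e.getType());
                    }
                })
                .times(5)        // five grabs...
                .consecutive();  // ...strictly in a row

// Apply the rule per IP so each address is checked independently
PatternStream<RedPacketEvent> matches =
        CEP.pattern(events.keyBy(RedPacketEvent::getIp), grabPattern);
```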

CEP is therefore well suited to all kinds of logical judgments over streaming data, especially scenarios with strict real-time requirements such as abnormal-behavior monitoring (risk control), strategic marketing (e.g., order-grabbing mode), and operations and maintenance (traffic jitter).

Risk control offers many examples: fake traffic on video sites, red-packet grabbing on e-commerce sites, account cracking, and so on. As long as the relevant events appear in the stream, a simple CEP rule can detect them instantly and trigger immediate handling. For example, any account that buys more than 100 discounted products within a short period can almost certainly be flagged as a fraud ring.

For strategic marketing, Didi has shared some real-time marketing scenarios:

  • A passenger browses a fare estimate on a business line for 1 minute without placing an order;
  • No driver accepts within 2 minutes after the passenger places an order;
  • A passenger compares fares across different business lines.
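The second rule, "no driver accepts within 2 minutes of the order", can be sketched as a pattern with a validity period; the partial matches that time out are exactly the passengers we want to target. The `OrderEvent` type and its fields are illustrative assumptions:

```java
// Hypothetical sketch: order placed, then accepted, within 2 minutes.
Pattern<OrderEvent, OrderEvent> orderPattern =
        Pattern.<OrderEvent>begin("placed")
                .where(new SimpleCondition<OrderEvent>() {
                    @Override
                    public boolean filter(OrderEvent e) {
                        return "PLACED".equals(e.getType());
                    }
                })
                .followedBy("accepted")
                .where(new SimpleCondition<OrderEvent>() {
                    @Override
                    public boolean filter(OrderEvent e) {
                        return "ACCEPTED".equals(e.getType());
                    }
                })
                .within(Time.minutes(2));
// Orders that never reach "accepted" inside the window surface as timed-out
// partial matches (e.g., via a PatternTimeoutFunction) and can be routed to marketing.
```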

After a keyBy, we set up a CEP rule for each customer and monitor their actions. If a customer places an order but there is no follow-up event, we launch the corresponding marketing actions.

Definition and use of CEP

Implementing CEP is fairly simple and takes three main steps:

  • Define the pattern
  • Bind it to a DataStream
  • Output the matching results
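Put together, the three steps look roughly like the following sketch (`Event`, `getId()`, and the source stream `eventStream` are illustrative assumptions):

```java
// 1. Define the pattern
Pattern<Event, Event> pattern = Pattern.<Event>begin("start")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.getId() == 1000;
            }
        });

// 2. Bind it to a DataStream
PatternStream<Event> patternStream = CEP.pattern(eventStream, pattern);

// 3. Output the matching results
DataStream<String> alerts = patternStream.select(
        new PatternSelectFunction<Event, String>() {
            @Override
            public String select(Map<String, List<Event>> match) {
                return "matched: " + match.get("start");
            }
        });
```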

The code to define a pattern looks like this:

```java
// next: chains this pattern onto the previous one. Contiguity comes in three
// types -- strict, relaxed, and non-deterministic relaxed; next is strict.
pattern.next("newP").where(
        new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                // filter: the core matching logic
                return event.getId() == 1000;
            }
        }
);
```

The core elements of a pattern definition are its properties, its validity period, and the pattern sequence.

  • Pattern properties
    • Match a fixed number of times: times
    • Match one or more times: oneOrMore
    • Match at least a given number of times: timesOrMore
  • Pattern validity period
    • Set a time window according to business needs (within)
    • If none is set, matching continues indefinitely
  • Pattern sequence
    • Strict contiguity (next/notNext)
    • Relaxed contiguity (followedBy/notFollowedBy)
    • Non-deterministic relaxed contiguity (followedByAny)
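These knobs compose directly on a pattern, as in the following sketch (`Event` and `getId()` are illustrative assumptions):

```java
Pattern<Event, Event> p = Pattern.<Event>begin("hits")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event event) {
                return event.getId() == 1000;
            }
        })
        // .times(4)      // exactly 4 matches
        // .oneOrMore()   // 1 or more matches
        .timesOrMore(3)   // at least 3 matches
        .within(Time.minutes(5));  // validity period; without it, matching never expires
```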

A word on pattern sequences. Strict contiguity means the two events must be immediately adjacent; relaxed contiguity allows other events in between; non-deterministic relaxed contiguity additionally allows an already-matched event to participate in further matches.

Here’s an example:

As shown in the figure above, the raw data stream is 1 2 3 3 4, and the pattern we define looks for event 1 followed by event 3.

With strict contiguity, 1 is immediately followed by 2, never by 3, so nothing matches.

With relaxed contiguity, the stream contains 1 ... 3, so we find the pair and output one match: (1, 3).

With non-deterministic relaxed contiguity, 1 can pair with the first 3 and also with the second 3, so we get two matches of (1, 3).
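The three behaviors can be illustrated with a small self-contained Java simulation over the example stream. Note this is a toy re-implementation of the semantics for single-character events, not Flink's actual engine:

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of the three contiguity modes; events are single characters.
class Contiguity {
    // next: b must immediately follow a (strict contiguity)
    static List<String> strict(String s, char a, char b) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < s.length(); i++) {
            if (s.charAt(i) == a && s.charAt(i + 1) == b) out.add("" + a + b);
        }
        return out;
    }

    // followedBy: the first b somewhere after a (relaxed contiguity)
    static List<String> relaxed(String s, char a, char b) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != a) continue;
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(j) == b) { out.add("" + a + b); break; }
            }
        }
        return out;
    }

    // followedByAny: every b after a (non-deterministic relaxed contiguity)
    static List<String> nonDeterministic(String s, char a, char b) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != a) continue;
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(j) == b) out.add("" + a + b);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String stream = "12334";                                // the example stream
        System.out.println(strict(stream, '1', '3'));           // []
        System.out.println(relaxed(stream, '1', '3'));          // [13]
        System.out.println(nonDeterministic(stream, '1', '3')); // [13, 13]
    }
}
```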

Once one pattern is defined, you can chain further patterns onto it and assemble them into more complex logic.

For example, after a keyBy, detecting that one ID buys more than 100 discounted products within 10 minutes involves three events: browse, order, and pay. Chain them together, set the time range, and once the count exceeds 100, raise an alert and take the appropriate anti-fraud actions.
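A chained version of this rule might look like the following sketch. The `UserEvent` type and its fields are illustrative assumptions; wrapping a sequence as a group pattern and applying a quantifier to the whole group is supported by Flink's Pattern API:

```java
// Hypothetical sketch: browse -> order -> pay, repeated over 100 times
// within 10 minutes for one key.
Pattern<UserEvent, UserEvent> oneDeal = Pattern.<UserEvent>begin("browse")
        .where(new SimpleCondition<UserEvent>() {
            @Override
            public boolean filter(UserEvent e) { return "BROWSE".equals(e.getType()); }
        })
        .followedBy("order")
        .where(new SimpleCondition<UserEvent>() {
            @Override
            public boolean filter(UserEvent e) { return "ORDER".equals(e.getType()); }
        })
        .followedBy("pay")
        .where(new SimpleCondition<UserEvent>() {
            @Override
            public boolean filter(UserEvent e) { return "PAY".equals(e.getType()); }
        });

// Wrap the sequence as a group pattern and require it more than 100 times
// inside a 10-minute validity window.
Pattern<UserEvent, UserEvent> fraudRing = Pattern.begin(oneDeal)
        .timesOrMore(101)
        .within(Time.minutes(10));
```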

Conclusion

Real-time data processing has many application scenarios, all of which demand a flexible, efficient, and easy-to-use processing capability.

Flink delivers this through CEP, which works much like a regular expression. We can define a CEP rule with very little code, setting the match quantifier (match once, match multiple times) and the pattern's validity period.

To cope with disorder in streaming data, Flink's CEP also offers three contiguity modes: strict, relaxed, and non-deterministic relaxed.

To handle multi-condition, complex-logic scenarios, Flink can also chain multiple patterns together into pattern groups.

With all of this, we can meet highly time-sensitive requirements such as risk control and real-time marketing.
