Author: Idle Fish (Xianyu) Technology — Luo Bin

Earlier articles in the Omega real-time touch system series introduced its three subsystems in detail: the behavior acquisition center, the CEP rule center and the user touch center. Xianyu has defined its own DSL (domain-specific language), which turns complex code development into a concise, SQL-like expression. Under the hood, the client side, the front end and the cloud can each use a different high-level language, such as Python, C++, JavaScript or Java. This lowers the technical threshold and improves R&D efficiency, and it raises the question this article addresses: how do we translate a custom DSL into several different underlying high-level languages?

Problems encountered in DSL language translation

The Omega system implements a complex event processing (CEP) engine not only in the cloud but also on the client side and in the front end. Compared with the cross-user behavior computation done in the cloud, the client-side and front-end CEP engines focus on computing over a single user's behavior, which is more real-time and more secure. Because the CEP computing engines are implemented differently on each end, developers are confined to the end they already know; cross-end development has a high technical threshold and its time cost becomes hard to control. We therefore propose a custom DSL to mask the technical differences between the ends. Ideally, developers should focus only on the business logic and not on the other technical details, as highlighted in red in the figure below.

The technical differences between the CEP computing engines mainly concern input data, the CEP computing API, the execution container and result output. For input data, the client side, the front end and the cloud each process different data. For example, the client side and the front end can process the fact that a user has stayed on a page for 5 seconds, while the cloud cannot perceive it, so the custom DSL must be compatible with each end's input-data differences at the data-input level. For the CEP computing API, the basic APIs designed on each end may differ, for example i = i + 1 versus i++; without a unified protocol specification the APIs can easily proliferate out of control, which makes unified translation much harder later on. For the execution container, client-side and front-end computation runs on Ali's on-device computing container Walle, while cloud computation runs on Ali's stream computing container Blink. For result output, because every end reports to the same user touch center, the ends are basically consistent in the calculation result protocol. Once these differences are clear, the following core tasks can be sorted out:

- Input data protocol: be compatible with the input-data differences of each end.
- CEP computing API: make unified translation possible and keep the basic protocol under control.
- Selection of the translation framework and the translation itself: support translation into the high-level languages of all ends, so that capability upgrades and iteration can be unified.
- Masking the differences between execution containers: solve the mapping between the execution containers and the CEP computing engine on each end.

Design and implementation of DSL language translation
Unification of input data

A generally accepted practice in the industry is to build a common data template layer that masks the differences in each end's input data, with each end registering its own input-data instances as needed. The advantage is that the input to the custom DSL is unified, and translation into each end's language can then follow the registered end-specific instances that conform to the template specification. We adopted this approach to handle the input-data differences between ends. The input data protocol template we defined is as follows:

{
    "eventAlias": "Event alias",
    "eventCode": "PUBLISH_ITEM",
    "eventDesc": "Seller's details viewed",
    "eventTime": "Event time",
    "updateTime": "Event update time",
    "partitionId": "Partition id",
    "userId": "User id",
    "extraInfo": {
        "itemId": "Item id",
        "buyerId": "Buyer id",
        "sellerId": "Seller id",
        "itemType": "Item type",
        "itemStatus": "Item status",
        "categoryId": "Category id",
        "latitude": "Latitude",
        "longitude": "Longitude",
        ...
    },
    "scene": "Scene",
    "fromScene": "Previous scene",
    "toScene": "Next scene",
    "isFirstEnter": "Whether first entry",
    "bizId": "Unique id",
    "sessionId": "Session id",
    "actionType": "Behavior type",
    "actionName": "Behavior identifier",
    "ownerName": "LuoBin"
}
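To make the registration approach concrete, here is a minimal Java sketch of how an end might model the unified template and register the event instances it can produce. The class and interface names (OmegaEvent, EventTemplateRegistry) and the field types are assumptions for illustration only, not the actual implementation on any end.

import java.util.Map;

// Hypothetical Java form of the unified input data template above.
public class OmegaEvent {
    public String eventAlias;
    public String eventCode;              // e.g. "PUBLISH_ITEM"
    public String eventDesc;
    public long eventTime;
    public long updateTime;
    public String partitionId;
    public String userId;
    public Map<String, String> extraInfo; // end-specific fields: itemId, sellerId, ...
    public String scene;
    public String fromScene;
    public String toScene;
    public boolean isFirstEnter;
    public String bizId;
    public String sessionId;
    public String actionType;
    public String actionName;
    public String ownerName;
}

// Hypothetical registry: each end registers the event codes it can actually
// produce, and the DSL only accepts events registered against the template.
interface EventTemplateRegistry {
    void register(String eventCode, Class<? extends OmegaEvent> eventType);
}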
CEP computing API unification

For unifying the various CEP computing APIs, Flink CEP is a mature protocol specification in the industry: its basic computing API is decomposed in a reasonable way and is well accepted on every end. Based on the Flink CEP protocol specification, we therefore defined a general computing API protocol for the Idle Fish CEP computing engines, and each end implements the concrete APIs according to this protocol. The protocol specification is as follows:

public static <X> Pattern<X, X> begin(final String name);
public static <X> Pattern<X, X> begin(final String name,
                                      final AfterMatchSkipStrategy afterMatchSkipStrategy);
public Pattern<T, F> where(IterativeCondition<F> condition);
public Pattern<T, F> or(IterativeCondition<F> condition);
public Pattern<T, F> until(IterativeCondition<F> untilCondition);
public Pattern<T, F> within(Time windowTime);
public Pattern<T, T> next(final String name);
public Pattern<T, T> notNext(final String name);
public Pattern<T, T> followedBy(final String name);
public Pattern<T, T> notFollowedBy(final String name);
public Pattern<T, T> followedByAny(final String name);
public Pattern<T, F> optional();
public Pattern<T, F> oneOrMore();
public Pattern<T, F> greedy();
public Pattern<T, F> times(int times);
public Pattern<T, F> times(int from, int to);
public Pattern<T, F> timesOrMore(int times);
public Pattern<T, F> allowCombinations();
public Pattern<T, F> consecutive();
public static <T, F extends T> GroupPattern<T, F> begin(final Pattern<T, F> group,
                                                        final AfterMatchSkipStrategy afterMatchSkipStrategy);
public static <T, F extends T> GroupPattern<T, F> begin(Pattern<T, F> group);
public GroupPattern<T, F> followedBy(Pattern<T, F> group);
public GroupPattern<T, F> followedByAny(Pattern<T, F> group);
public GroupPattern<T, F> next(Pattern<T, F> group);
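As a usage sketch of the protocol above, the following Java fragment (written against the open-source Flink CEP API, which the protocol mirrors) chains the operators into a rule that matches a user viewing an item and then publishing an item within 10 minutes. The OmegaEvent type and the event codes are assumptions carried over from the earlier sketch, not an actual Xianyu rule.

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class PatternSketch {
    // Builds (but does not execute) an illustrative pattern:
    // a "VIEW_ITEM" event followed by a "PUBLISH_ITEM" event within 10 minutes.
    static Pattern<OmegaEvent, OmegaEvent> buildPattern() {
        return Pattern.<OmegaEvent>begin("view")
                .where(new SimpleCondition<OmegaEvent>() {
                    @Override
                    public boolean filter(OmegaEvent e) {
                        return "VIEW_ITEM".equals(e.eventCode);
                    }
                })
                .followedBy("publish")
                .where(new SimpleCondition<OmegaEvent>() {
                    @Override
                    public boolean filter(OmegaEvent e) {
                        return "PUBLISH_ITEM".equals(e.eventCode);
                    }
                })
                .within(Time.minutes(10));
    }
}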
Translation framework and implementation

With the input data and the CEP computing API unified, we can design the translation from the custom DSL to the unified CEP computing API. Since the CEP computing engine has several implementations, the translation framework must support multiple target languages. Antlr V4, Parboiled and Apache Calcite are currently the most widely used parsing and translation frameworks in the industry; their characteristics are compared in the following table:

|  | Antlr V4 | Apache Calcite | Parboiled |
| --- | --- | --- | --- |
| Supported languages | ActionScript, CSharp2, Delphi, JavaScript, Perl5, Ruby, C, CSharp3, Java, ObjC, Python | Java | Java and Scala |
| Use cases | Hibernate, Apache Hive, TOra, Esper, StreamBase, Spark | Hive, Drill, Flink, Phoenix, Storm | |
| Scope | Lexical analysis, syntax analysis and intermediate syntax tree (AST) generation | An open-source SQL parsing tool, not a parser for custom DSLs | A parser framework that must largely be built by hand; no AST concept |
| Other | IntelliJ IDEA has an Antlr V4 plugin, which makes development convenient | | |

Considering the characteristics of these frameworks, Antlr V4 supports our need for multi-language translation well and is the most convenient to develop with, so we finally chose Antlr V4. From the custom DSL syntax and the unified CEP computing API, a set of grammar files can be designed; Antlr V4 then generates the DSL parser and the AST. Finally, combining the characteristics of each end, the translation from AST nodes to each end's high-level language can be completed, as shown in the following figure:
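To illustrate the mechanism only (this is not Xianyu's actual translator), the sketch below shows the standard Antlr V4 workflow in Java: the generated lexer and parser build the AST, and a visitor walks the AST and emits code for one target end. It assumes a hypothetical grammar file OmegaDsl.g4, so the generated classes (OmegaDslLexer, OmegaDslParser, OmegaDslBaseVisitor) and the rule names (dslRule, whereClause, condition) are invented for illustration.

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;

// Hypothetical DSL-to-Java translator built on classes that Antlr V4 would
// generate from an assumed grammar file OmegaDsl.g4 (with the -visitor option).
public class DslToJavaTranslator extends OmegaDslBaseVisitor<String> {

    public static String translate(String dslText) {
        OmegaDslLexer lexer = new OmegaDslLexer(CharStreams.fromString(dslText));
        OmegaDslParser parser = new OmegaDslParser(new CommonTokenStream(lexer));
        // 'dslRule' is the assumed top-level rule of the hypothetical grammar.
        return new DslToJavaTranslator().visit(parser.dslRule());
    }

    // One visit method per AST node type, each emitting a fragment of the
    // target end's CEP API. Shown only for an assumed 'whereClause' node.
    @Override
    public String visitWhereClause(OmegaDslParser.WhereClauseContext ctx) {
        return ".where(" + visit(ctx.condition()) + ")";
    }
}

A visitor like this is written once per target end, so adding a new target language means adding a new visitor rather than touching the grammar or the DSL itself.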

Masking the execution containers

To handle the one-to-many mapping between execution containers and the CEP computing engines on each end, we introduced the concept of a DSL rule type. Each DSL rule type is associated with its corresponding execution container, so developers do not need to be aware of the underlying execution container at all. In addition, we built a DSL editor that provides syntax and event hints, an approval flow, resource management, result query and other auxiliary functions, which we believe gives developers a friendly experience.
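As a minimal sketch of this idea, with every name (RuleType, ExecutionContainer, ContainerRouter) invented for illustration: the rule type a developer picks in the DSL editor determines which execution container the translated rule is deployed to, so the developer never has to choose Walle or Blink directly.

// Hypothetical illustration of mapping DSL rule types to execution containers.
public final class ContainerRouter {

    // The kind of rule a developer writes in the DSL editor.
    enum RuleType { CLIENT_SINGLE_USER, FRONTEND_SINGLE_USER, CLOUD_CROSS_USER }

    // Where the translated rule actually runs.
    enum ExecutionContainer { WALLE, BLINK }

    // One-to-many in general: several rule types can map to the same container.
    static ExecutionContainer containerFor(RuleType type) {
        switch (type) {
            case CLIENT_SINGLE_USER:
            case FRONTEND_SINGLE_USER:
                return ExecutionContainer.WALLE;  // on-device / front-end computing container
            case CLOUD_CROSS_USER:
            default:
                return ExecutionContainer.BLINK;  // cloud stream computing container
        }
    }
}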

Practical application effect

Omega's translation scheme has now been proven in the Double 11 campaign and has delivered clear results in lowering the technical threshold and improving development efficiency. Because the custom DSL masks the language differences between ends, anyone with a little SQL experience can start developing quickly, the technical threshold drops sharply, and developers can focus on implementing business logic. In the Double 11 practice, developing business rules in the custom DSL compressed what used to take about a week of development into 1-2 hours; a single DSL business rule now takes about 10 minutes on average to develop, and development efficiency has improved many times over.

Follow-up development plan

Omega's development ecosystem has reached a certain scale, producing a series of core protocol standards and providing a concise, efficient integrated environment for development and operations. The client side and the cloud are already connected, and the front end will be connected next so that the system can serve a wider range of scenarios. The technical details of the translation on each end will be covered in later articles. In addition, building on the translator's core protocol standards, we plan to further deepen Xianyu's DSL capabilities and to open up the protocol standards and a mature translator product externally.