As China's leading online travel agency (OTA), Ctrip faces serious fraud risks every day: stolen bank cards being used for purchases, account theft, abuse of marketing promotions, and malicious hoarding of resources.

At present, Ctrip uses its own risk control system to identify and prevent these risks. The system started from zero and, after five years of continuous exploration and innovation, now covers every stage: before, during and after the event. It has grown from "simple rules + DB" into an intelligent risk control system that supports a tenfold growth in transactions. It is built on a rule engine, real-time computation, stream processing, MapReduce, big data, data mining and machine learning, and provides real-time and quasi-real-time risk decisions as well as data analysis.

I. Aegis System

Figure 1

The Aegis system is divided into three main modules (the risk control engine, data services, and data operation) plus auxiliary systems.

Risk control engine: processes risk control requests and comprises the preprocessing, rule engine and model execution services. The data the engine needs is supplied by the data service module.

Data services: real-time traffic statistics, risk profiles, behavioral and device data, the external data access agent, and RiskGraph. The data served by this layer is produced by the data operation layer.

Data operation: mainly covers risk profile computation, RiskSession, device fingerprinting, and real-time stream and non-real-time computation.

The data sources for data operation are mainly risk control Event data (order data, payment data), UBT data collected by each system, device fingerprints, log data and so on.

In addition, the risk control platform has a comprehensive monitoring and alerting system, a manual review platform and a reporting system.

II. Aegis System Architecture

Figure 2

Rule engine

The rule engine has three main functions. The first is the adaptation layer.

Ctrip has many lines of business, each with its own characteristics. To make data processing easier once a business is connected to the risk control system (Aegis), an adapter module sits in front of the engine. According to configuration, it converts each business's data into the standardized internal risk control format for the rest of the system to use.

After data adaptation is completed, the risk control system merges data.

For example, when a payment risk check is requested, the payment BU sends only the payment information (payment amount, payment method, order number, etc.), not the order information. The system must then quickly find the order information from the payment information and combine the two for the rules and models to use. The interval between a user creating an order and initiating payment can range from seconds to days. When the interval is short, the order data to be merged may not have been processed yet, so order data must go from processing to storage very quickly. The second requirement is finding the order data quickly: the RiskGraph built from the order information lets us locate the required order details quickly and accurately.
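The merge step described above can be sketched as follows. This is only an illustration under stated assumptions: `OrderIndex` is a hypothetical in-memory stand-in for the order lookup (the real system uses RiskGraph), and the field names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrderIndex:
    """Hypothetical in-memory stand-in for an order lookup keyed by order number."""
    _orders: dict = field(default_factory=dict)

    def put(self, order: dict) -> None:
        self._orders[order["order_id"]] = order

    def get(self, order_id: str) -> Optional[dict]:
        return self._orders.get(order_id)


def merge_payment_with_order(payment: dict, index: OrderIndex) -> dict:
    """Combine a payment event with its order so rules and models see both."""
    order = index.get(payment["order_id"])
    # If the order has not landed yet, flag it so the caller can retry or degrade.
    return {"payment": payment, "order": order, "order_missing": order is None}


index = OrderIndex()
index.put({"order_id": "A1001", "uid": "u42", "product": "hotel", "amount": 880.0})
event = merge_payment_with_order(
    {"order_id": "A1001", "pay_amount": 880.0, "pay_method": "credit_card"}, index
)
```

The `order_missing` flag makes the short-interval case explicit: a payment that arrives before its order has been stored is visible to the caller instead of silently losing the order fields.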

Preprocessing: after the data merge, the preprocessing module prepares the variables and tag data required by the rules and models. To do so it relies on the data service layer, explained later. To improve performance, we order the retrieval of variables and tag data sensibly, fetching first those required by key rules and models.

Fraud comes in waves, and the risk control system must respond in time: when fraud is discovered, rules must go live promptly to block similar subsequent fraud. Rules therefore need to be fast and accurate. They must go online quickly, and rule staff must be able to write and deploy rules themselves. Rules must also be effectively isolated from the engine that executes them, so that one unreasonable rule does not affect the whole engine. The rule engine must meet these criteria.

We finally chose the open-source Drools. First, it is open source; second, rules can be written in Java, so it is easy to get started; third, it has enough functionality.

In this way, by using the Drools rule engine, Ctrip's real-time risk control engine is flexible and configurable and can bring rules online efficiently. Moreover, because rules use Java syntax, rule staff can write rules and put them online quickly.
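The isolation requirement (a bad rule must not affect the whole engine) can be illustrated independently of Drools. The sketch below is plain Python, not Drools, and `RuleRegistry` and the rule names are hypothetical. It only shows the idea of deploying rules at runtime and containing a faulty rule.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


class RuleRegistry:
    """Hypothetical registry: rules can be added or replaced at runtime."""

    def __init__(self) -> None:
        self._rules: Dict[str, Rule] = {}

    def deploy(self, name: str, rule: Rule) -> None:
        self._rules[name] = rule  # deploying under an existing name replaces the rule

    def evaluate(self, event: dict) -> Dict[str, List[str]]:
        hits: List[str] = []
        failed: List[str] = []
        for name, rule in self._rules.items():
            try:
                if rule(event):
                    hits.append(name)
            except Exception:
                # A faulty rule is recorded and skipped; the others still run.
                failed.append(name)
        return {"hits": hits, "failed": failed}


registry = RuleRegistry()
registry.deploy("large_amount", lambda e: e["amount"] > 5000)
registry.deploy("broken_rule", lambda e: e["no_such_field"] > 1)  # raises KeyError
result = registry.evaluate({"amount": 9000})
```

Here `broken_rule` fails on every event, yet `large_amount` is still evaluated and reported as a hit.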

Since hundreds of rules and models must be executed for each risk control Event request, the risk control engine optimizes the rule execution path. Rules are organized for parallel or serial execution according to whether they depend on one another, and a short-circuit mechanism is added, which keeps the running time of thousands of rules within 100 ms.
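A minimal sketch of this execution strategy, assuming rules are grouped into stages: stages run serially (a later stage may depend on an earlier one), independent rules within a stage run in parallel, and a "REJECT" verdict short-circuits the remaining stages. The function and rule names are hypothetical, and the thread pool is only one possible way to run a stage in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

Rule = Callable[[dict], Optional[str]]  # returns a verdict such as "REJECT", or None


def run_stages(event: dict, stages: List[List[Rule]]) -> dict:
    """Run stages serially; rules inside one stage run in parallel.

    Once any rule in a stage returns "REJECT", later stages are skipped.
    """
    executed = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        for stage in stages:
            verdicts = list(pool.map(lambda rule: rule(event), stage))
            executed += len(stage)
            if "REJECT" in verdicts:
                return {"verdict": "REJECT", "rules_executed": executed}
    return {"verdict": "PASS", "rules_executed": executed}


def blacklist_rule(event: dict) -> Optional[str]:
    return "REJECT" if event["uid"] in {"bad_user"} else None


def amount_rule(event: dict) -> Optional[str]:
    return "REJECT" if event["amount"] > 10000 else None


def slow_model_rule(event: dict) -> Optional[str]:
    return None  # stands in for an expensive model call


stages = [[blacklist_rule, amount_rule], [slow_model_rule]]
rejected = run_stages({"uid": "bad_user", "amount": 50}, stages)
passed = run_stages({"uid": "good_user", "amount": 50}, stages)
```

Placing cheap, decisive rules in early stages means the expensive stage is never reached for requests that are already rejected.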

Figure 3

Rules are very flexible and quick to write and deploy, but the coverage of a single rule is low. Raising coverage takes a large number of rules, which makes maintenance costly. This is where models come in: a model can achieve higher coverage and can express very complex logic, but it must be trained offline. Ctrip's risk control system therefore uses rules and models to complement each other.

Logistic Regression (LR) and Random Forest (RF) are the main algorithms in the current risk control system. Our experience with the two so far is as follows: LR performs better when the training variables discriminate well and the feature engineering is good, while RF is more effective when the variables' linear discriminating ability is weak. RF therefore accounts for a large share of usage.

4. Data service layer

The main function of the data service layer is to provide data services. The preprocessing step of the risk control engine needs many variables and tags, and this layer supplies that data. Its most important goal is fast response. Redis is therefore the main data cache in the data service layer, and for important, high-frequency data Redis serves directly as the persistence layer.

The core idea of the data service layer is to make full use of memory (local memory and Redis):

1. Local memory, for large amounts of fixed data such as IP location and city information.

2. Redis, used as a high-performance cache.

Because the real-time data flow service and the risk profile data service store their data directly in Redis, their performance already meets the rule engine's requirements. We therefore focus here on the data access proxy service.

The key idea of the data access proxy service is to call the third-party service before the rules need the data and save the result in Redis, so that a rule's request can be served directly from Redis. Because the data is preloaded, its freshness and hit ratio are very important. Take user-dimension data as an example: by analyzing user logs, the risk control system can detect that a user has logged in, browsed or made a booking, and can load that user's external service data into Redis in advance. When rules and models read external user-dimension data, they read Redis first and access the external service only on a miss.
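The preloading idea can be sketched as below. The assumptions are explicit: a plain dict stands in for Redis, `fetch_external` is a hypothetical callable representing the third-party service, and `PreloadingProxy` is an illustrative name rather than the actual component.

```python
from typing import Callable, Dict, List


class PreloadingProxy:
    """Sketch of a data access proxy that warms a cache before rules ask for data."""

    def __init__(self, fetch_external: Callable[[str], dict]) -> None:
        self._fetch_external = fetch_external
        self._cache: Dict[str, dict] = {}  # stand-in for Redis
        self.external_calls: List[str] = []  # recorded only to make the behavior visible

    def _load(self, uid: str) -> dict:
        self.external_calls.append(uid)
        data = self._fetch_external(uid)
        self._cache[uid] = data
        return data

    def on_user_activity(self, uid: str) -> None:
        """Called when logs show a login, browse or booking action."""
        if uid not in self._cache:
            self._load(uid)

    def get(self, uid: str) -> dict:
        """Called by rules and models: cache first, external service on a miss."""
        cached = self._cache.get(uid)
        return cached if cached is not None else self._load(uid)


proxy = PreloadingProxy(lambda uid: {"uid": uid, "credit_level": "A"})
proxy.on_user_activity("u42")  # preload triggered by user behavior
profile = proxy.get("u42")     # served from the cache, no second external call
```

The external service is called once, at activity time, so the rule's later read does not pay the third-party latency.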

In some scenarios we also introduce a DB for persistence. When some of a user's information changes, the public service sends a message to Hermes, to which we subscribe. On learning that the information has been modified, we proactively call the external service and load the data into Redis. Because the risk control system is notified of data changes, it is safe to persist this data to the DB. The data also carries a TTL to ensure its freshness. In this scenario, on a Redis miss the system checks the DB first, and only when neither holds suitable data does it access the external service. This allows both performance and storage space to be optimized.
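The lookup order in this scenario (Redis, then DB, then the external service, with a TTL on each entry) can be sketched as follows. Dicts stand in for Redis and the DB, `TieredStore` and `fetch_external` are hypothetical names, and the injectable `clock` exists only so that expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[dict, float]  # (value, expiry timestamp)


class TieredStore:
    def __init__(self, fetch_external: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch_external = fetch_external
        self._ttl = ttl_seconds
        self._clock = clock
        self.cache: Dict[str, Entry] = {}  # stand-in for Redis
        self.db: Dict[str, Entry] = {}     # stand-in for the persistent DB

    def _fresh(self, tier: Dict[str, Entry], key: str) -> Optional[dict]:
        entry = tier.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        return None

    def refresh(self, key: str) -> dict:
        """Fetch from the external service and write both tiers.

        This is also what a change-message handler would call.
        """
        value = self._fetch_external(key)
        entry = (value, self._clock() + self._ttl)
        self.cache[key] = entry
        self.db[key] = entry
        return value

    def get(self, key: str) -> Tuple[dict, str]:
        value = self._fresh(self.cache, key)
        if value is not None:
            return value, "cache"
        value = self._fresh(self.db, key)
        if value is not None:
            self.cache[key] = self.db[key]  # repopulate the cache from the DB
            return value, "db"
        return self.refresh(key), "external"


now = [1000.0]
store = TieredStore(lambda k: {"uid": k}, ttl_seconds=60, clock=lambda: now[0])
first = store.get("u42")   # nothing stored yet
second = store.get("u42")
store.cache.clear()        # simulate the entry being evicted from the cache
third = store.get("u42")
now[0] += 61               # TTL elapsed in both tiers
fourth = store.get("u42")
```

The second element of each result shows which tier answered, which makes the fallback order easy to follow.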

Chloro system

The Chloro system is the data analysis service at the heart of the entire risk control system. The data in the data service layer is computed and supplied by the Chloro system.

The main analysis dimensions are: user risk profiles, user social relationship networks, transaction risk behavior models, and supplier risk models.

Figure 4

As the figure shows, the data sources are mainly Hermes, Hadoop, and the various risk control Event data sent from the front end. The Listener receives the various types of data, which then enters the CountServer and the Real Time Process system, while RiskSession data enters the Sessionizer. This module quickly groups events into sessions: events are reduced into a session by key and then submitted to the real-time processing system. After processing by Real Time Process and CountServer, the data is split into two parts, results and raw data, which are submitted to the Data Dispatcher for routing within the Chloro system. Results are fed directly into RiskProfile for use by the engine and models. Raw data is written to the Hadoop cluster.

Batch Process uses the big data processing capability of the Hadoop cluster to process offline data. When Batch Process finishes, it also sends its results to the Data Dispatcher for routing.

Batch Process can also perform data analysis across Rsessions.

Figure 5

The first event of any visit to Ctrip from any device is treated as the start of an Rsession, and the last event after which no trace is left for 30 minutes is treated as its end.
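The 30-minute rule amounts to splitting an event stream wherever the gap between consecutive events exceeds the threshold. A minimal sketch, assuming events are represented only by their timestamps in seconds and that a gap of exactly 30 minutes still continues the session (the source does not specify the boundary case):

```python
from typing import List

SESSION_GAP_SECONDS = 30 * 60


def split_sessions(timestamps: List[int], gap: int = SESSION_GAP_SECONDS) -> List[List[int]]:
    """Group event timestamps into sessions separated by more than `gap` seconds."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # still within the current session
        else:
            sessions.append([ts])    # first event, or gap exceeded: start a new session
    return sessions


events = [0, 600, 1500, 5000, 5100]
sessions = split_sessions(events)
```

In this example the 3500-second silence between the third and fourth events closes the first session and opens a second one.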

The risk control system compares user information (Uid, mobile phone number, email address) and device information (Fp (Fingerprint), clientId, VID, V, and deviceId) to determine whether events belong to the same user, and judges behavioral similarity from behavior information (browsing path and history path).

For example, a user who places an order on a PC and then pays in the app stays within a single Session as Chloro defines it. This kind of Session is called a RiskSession. With this definition, Chloro can quantify and characterize user behavior, so a RiskSession effectively acts as a container for user behavior. Because a RiskSession spans platforms, it makes user characteristics easier to analyze.
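One way to group cross-platform events by shared identifiers is a union-find over identifier values. This is an assumed technique chosen for illustration, not a description of how Chloro does it, and the event fields and the `link_events` helper are hypothetical.

```python
from typing import Dict, List


def link_events(events: List[dict], id_fields: List[str]) -> List[List[int]]:
    """Group event indexes that share any identifier value (union-find)."""
    parent = list(range(len(events)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    seen: Dict[tuple, int] = {}  # (field, value) -> first event index carrying it
    for idx, event in enumerate(events):
        for field in id_fields:
            value = event.get(field)
            if value is None:
                continue
            key = (field, value)
            if key in seen:
                parent[find(idx)] = find(seen[key])  # merge the two groups
            else:
                seen[key] = idx

    groups: Dict[int, List[int]] = {}
    for idx in range(len(events)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


events = [
    {"platform": "pc", "action": "order", "uid": "u42", "fp": "fp-1"},
    {"platform": "app", "action": "pay", "uid": "u42", "deviceId": "d-9"},
    {"platform": "app", "action": "browse", "deviceId": "d-9"},
    {"platform": "pc", "action": "browse", "uid": "u77", "fp": "fp-2"},
]
groups = link_events(events, ["uid", "fp", "deviceId"])
```

The PC order and the app payment are linked through the shared `uid`, and the anonymous app browse joins them through the shared `deviceId`, while the unrelated user stays in a separate group.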

Figure 6.

Risk Graph was developed for the specific needs of Ctrip's risk control system. It uses HBase as its storage medium. For example, with the user as a node, the node's value is the row key of the HBase user table and each column is a feature. This creates a graph-like structure on top of HBase.

A core idea of the system is therefore to build a data index for each dimension and then look up content by index value. At present, the risk control system has built fast indexes for a dozen or so dimensions.
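The "index per dimension, then look up by index value" idea can be sketched with plain dictionaries. This is not HBase code: `RiskGraphSketch`, the row keys and the column names are hypothetical, and the sketch does not clean up old index entries when a row is overwritten.

```python
from collections import defaultdict
from typing import Dict, List, Set


class RiskGraphSketch:
    """Rows keyed by a row key, plus one inverted index per chosen dimension."""

    def __init__(self, indexed_dimensions: List[str]) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}  # row key -> columns (features)
        self._indexed = indexed_dimensions
        # dimension -> value -> set of row keys
        self._index: Dict[str, Dict[str, Set[str]]] = {
            dim: defaultdict(set) for dim in indexed_dimensions
        }

    def put(self, row_key: str, columns: Dict[str, str]) -> None:
        self._rows[row_key] = columns
        for dim in self._indexed:
            if dim in columns:
                self._index[dim][columns[dim]].add(row_key)

    def find(self, dimension: str, value: str) -> List[Dict[str, str]]:
        """Return all rows whose indexed dimension equals value, ordered by row key."""
        keys = self._index[dimension].get(value, set())
        return [self._rows[k] for k in sorted(keys)]


graph = RiskGraphSketch(["uid", "mobile"])
graph.put("order:A1001", {"uid": "u42", "mobile": "138****0000", "amount": "880"})
graph.put("order:A1002", {"uid": "u42", "mobile": "139****1111", "amount": "120"})
by_user = graph.find("uid", "u42")
```

Each indexed dimension gives a direct path from a value (a user, a phone number) to the related rows, which is what makes a lookup such as finding the order behind a payment fast.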

6. Other Aegis subsystems

Figure 7

Aegis also has a configuration system in which users can configure rules, rule execution paths, standardization, tags, variable definitions, data-cleaning business logic and so on. The monitoring system is equally important: risk control R&D follows the design principle that monitoring should be everywhere, so that any small change in the system is detected immediately.

Looking ahead

In version 3.0, Ctrip risk control introduced the rule engine, made extensive use of open source in the Chloro system for big data processing, and brought models into the architecture, with very good results. Version 4.0 will continue in the directions of machine learning, artificial intelligence and behavioral features to further improve recognition capability. The risk control system will keep embracing open-source technology; the next step is to introduce Spark to improve its data processing capability.

Yu Wei, Senior Development Manager, Risk Control Department, Ctrip Technology Center. He joined Ctrip in 2010, took part in developing Ctrip's settlement platform and risk control system, and has done in-depth research on system architecture and streaming data processing.

