Author: Xianyu technology — Qi Wu
Mach is Xianyu's product selection and delivery system. Most commodities in Xianyu's business are one-of-a-kind items, that is, goods with a single unit of inventory, so real-time changes to a commodity need to be fed back immediately into the selection and delivery link. To meet this business demand, Mach treated real-time performance as its most important technical goal from the start of its design. As the system grew, its real-time data processing also hit bottlenecks. This article introduces the work done over the past year to improve Mach's real-time performance, with the goal of keeping the end-to-end delay of product selection at the level of seconds. If you are not familiar with Mach, please review the related Mach articles on our public account.
The real-time selection link
The data flow of Mach's selection and delivery process is divided into three parts, as shown in the figure below. The first part is selection data access. Data is the basis of Mach's selection, and real-time data is the basis for the whole Mach process to work effectively. The second part is selection rule calculation. Rule calculation is the core of Mach, and computing it in real time ensures that Mach can synchronize selection results to every downstream node. The third part is product delivery. Delivery is the part closest to users, and real-time feedback gives users a better experience. The real-time optimization of Mach follows these same three parts, and the optimizations for each part are described in detail below.
Real-time performance optimization of selection data
Mach selection depends on a wide selection table made up of two kinds of data. The first is basic commodity data, meaning the information users publish about a commodity: type, state, price, description, and pictures. This kind of data was already real-time, because Mach subscribes to changes in the commodity database. The second is statistical and predictive data derived from commodities. This kind of data is generally computed offline on ODPS (Alibaba's internal offline computing platform), and compared with basic commodity data its output is mostly delayed by a day or by hours.

The original data access scheme was as follows. Whether the data was produced with an hour-level or a day-level delay, all of it was processed the same way. Every day, all the data was joined together and written to one offline table, which was then fed into Mach's unified data access layer through BLINK and a MetaQ message channel. The advantage of this scheme is that all offline output only needs to be processed once, which is convenient for operation and maintenance, and the data volume of the whole process is bounded to one batch per day. The disadvantages are also obvious. First, the output time of the daily summary depends on whichever upstream data takes the longest to produce.
If any upstream data is delayed, the whole process is delayed; second, data that is produced at H+1 cannot be used until T+1, which seriously limits product selection.

With the problems clear, what remains is the solution. The first problem to solve is the summary's dependence on the slowest upstream data. The conventional approach is to produce the summary at a fixed time: when that time arrives, each upstream node's data is used if it has been produced, and the previous output is used if it has not. Our initial solution adopted this approach. It means some data does not enter the selection wide table until T+2. It is not a very good solution, but it is a simple one, and it served Mach for a while as a transitional measure. The final solution Mach adopted for this problem is shown in the figure below:

First, the data is classified by output cycle. The H+1 output is on the order of ten thousand records, while the T+1 output is on the order of ten million. H+1 data is read directly by BLINK and sent through the MetaQ data channel to the unified data access layer, so the data is usable as soon as it is produced. T+1 data is also read directly by BLINK, but this data set is so large that emitting it directly would push QPS into the millions, which is too much pressure for downstream systems. Mach therefore uses BLINK's sliding window aggregation: records are keyed by product ID, aggregated inside the window, and then output to Mach's unified data access layer. The smaller the sliding window, the better the real-time performance, but the greater the pressure on the system. Currently, Mach's sliding window is 6 hours.
This scheme solved the problem of real-time access to commodity algorithm and statistical data, but the traffic through the whole system increased fourfold. This is the trade-off between time and space costs that system design often wrestles with. In this optimization we chose to trade space for time while keeping the two in moderate balance. As a result, the delay dropped from the day level to the hour level.
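The keyed window aggregation described above can be illustrated with a small sketch. This is not BLINK code and does not use any BLINK API: it is a simplified, fixed (tumbling) window in plain Python that stands in for the sliding window the article mentions. The record shape, the field names, and the `aggregate_by_window` function are all hypothetical; only the 6-hour window and the use of product ID as the key come from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 6 * 60 * 60  # 6-hour window, as stated in the article

# Hypothetical record shape: (event_time_seconds, product_id, {field: value})
Record = Tuple[int, str, Dict[str, object]]


def aggregate_by_window(records: Iterable[Record],
                        window_seconds: int = WINDOW_SECONDS) -> List[Record]:
    """Merge all updates for one product inside a window into one output row."""
    # (window_start, product_id) -> merged fields
    buckets: Dict[Tuple[int, str], Dict[str, object]] = defaultdict(dict)
    for event_time, product_id, fields in sorted(records, key=lambda r: r[0]):
        window_start = event_time - event_time % window_seconds
        # Later records overwrite earlier values of the same field.
        buckets[(window_start, product_id)].update(fields)
    return [(start, pid, merged)
            for (start, pid), merged in sorted(buckets.items())]


# Four upstream records collapse into three downstream rows.
rows = aggregate_by_window([
    (10, "item-1", {"ctr_score": 0.12}),
    (20, "item-1", {"sales_7d": 3}),
    (30, "item-2", {"ctr_score": 0.40}),
    (WINDOW_SECONDS + 5, "item-1", {"ctr_score": 0.15}),
])
for row in rows:
    print(row)
```

The sketch shows why the window size is a tuning knob: a larger window merges more updates per product into one downstream write, which lowers downstream QPS but delays the moment each update becomes visible.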
Real-time optimization of selection rule calculation
Selection rule calculation consists of two parts. The first is the offline selection rule execution engine: when a selection rule is created, its result set is calculated from the Mach selection wide table. The second is the online selection rule execution engine: when product information changes, the selection rules hit by that product need to be recalculated.

We start with the optimization of the offline selection rule execution engine. A selection rule created by operations staff is mapped to SQL and executed against the Mach selection wide table in ADB. ADB is a fully indexed data table, but some fields still query poorly. For example, the selection wide table has a field ATTRIBUTE that is mapped from the basic commodity data table. As the name suggests, this is an extension field reserved when commodities were originally designed, and its content is stored as semicolon-separated key-value (KV) pairs. When a selection rule touches this field, the generated SQL uses the LIKE keyword to search more than a billion rows, so its performance is easy to imagine. We therefore applied multi-value mapping to this kind of field: each KV pair is mapped to a number for storage. Although this scheme is not complicated, it solves the timeliness problem of counting and previewing commodities when a rule is created. After optimization, the commodity calculation dropped from 6 minutes to 30 seconds. The specific effects are as follows:

Next is the optimization of the online rule execution engine. The online engine is implemented in BLINK. It takes commodity change messages as its input source and calculates which commodity pools the current commodity hits. The original process is as follows:

MERGE and DIFF, the key operations of Mach, are mainly completed in BLINK.
MERGE combines the data held in BLINK memory with the incoming data: for each field it keeps the value with the larger update timestamp, which yields the latest integrated product information. All selection rules are then run on the integrated data, the results are compared with the previous results, and only the data whose results differ is output. This step is called DIFF. The advantage of this method is that memory holds the latest commodity data, which reduces IO reads and saves time. Outputting only the DIFF also reduces data transmission and saves time again.

The scheme also has obvious drawbacks. All data is stored in memory, so any unexpected failure loses it. As a result, the system was never stopped from the day it started running until this optimization, which made it costly to operate, and because it could not be stopped it could not be upgraded to use the capabilities of newer BLINK versions. The solution to the problem of not being able to stop is as follows:

When commodity information arrives in BLINK, the engine first looks for the data locally. If it is not there, the corresponding data is read from the commodity information database. When results are output, the commodity information and the full rule-hit information are written to backup storage in addition to the original output, so that the engine can recover after downtime. Data is compressed during transmission and storage to reduce storage space and IO. The logic of the final scheme is not very complicated. The hard part was switching smoothly from the original scheme to the current one, which involved data synchronization, data conversion, and data verification, and we hit many pitfalls along the way. That part is not the focus of this article and is not covered here. In the end, the problem of not being able to stop was solved and the BLINK version was upgraded.
After the upgrade, the data delay was reduced from the original 2 minutes to 2 seconds, making the whole online rules engine run faster and more smoothly.
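The MERGE and DIFF steps described in this section can be sketched in plain Python as follows. This is an illustration under assumptions, not the engine's actual code: the in-memory layout (each field stored with its update timestamp), the representation of rules as predicates keyed by pool ID, and the names `merge`, `hit_pools`, and `diff` are all hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# Hypothetical in-memory layout: field -> (value, update_timestamp)
Product = Dict[str, Tuple[object, int]]
Rule = Callable[[Dict[str, object]], bool]


def merge(stored: Product, incoming: Product) -> Product:
    """MERGE: per field, keep whichever side has the larger update timestamp."""
    merged = dict(stored)
    for field, (value, ts) in incoming.items():
        if field not in merged or ts > merged[field][1]:
            merged[field] = (value, ts)
    return merged


def hit_pools(product: Product, rules: Dict[str, Rule]) -> Set[str]:
    """Run every selection rule and return the IDs of the pools that match."""
    values = {field: value for field, (value, _) in product.items()}
    return {pool_id for pool_id, rule in rules.items() if rule(values)}


def diff(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """DIFF: only pools whose membership changed are emitted downstream."""
    return {"added": current - previous, "removed": previous - current}


# Hypothetical rules and product data, for illustration only.
rules = {
    "cheap_phones": lambda p: p.get("category") == "phone" and p.get("price", 0) < 1000,
    "on_sale": lambda p: p.get("status") == "on_sale",
}
stored = {"category": ("phone", 100), "price": (1500, 100), "status": ("on_sale", 100)}
before = hit_pools(stored, rules)

# The price update is newer (ts 200) and wins; the status update is stale (ts 50).
merged = merge(stored, {"price": (900, 200), "status": ("sold", 50)})
after = hit_pools(merged, rules)
change = diff(before, after)
print(change)
```

In this example the stale status message is ignored by MERGE, and DIFF emits a single change (the product joins one pool) instead of the product's full rule-hit set.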
Real-time optimization of product delivery
Product delivery is the capability closest to the user. After selecting commodities, operations staff form a commodity pool, use the pool to build a page, and deliver the commodities to users. When a user makes a request, commodities are recalled from the pool and presented to the user, as shown in the following figure.

At present, Mach supports search recall and algorithmic recall. In search recall, real-time changes to the commodity pool are synchronized to the search engine as incremental updates. This naturally supports real-time recall, so it only needs the incremental pipeline to stay stable. Algorithmic recall uses user attributes to recall related commodities, and the relationship between users and commodities is produced at T+1, which conflicts with the real-time behavior that Mach scenarios emphasize. To make algorithmic recall real-time, Mach adopted the following solution:

First, consider the recall process. A user request carries two pieces of information: a user ID and a commodity pool ID. The user ID is used to query the user and commodity relation table for personalized recall, and the pool ID is used to query the selection pool and commodity relation table for general recall. The two result sets are then merged and deduplicated. This step is called personalized recall here. The recalled data is then joined with the commodity and commodity pool relation table, and only commodities that belong to the requested pool ID are retained. This step is called recall filtering. If enough commodities remain to meet the recall requirement, they are returned. If not, the fallback data in the commodity pool and commodity relation table is queried. This step is called fallback recall. After the recall process, the data goes through RANK and information supplementing and is presented to the user.
Several tables are mentioned in the flow above. How these tables are produced and why they are designed this way is described below.

First is the user and commodity relation table used in personalized recall. This data does not rely on any information from Mach. It predicts the products a user likes purely from preference behavior in Xianyu, such as clicking and browsing. Each user has about 2,000 related products, and the data is output at T+1.

Next is the selection pool and commodity relation data used by BE in personalized recall. This data depends on the output of Mach's offline synchronization engine. The commodities in each pool are first sorted by the pool's metrics, and the top 5,000 are kept as general recall data, output at T+1. These two tables are what the personalized recall step depends on.

Then comes the commodity and selection pool relation data used by BE in recall filtering, which is updated in real time by the online synchronization engine. Recall filtering exists to prevent a commodity that was recalled from T+1 data from being returned when it is no longer in the commodity pool.

Finally there is the fallback recall logic. Its data is kept up to date in real time by Mach's synchronization engine and stored in IGRAPH. Using the commodity pool ID, it recalls the latest 2,000 items in the current pool as the fallback. Together, this logic ensures the real-time behavior of algorithmic recall.
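The three recall stages and the four tables behind them can be summarized in a short sketch. It is a minimal illustration under assumptions: the tables are modeled as in-memory dictionaries rather than BE or IGRAPH queries, and the function name, parameter names, and sample data are hypothetical.

```python
from typing import Dict, List, Set


def recall(user_id: str, pool_id: str, need: int,
           user_items: Dict[str, List[str]],      # T+1: user -> preferred items
           pool_top_items: Dict[str, List[str]],  # T+1: pool -> top-ranked items
           pool_members: Dict[str, Set[str]],     # real time: pool -> current items
           pool_latest: Dict[str, List[str]],     # real time: pool -> newest items
           ) -> List[str]:
    # 1. Personalized recall: merge both T+1 sources, deduplicate, keep order.
    candidates = list(dict.fromkeys(
        user_items.get(user_id, []) + pool_top_items.get(pool_id, [])))
    # 2. Recall filtering: drop items that left the pool after the T+1 run.
    members = pool_members.get(pool_id, set())
    result = [item for item in candidates if item in members]
    # 3. Fallback recall: top up from the pool's newest items if still short.
    if len(result) < need:
        seen = set(result)
        for item in pool_latest.get(pool_id, []):
            if item not in seen:
                result.append(item)
                seen.add(item)
            if len(result) >= need:
                break
    return result[:need]


# Items "a" and "c" were recalled from T+1 data but are no longer in pool "p1".
items = recall(
    "u1", "p1", need=3,
    user_items={"u1": ["a", "b", "c"]},
    pool_top_items={"p1": ["b", "d"]},
    pool_members={"p1": {"b", "d", "e", "f"}},
    pool_latest={"p1": ["f", "e", "d"]},
)
print(items)
```

The design point the sketch highlights is that the stale T+1 tables are only used to propose candidates, while the two real-time tables decide what is actually returned.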
Conclusion
This article has given a comprehensive introduction to Mach's real-time optimization, from product selection through to delivery. Each step described is the final scheme that the optimization arrived at. We hit many pitfalls while keeping the system's transition smooth, but in the end the switch went through without incident. After optimization, the latency of Mach's whole real-time link, from selection to delivery, has changed qualitatively: selection data went from T+1 to H+1, the selection process from 6 minutes to 30 seconds, and the delivery process from 2 minutes to 2 seconds. The system is more robust and more real-time. In terms of overall functionality, Mach is still a tool-level system and is far from being a product-level one.

As shown in the figure above, future work will focus on product selection capability and on overall operation and maintenance capability. While the existing system continues to be optimized, new capabilities will be added and Mach will gradually be turned into a productized system.