A discussion of the offline and online algorithms of the recommendation engine

The Recommendation Engine is a recommendation service framework from Alibaba Cloud. You have probably long heard terms like “personalized recommendation” and “a thousand faces for a thousand people” on Taobao, wondered how the site manages to recommend the right items to different buyers so accurately, and wished you had such a recommendation system yourself. This post explains the offline and online algorithms of the Recommendation Engine product, so that if you later find that the stock calculation rules do not fit your own scenario, you know where and how to adjust them. For how to get started with the product, see the product documentation and the videos.

System architecture

How does a recommendation engine work, and why does it only need some user, product, and behavior data to know who likes what? Let’s look at a diagram from the documentation. The framework of the recommendation engine looks something like this:

Let’s set aside writing data through the API and real-time correction for now. Once the data in MaxCompute is ready, the engine can be called to generate recommendation results in real time. This takes two steps. First, the offline calculation computes the recommendation results and saves them in Table Store. Second, the online algorithm processes those results and returns them for display. So if the recommendations themselves are wrong, for example a completely unrelated item is recommended, look at the offline algorithm. If instead the number of recalled items needs adjusting, or too few items are recalled and the list has to be padded with default values, work on the online algorithm (although the default values themselves may need to be generated offline). The online and offline algorithms are used together, and the template shows which ones are paired.
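The two-step split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rec_store` is a plain dictionary standing in for the table store, and the scoring rule (popular items the user has not touched yet) is a placeholder, not the product's algorithm. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the table store: user id -> stored recommendation list.
rec_store = {}

def offline_compute(behaviors):
    """Batch step: compute a candidate list per user and save it.

    Placeholder scoring: items the user has not touched, most popular first.
    """
    popularity = Counter(item for _user, item in behaviors)
    seen = defaultdict(set)
    for user, item in behaviors:
        seen[user].add(item)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = [i for i, _ in sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))]
    for user, items in seen.items():
        rec_store[user] = [i for i in ranked if i not in items]

def online_recommend(user, limit):
    """Request-time step: read the stored list and keep at most `limit` items."""
    return rec_store.get(user, [])[:limit]

offline_compute([("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c")])
print(online_recommend("u3", 1))
```

The point of the split is that wrong candidates can only be fixed in `offline_compute`, while the length and final shaping of the list are decided in `online_recommend`.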

Offline calculation

We can learn about the offline algorithm from the default detail template (detail_ofl). Open the algorithm and you will see a flow chart like this:

Each line in the diagram represents a dependency between tasks. That is not very intuitive, so I redrew it:

The offline calculation of the detail_ofl template has two main lines. In the first, crs_04 and crs_02 each generate an item_item_rec_list, and st_cb_01 merges them into a single output table. In the second, crs_05 and crs_03 each generate a user_item_rec_list, and st_cb_02 collates them into a result table. The item_item_rec_list table records item-based recommendations: two items that appear together in it can be taken to be closely related. Beer and diapers are the classic example of an item_item_rec pair. The user_item_rec_list table holds recommendations for a user. For example, the system finds that you and I are both runners; one day I buy a nice pair of shoes, and the system figures you might like them too.
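To make the two kinds of result table concrete, here is a minimal sketch that builds both from a toy purchase history. It uses simple co-occurrence counts and history overlap, which is an assumption chosen for illustration; the source does not say which similarity measures crs_02 through crs_05 use. The data and function names are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy purchase history: user -> set of items (invented data).
history = {
    "u1": {"beer", "diapers"},
    "u2": {"beer", "diapers", "chips"},
    "u3": {"running_shoes", "socks"},
    "u4": {"socks"},
}

def _top(scores, top_n):
    """Rank keys by descending score, breaking ties by name."""
    return [k for k, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]

def item_item_rec(history, top_n=2):
    """For each item, rank the items most often bought together with it."""
    co_counts = defaultdict(Counter)
    for items in history.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return {item: _top(counts, top_n) for item, counts in co_counts.items()}

def user_item_rec(history, top_n=2):
    """For each user, suggest unseen items owned by users with overlapping history."""
    recs = {}
    for user, items in history.items():
        scores = Counter()
        for other, other_items in history.items():
            overlap = len(items & other_items)
            if other == user or overlap == 0:
                continue
            # Weight each candidate by how much the two histories overlap.
            for candidate in other_items - items:
                scores[candidate] += overlap
        recs[user] = _top(scores, top_n)
    return recs

print(item_item_rec(history)["beer"])
print(user_item_rec(history)["u4"])
```

In this toy data, "diapers" ranks first for "beer" because the pair co-occurs twice, and u4 is offered "running_shoes" because u4 shares "socks" with u3.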

Online calculation

Now let’s look at the online algorithm that matches detail_ofl. Its flow chart is as follows:

This graph is relatively simple. First, mg_usr_itm_reclist merges the item-based and user-based recommendation results of the offline algorithms. The item_item_rec_list data is placed first because item-based recall generally returns fewer results, but more accurate ones. Since the item-based and user-based lists may recommend the same item, and the merge behaves like a UNION ALL that keeps duplicates, uniq_reclist then deduplicates the merged list. Finally, get_top limits the number of results returned.
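The three online steps amount to merge, deduplicate, and truncate. The sketch below borrows the step names as Python function names purely for readability; the bodies are assumptions about what each step does based on the description above, not the engine's actual code.

```python
def mg_usr_itm_reclist(item_based, user_based):
    """Concatenate the two lists, item-based results first (like a UNION ALL)."""
    return list(item_based) + list(user_based)

def uniq_reclist(items):
    """Drop duplicates while keeping the first occurrence, so the order is preserved."""
    return list(dict.fromkeys(items))

def get_top(items, n):
    """Keep at most n results."""
    return items[:n]

item_based = ["i3", "i7"]
user_based = ["i7", "i1", "i9", "i3"]

merged = mg_usr_itm_reclist(item_based, user_based)
final = get_top(uniq_reclist(merged), 3)
print(final)
```

Because deduplication keeps the first occurrence, putting the item-based list first means its entries win whenever both sources recommend the same item.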

Other algorithms

Now that we have covered the detail template, compare it with the home page (main) template, which is much simpler. Home page recommendation is based on the person and has no item part, so offline it only computes the st_cb_02 line of the detail template, the one that produces user_item_rec_list. The corresponding online algorithm is get_usr_based_rec.

Next is detail_dft, which builds on detail_ofl and additionally uses simple_default_list to calculate a default recommendation list. The corresponding online template then uses get_default_rec.
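A default list is typically something generic that can be shown when the personal results run short. The sketch below assumes the default list is simply the most popular items; the source does not state what simple_default_list actually computes, so treat both functions and their names as hypothetical.

```python
from collections import Counter

def build_default_list(behaviors, size):
    """Offline: rank items by how many behavior records they have."""
    counts = Counter(item for _user, item in behaviors)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:size]]

def with_defaults(personal, defaults, limit):
    """Online: pad a short personal list with default items, skipping duplicates."""
    result = list(personal[:limit])
    for item in defaults:
        if len(result) >= limit:
            break
        if item not in result:
            result.append(item)
    return result

behaviors = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u3", "c"), ("u3", "b"), ("u4", "a")]
defaults = build_default_list(behaviors, size=3)

print(with_defaults(["c"], defaults, limit=3))  # one personal result, padded to three
print(with_defaults([], defaults, limit=2))     # new user: defaults only
```

This mirrors the division of labor described earlier: the default list is produced by a batch job, and the decision to use it is made at request time.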

Finally, let’s look at the algorithm behind the Quick Start example, which recommends movies from movie data. In that example, each user’s degree of preference for a movie can be derived from the scores they gave. This data needs the spl_grd_svd template. If detail_ofl is used instead, the offline calculation reports an error. Comparing the two templates, spl_grd_svd starts with grade_based_sm while detail_ofl starts with ig_sm_02. ig_sm_02 works with the behavior types ‘click’, ‘search_click’, ‘consume’, ‘use’, ‘read’, ‘collect’, ‘comment’, ‘share’, ‘like’, and ‘view’, whereas grade_based_sm selects only the bhv_amt values of records with bhv_type = ‘grade’ to compute scores. The movie data contains only grade behaviors, so with detail_ofl no matching behavior is found, and the job fails because there is no user behavior data.
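The failure described here comes down to filtering by behavior type. The following sketch reproduces it on two invented rating records. The `bhv_type` and `bhv_amt` fields come from the text; `user_id`, `item_id`, the function name, and the error message are assumptions made for the example.

```python
# Behavior types the text lists for ig_sm_02.
GENERAL_BEHAVIORS = {
    "click", "search_click", "consume", "use", "read",
    "collect", "comment", "share", "like", "view",
}

def select_behaviors(records, allowed_types):
    """Keep records whose bhv_type is allowed; fail loudly if nothing is left."""
    selected = [r for r in records if r["bhv_type"] in allowed_types]
    if not selected:
        raise ValueError("no user behavior data found")
    return selected

# Movie data that contains ratings only (invented values).
movie_ratings = [
    {"user_id": "u1", "item_id": "m1", "bhv_type": "grade", "bhv_amt": 4.5},
    {"user_id": "u2", "item_id": "m1", "bhv_type": "grade", "bhv_amt": 3.0},
]

# A grade-based first step finds both records.
graded = select_behaviors(movie_ratings, {"grade"})

# A first step that expects general behaviors finds nothing and fails.
try:
    select_behaviors(movie_ratings, GENERAL_BEHAVIORS)
    general_ok = True
except ValueError:
    general_ok = False

print(len(graded), general_ok)
```

Checking which behavior types your data actually contains before picking a template avoids this class of error.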

Algorithm categories

As you can see, the framework of the algorithms is fixed. When something needs to change later, there is no need to tear everything down and start from scratch; you can pick a template and modify it. Each algorithm has its own data inputs and outputs, and some algorithms differ only internally while their inputs, outputs, and upstream and downstream steps are the same. So if you want to write a custom algorithm for your own data, first use the walkthrough above to find which step can be improved, then write a replacement for that step. It works like building blocks: you swap a component for another block of the same shape. Blocks of the same shape are said to belong to the same category, and when you create a custom algorithm you need to set its category.
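One way to picture a category is as a shared function signature: any step that accepts the same input and produces the same output can be dropped into the same slot. The sketch below models that idea in Python. It is an analogy only; the type aliases, both step functions, and `run_pipeline` are hypothetical and do not reflect how the product defines categories internally.

```python
from collections import Counter
from typing import Callable

Behaviors = list[tuple[str, str]]   # (user, item) pairs
RecTable = dict[str, list[str]]     # user -> ranked items
CandidateStep = Callable[[Behaviors], RecTable]  # the "shape" of the block

def popular_step(behaviors: Behaviors) -> RecTable:
    """Stand-in for a stock algorithm: every user gets the most popular items."""
    counts = Counter(item for _user, item in behaviors)
    ranked = [i for i, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return {user: ranked for user, _item in behaviors}

def recent_step(behaviors: Behaviors) -> RecTable:
    """Stand-in for a custom algorithm: most recently touched items first."""
    latest = list(dict.fromkeys(item for _user, item in reversed(behaviors)))
    return {user: latest for user, _item in behaviors}

def run_pipeline(step: CandidateStep, behaviors: Behaviors, limit: int) -> RecTable:
    """Downstream code depends only on the shared signature, not on the step itself."""
    return {user: items[:limit] for user, items in step(behaviors).items()}

behaviors = [("u1", "a"), ("u2", "a"), ("u2", "b"), ("u1", "c")]
print(run_pipeline(popular_step, behaviors, limit=2))
print(run_pipeline(recent_step, behaviors, limit=2))
```

Swapping `popular_step` for `recent_step` changes the results but requires no change to `run_pipeline`, which is the property the building-block comparison describes.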

After this introduction you should have a general understanding of the calculation logic of the recommendation engine. But real understanding comes from practice: what you learn on paper only goes so far, so it is better to build one yourself.

The products used in this article are the big data computing service MaxCompute (www.aliyun.com/product/odp…) and the Recommendation Engine (dtboost.shuju.aliyun.com/re#/myre).
