How to use PAI+MaxCompute to cover every stage of the AARRR user growth model: acquisition, activation, retention, revenue, and referral.

The author of this article is Li Bo, senior product expert at Alibaba Cloud Intelligence.

Over the past year, the Alibaba Cloud PAI machine learning team has worked on many business projects, one of which is a product solution built on MaxCompute+PAI to address user growth problems. This article shares some of the Alibaba Cloud team's exploration and practice in the field of user growth. I hope it offers you some help with user growth.

I. User growth model

AARRR

User growth matters most to internet companies, whose business is essentially about solving the problem of user growth. From a business perspective there are many user growth models. Today I will focus on the AARRR model.

People who work in internet app operations should be very familiar with the AARRR model, which treats the user growth of an internet product as a loop. The first stage is acquisition, which is critical to the business; its metrics are visits, downloads, registrations, and follows. A few years ago acquisition was the hot topic, because the growth dividend of new internet users was still available. Now that the number of Chinese internet users has reached a ceiling, the later stages have become especially important for growing a product. For example, novel-reading apps are popular right now: when the number of users can no longer increase, increasing the time each user spends becomes the priority, and novel content helps prolong the time users stay in the app. The metrics for activation are logins, clicks and views, and time spent. The next stage is retention: when we cannot acquire new users, we try to bring back inactive and lost users, and MaxCompute+PAI has many classic cases here. For revenue, there is also a lot of work in the AI space on how internet apps monetize traffic and user behavior. Finally, apps that grow through viral sharing pay the most attention to sharing metrics.

Which stages of the AARRR user growth model does MaxCompute+PAI support, and what value can it bring to customers?

MaxCompute+PAI service support architecture

MaxCompute+PAI serves as a base to support user growth. The product architecture is shown in the following figure.

At the computing engine layer is MaxCompute. On top of the engine sit the AI scenarios, where we focus on using PAI's machine learning capabilities to empower user growth in specific business scenarios. First, we provide open frameworks for developing your own algorithm models: TensorFlow/PyTorch and SQL/PySpark/Spark. The product layer above is the PAI machine learning product family, which also supports the business. It includes PAI-DLC (a cloud-native deep learning runtime), where you can package your own training code and scripts into a container image and run it; PAI-Studio (visual modeling), where operators relevant to user growth are modularized and can be assembled by simple drag and drop; and PAI-DSW (interactive modeling), where developers with strong technical skills can write their own scripts instead of using our packaged ones. PAI-EAS (online model serving) turns the models produced by Studio and DSW into a RESTful API, which is then invoked through HTTP requests. These RESTful services support solutions including the advertising RTA solution, the advertising DSP solution, the intelligent recommendation solution, the user recall solution, and the LTV calculation solution. The solutions ultimately address the stages of user growth: acquisition, activation, retention, monetization, and sharing.

MaxCompute+PAI in specific user growth scenarios

User growth – Acquisition

Advertising is still a core means of acquiring new users for internet customers. One popular approach in the advertising industry is RTA. What does MaxCompute+PAI do in an RTA solution? First, let's look at how RTA works. In the past, an app that wanted to attract new users would put money into a DSP advertising platform, and the platform would select the users to bid on. Advertisers who wanted control over which users the DSP targeted had no way to get it. With RTA, the advertiser exposes an interface: each time the advertising platform selects a user, it sends a request to the advertiser's model, and the model tells the platform whether the advertiser wants this user. MaxCompute+PAI can build such a model for the customer.

MaxCompute handles data cleaning, PAI trains the bidding model, and the model screens for users worth bidding on.
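
The request-and-answer flow of RTA can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PAI implementation: the feature names, weights, bias, and threshold are all hypothetical, and a hand-written logistic scorer stands in for a bidding model trained in PAI.

```python
import math

# Hypothetical weights standing in for a bidding model trained in PAI.
WEIGHTS = {"installed_similar_app": 1.2, "active_days_30d": 0.05, "is_existing_user": -3.0}
BIAS = -1.0
THRESHOLD = 0.5  # assumed cut-off; in practice it would be tuned against budget and conversions


def score_user(features):
    """Return the predicted probability that this user is worth bidding on."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def rta_decision(features):
    """Answer the ad platform's per-request question: bid on this user or not?"""
    probability = score_user(features)
    return {"bid": probability >= THRESHOLD, "score": round(probability, 4)}


# A user who looks promising, and the same user who already has the app installed.
promising = {"installed_similar_app": 1, "active_days_30d": 20, "is_existing_user": 0}
existing = {"installed_similar_app": 1, "active_days_30d": 20, "is_existing_user": 1}
print(rta_decision(promising))
print(rta_decision(existing))
```

The point of the sketch is the shape of the interaction: the platform sends features for one candidate user, and the advertiser's side replies with a yes/no decision derived from a model score.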

Core strengths

1. Powerful data computing capability: MaxCompute provides petabyte-scale data computing.

2. Rich algorithms: PAI provides classical machine learning algorithms such as LR and GBDT, as well as deep learning algorithms such as DeepFM and MultiTower.

User growth – Activation

When new users are scarce, we want existing users to browse our platform longer and click more. More than 70% of internet apps open onto a feed-stream recommendation, which can also be called a relevance recommendation. The accuracy of this recommendation system affects how active users are on the platform. If the recommended content is what users like to see and browse, clicks and time spent will naturally increase. For example, the popular short-video apps in the industry all have good personalized recommendation systems. How do you build a recommendation system on MaxCompute+PAI? As shown in the figure below, a relevance recommendation system can be built on MaxCompute+PAI+DataWorks+Hologres+Flink. For more detail, refer to the article: PAI platform to build enterprise personalized recommendation system

A good recommendation system first needs an online service module, which can be divided into multi-way recall, filtering, sorting, and cold start. The recall module does a rough screening. For example, suppose our platform has 10 million products in stock when a user arrives. Scoring this user against all 10 million products would be a huge amount of computation. If recall first selects a few hundred products and sorting is then done only on those few hundred for this user, the overall computation becomes much cheaper.

MaxCompute+PAI handles recall and sorting. Looking at the architecture diagram from the bottom up, we first upload three core tables to MaxCompute: the user behavior log, user profile data, and item attribute data. DataWorks then performs feature processing on these tables to produce training samples, user feature data, and item feature data. Next comes PAI-Studio, a modeling platform with built-in algorithms for the recommendation domain such as PAI-EasyRec, GraphLearn, and Alink. We use the recall algorithms in PAI-Studio to produce basic recall tables such as U2I, I2I, and C2I, and write the results into Hologres. The multi-way recall service then reads from Hologres, which solves the problem of recall model training.
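
To make the idea of an I2I (item-to-item) recall table concrete, here is a small sketch that derives one from a behavior log. It is an assumption-laden illustration: plain co-occurrence counting is swapped in for the recall algorithms in PAI-Studio, and the log format and the `build_i2i` helper are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_i2i(behavior_log, top_k=2):
    """Build an item-to-item table from (user_id, item_id) click records.

    Two items count as related when the same user interacted with both;
    the co-occurrence count is used as the similarity score.
    """
    items_by_user = defaultdict(set)
    for user_id, item_id in behavior_log:
        items_by_user[user_id].add(item_id)

    co_counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_user.values():
        for a, b in combinations(sorted(items), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1

    # Keep the top_k most related items per item; ties are broken by item id.
    table = {}
    for item, related in co_counts.items():
        ordered = sorted(related.items(), key=lambda pair: (-pair[1], pair[0]))
        table[item] = [other for other, _ in ordered[:top_k]]
    return table


log = [("u1", "A"), ("u1", "B"), ("u2", "A"), ("u2", "B"), ("u2", "C"), ("u3", "B"), ("u3", "C")]
i2i = build_i2i(log)
print(i2i)
```

At serving time, a table like this lets the recall service look up "items related to what the user just clicked" with a single key lookup instead of scanning the catalog.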

For the sorting service, you select a sorting algorithm in PAI-Studio and produce a sorting model, which can be deployed to PAI-EAS as a RESTful API. The sorting module then calls this RESTful API to produce real-time sorting results.

After multi-way recall, duplicate items are filtered out and the remaining candidates are sorted, which yields a TopN recommendation list that you can display in your app's feed stream. The value of MaxCompute+PAI is that it completes the data processing and model training for the entire sorting service. This relevance recommendation system effectively improves the CTR and CVR of the feed stream in the app, and helps the app raise user activity and time spent.
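
The recall, filter, and sort steps can be sketched as one small function. Everything named here is hypothetical: the in-memory `U2I` and `HOT` lists stand in for recall tables stored in Hologres, and the `SCORES` dictionary stands in for a sorting model served from PAI-EAS.

```python
def recommend(user_id, recall_sources, already_seen, rank_score, top_n=3):
    """Merge several recall lists, drop duplicates and seen items, then rank."""
    candidates = []
    seen = set(already_seen)
    for recall in recall_sources:      # multi-way recall
        for item in recall(user_id):
            if item not in seen:       # filtering: duplicates and already-exposed items
                seen.add(item)
                candidates.append(item)
    # Sorting: only the small candidate set is scored, not the full catalog.
    ranked = sorted(candidates, key=lambda item: rank_score(user_id, item), reverse=True)
    return ranked[:top_n]


# Hypothetical recall tables and model scores.
U2I = {"u1": ["A", "B", "C"]}
HOT = ["C", "D", "E"]
SCORES = {"A": 0.2, "B": 0.9, "C": 0.5, "D": 0.7, "E": 0.1}

result = recommend(
    "u1",
    recall_sources=[lambda user: U2I.get(user, []), lambda user: HOT],
    already_seen={"A"},
    rank_score=lambda user, item: SCORES[item],
)
print(result)
```

Because the expensive scoring step only ever sees the merged candidate list, its cost depends on the size of the recall output rather than on the size of the catalog.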

User growth – Retention

Once an app's existing user base reaches millions, tens of millions, or hundreds of millions, the database holds a large number of historical users who have not used the app for some time. Given how hard acquisition has become, we need to recall these “sleeping” and lost users. Currently the most popular approach in the internet industry is SMS recall, because SMS does not have the limitations of phone calls and is not blocked the way push notifications are. As a result, SMS reaches users with relatively high probability and to good effect.

Based on MaxCompute+PAI, SMS recall solutions for lost users have been built for many industries, such as novel reading, social networking, and games.

The general approach is to store user event-tracking data in MaxCompute, do feature processing with DataWorks, and train a lost-user recall model on the PAI machine learning platform. We can then score existing users and predict which of them have a high probability of returning to the app after receiving an SMS. This way we send recall SMS messages only to these high-probability users, which saves recall cost and improves the recall rate.
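
Once every existing user has a predicted return probability, choosing who receives an SMS is a simple selection step. The sketch below assumes the predictions are already available as a dictionary; the `select_sms_targets` helper, the probability floor, and the budget are hypothetical choices for illustration.

```python
def select_sms_targets(predictions, budget, min_probability=0.5):
    """Pick the users most likely to return, limited by how many SMS we can afford.

    predictions: dict of user_id -> predicted probability of returning after an SMS.
    """
    eligible = [(uid, p) for uid, p in predictions.items() if p >= min_probability]
    # Highest probability first; ties are broken by user id for a stable result.
    eligible.sort(key=lambda pair: (-pair[1], pair[0]))
    return [uid for uid, _ in eligible[:budget]]


predicted = {"u1": 0.91, "u2": 0.12, "u3": 0.64, "u4": 0.55, "u5": 0.30}
targets = select_sms_targets(predicted, budget=2)
print(targets)
```

Sending to the ranked head of the list rather than to everyone is what trades a smaller send volume for a higher recall rate per message.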

Customer case

The customer is a social app for meeting strangers, with nearly ten million sleeping users in its database. It recalls lost users through SMS.

Value delivered by PAI:

After adopting PAI, across millions of SMS messages the recall rate rose from 3% to 8%, about 2.67 times the original rate (8% / 3% ≈ 267%), and the recall cost was cut roughly in half.

User growth – LTV score & share score

A score prediction model built with PAI+MaxCompute can predict both the LTV score and the share probability score.

When an app acquires a user through an ad, it cares whether the user will pay and how much value the user will generate for the app. Some customers need to estimate, as soon as a new user arrives, how much that user will spend in the app in the future. If the user is a high-value user, you should activate them with coupons or subsidies. We provide LTV solutions for this. For example, how do we calculate the LTV score of a brand-new app user?

Because a new user has not yet generated any behavior logs in the app, the first step is to find a third-party data source. MaxCompute+PAI provides a joint modeling solution that conforms to trusted computing standards, which means the user data and the third-party data never come into direct contact. The two parties' data can be modeled with federated learning to produce a model in PAI. This model assigns an LTV score to each new user, and subsequent operations are guided by that score.

Customer case

Scenario description: The customer is a novel-reading platform that needs to predict whether brand-new users will purchase the VIP service within 30 days. Predicting future VIP purchases while users still have very little behavior makes new-user operations more targeted and improves operating efficiency.

For brand-new users, the accuracy of predicting VIP purchases improved significantly. With about 40% of users selected, the model produced by federated modeling identifies 67% of the users who would naturally buy VIP, a 67.5% improvement in operating efficiency compared with selecting users at random (0.67 / 0.40 = 1.675).
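
The efficiency figure can be checked with a short lift calculation. This sketch assumes one reading of the case numbers: the model selects about 40% of users, and that group contains 67% of the eventual VIP buyers. Under that assumption, a random pick of the same size would capture only 40% of buyers, and the ratio between the two is the lift. The function name is hypothetical.

```python
def lift_over_random(fraction_selected, fraction_of_buyers_captured):
    """Buyers captured by the model divided by buyers a same-size random pick would capture."""
    if not 0 < fraction_selected <= 1:
        raise ValueError("fraction_selected must be in (0, 1]")
    return fraction_of_buyers_captured / fraction_selected


lift = lift_over_random(0.40, 0.67)
print(f"lift: {lift:.3f}, improvement over random: {(lift - 1) * 100:.1f}%")
```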

III. Hands-on walkthrough – recalling lost users

Uploading the data to MaxCompute

tunnel upload {file} {table};

Documentation link: help.aliyun.com/document_d…

Building a Workflow

Go to PAI-Studio and build your workflow.

Building training samples: users who have not logged in for 7 days count as lost

By filtering on registration date and last login time, you can determine which users have not logged in for 7 days.
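
The labeling rule can be sketched as follows. The record layout and the `label_lost_users` helper are hypothetical, and skipping users who registered less than 7 days ago is an assumption about how the registration date filter is meant to be used.

```python
from datetime import date, timedelta

LOST_AFTER_DAYS = 7  # rule from the text: no login for 7 days means "lost"


def label_lost_users(users, today):
    """Return {user_id: 1 if lost else 0} from registration and last-login dates."""
    cutoff = today - timedelta(days=LOST_AFTER_DAYS)
    labels = {}
    for user in users:
        # Assumption: users registered inside the window cannot be judged yet.
        if user["registered"] > cutoff:
            continue
        labels[user["user_id"]] = 1 if user["last_login"] <= cutoff else 0
    return labels


users = [
    {"user_id": "u1", "registered": date(2021, 1, 1), "last_login": date(2021, 6, 29)},
    {"user_id": "u2", "registered": date(2021, 3, 1), "last_login": date(2021, 6, 1)},
    {"user_id": "u3", "registered": date(2021, 6, 28), "last_login": date(2021, 6, 28)},
]
labels = label_lost_users(users, today=date(2021, 6, 30))
print(labels)
```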

Feature processing

The raw data is processed into structured data.

One-hot encoding

One-hot encoding converts categorical variables into a form that machine learning algorithms can use easily. The one-hot conversion format is shown in the figure below.
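
As a plain-Python illustration of the conversion (a sketch, not the PAI-Studio component itself), each distinct category gets its own column, and every row has a single 1 in the column of its category. The `one_hot_encode` helper and the sample values are hypothetical.

```python
def one_hot_encode(values):
    """Map each categorical value to a 0/1 vector with a single 1."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        vectors.append(row)
    return categories, vectors


categories, vectors = one_hot_encode(["android", "ios", "android", "web"])
print(categories)
for row in vectors:
    print(row)
```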

Model training and evaluation

The PAI platform has dozens of classification models; here we train a logistic regression model. Judging whether an SMS can recall a user can be defined as a binary classification problem (yes/no), so a binary classification algorithm can be used for training. After training the logistic regression model, we hold out part of the data as test data to measure the model's performance, and generate a model evaluation report with the binary classification evaluation. The larger the area under the ROC curve (AUC), the better the model.
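
The train-then-evaluate loop can be sketched without any libraries. This is a toy stand-in for the PAI components, under stated assumptions: the single feature (a scaled recent-activity level), the data, the learning rate, and the epoch count are all hypothetical, and the model is fitted with plain batch gradient descent.

```python
import math


def predict_proba(weights, bias, x):
    """Logistic regression prediction for one feature vector."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, learning_rate=0.1, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: label 1 means the user came back after an SMS.
train_rows = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
train_labels = [0, 0, 0, 1, 1, 1]
test_rows = [[0.15], [0.25], [0.75], [0.85]]
test_labels = [0, 0, 1, 1]

weights, bias = train_logistic_regression(train_rows, train_labels)
test_scores = [predict_proba(weights, bias, x) for x in test_rows]
print("test AUC:", auc(test_labels, test_scores))
```

The pairwise definition of AUC used here is equivalent to the area under the ROC curve, which is why a larger value means the model ranks returning users above non-returning users more often.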

Model prediction

After generating the model, we can deploy it as a RESTful service for the business side or operations staff to call. The call format is as follows:
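
The exact call format from the original article is not reproduced in this text. As a generic illustration only, calling a model deployed as a RESTful service usually means an HTTP POST with a JSON body and an access token. In the sketch below the URL, the token, the header layout, and the payload shape are all placeholder assumptions; take the real values from your own deployed service.

```python
import json
import urllib.request

# Placeholder values (assumptions): use the endpoint and token of your own service.
SERVICE_URL = "http://example.com/api/predict/lost_user_recall"
SERVICE_TOKEN = "YOUR_TOKEN"


def build_prediction_request(user_features):
    """Build (but do not send) an HTTP POST request carrying one user's features as JSON."""
    body = json.dumps([user_features]).encode("utf-8")
    return urllib.request.Request(
        SERVICE_URL,
        data=body,
        headers={"Authorization": SERVICE_TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


request = build_prediction_request({"days_since_last_login": 12, "sessions_last_30d": 3})
print(request.get_method(), request.full_url)

# To actually call a live service:
# with urllib.request.urlopen(request, timeout=5) as response:
#     result = json.loads(response.read())
```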


This article is original content from Alibaba Cloud and may not be reproduced without permission.