Editor | Debra
AI Front Line introduction: On June 5, Beijing time, the final of the IJCAI 2018 Alimama International Advertising Algorithm Competition concluded in Hangzhou. After the presentations, oral defenses and deliberation by the five judges, the DOG team won the championship among the eight finalists; the Blue Whale Incense Burning team and the Lying team took second and third place respectively. The "No Internship, How to Find a Job" team and the Qiangdong team won the Innovation Award, and all five teams qualified to attend the IJCAI 2018 main conference in Stockholm in July.

Introduction to the problem

The competition aims to "discover more technology and talent to empower the entire marketing ecosystem". It consists of three stages: preliminary, semi-final and final, with the difficulty of the problem increasing at each stage.

The competition data come from real business scenarios. As the big-data marketing platform of Alibaba Group, Alimama works with Alibaba's core business data and already applies deep learning, online learning, reinforcement learning and other precise AI techniques to predict users' purchase intent. However, an e-commerce platform is a complex ecosystem: user behavior preferences, the long-tail distribution of commodities, and hot-event marketing all pose great challenges to conversion-rate estimation. How to make better use of massive transaction data to efficiently and accurately predict users' purchase intent remains a technical problem that AI and big data must continue to solve in e-commerce scenarios.

The competition takes Alibaba's e-commerce advertising as its research object and provides massive real transaction data from the platform. Contestants use AI techniques to build models that predict users' purchase intent: given the relevant information about the user, ad, query, context and shop, predict the probability of a purchase after an ad interaction (pCVR), which is formally defined as:

pCVR = P(conversion = 1 | query, user, ad, context, shop)

Combined with the business scenarios and the different traffic characteristics of the Taobao platform, two types of challenges are defined: "daily conversion-rate estimation" and "conversion-rate estimation on special dates".

In the preliminary round, data from the first seven days were provided and contestants predicted the eighth day; in the semi-final, data from the morning of the eighth day were provided and contestants predicted that afternoon. The data volume grew accordingly: the preliminary training set contained 480,000 samples and the test set 60,000, while the semi-final training set contained 10 million samples and the test set 1.73 million.

Solving the problem in the final

Eight teams made it through to the final round. Their members come from universities, research institutes and technology companies, combining strength and experience.

Champion, DOG team: concise code and transfer learning

The competition was fierce, and the champion was the one-man DOG team, consisting solely of Hua Zhixiang from industry.

Hua Zhixiang first explained his analysis of the preliminary and semi-final problems. The data of the first seven days were relatively stable, while the eighth day fluctuated strongly. He therefore used the data of days one to seven to predict both the morning and the afternoon of the eighth day; this is essentially transfer learning, using the ordinary scenario to predict the promotion scenario. These predictions were then combined with a model trained on the morning of the eighth day, the promotion day, to produce the final predictions for that afternoon. The whole solution was built with LightGBM alone; a rough sketch of this two-stage setup is given below.
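The following is a minimal sketch of such a two-stage setup, not the team's actual code. It assumes a pandas DataFrame df with a day column, an hour column, a binary label is_trade and a list of preprocessed feature columns feats; all of these names are illustrative assumptions.

```python
import lightgbm as lgb

params = {"objective": "binary", "metric": "binary_logloss",
          "learning_rate": 0.05, "num_leaves": 63}

# Stage 1: learn the "ordinary" scenario from days 1-7.
base = df[df["day"] <= 7]
day8 = df[df["day"] == 8].copy()
stage1 = lgb.train(params, lgb.Dataset(base[feats], base["is_trade"]),
                   num_boost_round=500)

# Use the stage-1 prediction as a prior feature for the promotion day.
day8["pcvr_prior"] = stage1.predict(day8[feats])

# Stage 2: train on the morning of day 8 and predict that afternoon.
morning = day8[day8["hour"] < 12]
afternoon = day8[day8["hour"] >= 12]
feats2 = feats + ["pcvr_prior"]
stage2 = lgb.train(params, lgb.Dataset(morning[feats2], morning["is_trade"]),
                   num_boost_round=300)
pcvr_afternoon = stage2.predict(afternoon[feats2])
```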

Four types of features were used in the model. Statistical features include the number of items a user clicked, the time of the last search, the maximum page number, the average search hour, interaction counts, and so on. Time-difference features mainly consider the interval between two interactions, for example between a user's interactions with an item, an item category, or an item brand (item_brand_id). Ranking features express quantities such as the number of interactions between a user and an item as ranks.
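A minimal sketch of these statistical, time-difference and ranking features is shown below. It assumes a click-log DataFrame df with columns user_id, item_id, item_brand_id and a Unix timestamp context_timestamp (the column names follow the public competition data; the specific aggregations are illustrative).

```python
import pandas as pd

df = df.sort_values("context_timestamp")

# Statistical feature: how many items each user clicked.
df["user_click_cnt"] = df.groupby("user_id")["item_id"].transform("count")

# Time-difference feature: seconds since the user's previous interaction.
df["user_time_gap"] = df.groupby("user_id")["context_timestamp"].diff()

# Time-difference feature at the (user, brand) level.
df["user_brand_time_gap"] = (
    df.groupby(["user_id", "item_brand_id"])["context_timestamp"].diff()
)

# Ranking feature: rank of this click within the user's history, by time.
df["user_click_rank"] = df.groupby("user_id")["context_timestamp"].rank(method="first")
```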

For the characterization features, a bag-of-words representation is used to encode whether each property is present, the proportion of views each property receives across all users, and the average proportion of a user's viewed items that carry these properties; these features feed the model and support accurate prediction of user behavior. The core code fits on a single page, and this simplicity also contributed to the win.
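A minimal sketch of the bag-of-words encoding of the property field is shown below, assuming the semicolon-separated item_property_list format of the public competition data; the vectorizer settings are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer

# item_property_list looks like "prop1;prop2;prop3".
corpus = df["item_property_list"].fillna("")
vectorizer = CountVectorizer(tokenizer=lambda s: s.split(";"),
                             binary=True,   # presence/absence of a property
                             min_df=50)     # drop very rare properties
property_bow = vectorizer.fit_transform(corpus)  # sparse matrix, one column per property
```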

The judges praised the DOG team: "the use of transfer learning is impressive, and the approach is simple, effective and clear."

Runner-up, Blue Whale Incense Burning team: complete and comprehensive models and data

The runner-up in the final was the Blue Whale Incense Burning team, made up of BRYAN, Slippery and Lee Kun-jin from industry.

The speaker first analyzed the problem, focusing on the business scenario and on search and conversion estimation. For data analysis, they examined the overall trends of daily sample and transaction counts, the daily conversion rate, and the conversion rate per hour; data types were classified, and missing values were filled with the mean or the mode. For user analysis, the number of clicks reveals low-frequency demand and the number of purchases reveals a long-tail distribution; combining the two identifies users with immediate interest and clear goals. Deeper analysis then uncovers hidden information, and finally the daily click trends are plotted.
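A minimal sketch of the mean/mode imputation is shown below, assuming missing values are encoded as -1 as in the public competition data; the choice of columns is illustrative.

```python
import numpy as np
import pandas as pd

df = df.replace(-1, np.nan)

# Continuous columns: fill with the mean.
for col in ["item_price_level", "item_sales_level"]:
    df[col] = df[col].fillna(df[col].mean())

# Categorical columns: fill with the mode.
for col in ["user_gender_id", "user_occupation_id"]:
    df[col] = df[col].fillna(df[col].mode().iloc[0])
```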

To improve iteration efficiency, reduce the element of luck in online results, and avoid over-reliance on the online test set, the team adopted an offline validation scheme and only submitted online those changes that showed a clear improvement offline. For model design, they built three models, a main model, a global-data model and a time-information model, and combined them for accurate prediction.
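A minimal sketch of the offline validation idea is shown below: hold out the last hours before the online test window, measure log loss there, and only keep changes that improve it. The hour thresholds and the feats/is_trade names are illustrative assumptions, not the team's exact split.

```python
import lightgbm as lgb
from sklearn.metrics import log_loss

morning = df[(df["day"] == 8) & (df["hour"] < 12)]
train_off = morning[morning["hour"] < 10]
valid_off = morning[morning["hour"] >= 10]

params = {"objective": "binary", "metric": "binary_logloss"}
model = lgb.train(params, lgb.Dataset(train_off[feats], train_off["is_trade"]),
                  num_boost_round=300)
offline_loss = log_loss(valid_off["is_trade"], model.predict(valid_off[feats]))
print(f"offline logloss: {offline_loss:.5f}")
```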

For features, the Blue Whale Incense Burning team divided the feature groups into three categories: original features, i.e. the basic fields; simple features such as conversion rates, rankings, proportions and trends; and complex features such as query-interaction, user-interaction, competition and business features. By testing the various feature groups offline, they measured how much each group improved prediction accuracy and so identified the important features. For model fusion, the LightGBM models were combined with a simple weighted average, sketched below.
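The following is a minimal sketch of a simple weighted fusion of the three models' predictions; the model names, feature sets and weights are illustrative assumptions.

```python
import numpy as np

preds_main   = model_main.predict(test[feats_main])      # main model
preds_global = model_global.predict(test[feats_global])  # global-data model
preds_time   = model_time.predict(test[feats_time])      # time-information model

weights = np.array([0.5, 0.3, 0.2])  # tuned on the offline validation set
pcvr = weights[0] * preds_main + weights[1] * preds_global + weights[2] * preds_time
```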

The judges' evaluation of the Blue Whale Incense Burning team: "an impressive presentation; the models, data and other aspects are very comprehensive and complete, and the results are very good."

Third place, Lying team: deep understanding of the business

The third-place Lying team consists of Chen Bocheng of Zhejiang University of Technology, Robin Li of Central South University and Wu Hao of Tianjin University.

The Lying team first analyzed the problem. In their view, the difficulty lay, on the one hand, in how to find features in the normal-traffic data that can express the promotion or the abrupt change, and, on the other hand, in how to choose a model and find a lightweight framework that can be deployed in industry as soon as possible. Their analysis showed that the last day is the promotion day, so the modeling can go in two directions: the general approach of modeling users and interactions, and the exploration of what changes during the promotion.

The Lying team therefore proposed four training schemes: only-7 targeting the change, all-day and sample-all using the full data, and all-to-7 extracting statistical features from the full data; each scheme was validated separately.

For feature engineering, the Lying team first classified the basic features, then removed columns whose values barely change and columns with too many missing values (a sketch of this filtering follows). For user features, user preferences were derived from the basic data, and the user's recent behavior was captured through time differences. They then profiled the crowd attracted by the shop and the crowd attracted by the advertisement.
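The following is a minimal sketch of the column-cleaning step: drop columns whose values barely change and columns with too many missing values. The thresholds are illustrative assumptions, and missing values are assumed to be encoded as -1.

```python
import pandas as pd

def clean_columns(df: pd.DataFrame, max_missing_ratio: float = 0.6, min_unique: int = 2):
    drop_cols = []
    for col in df.columns:
        missing_ratio = (df[col] == -1).mean() + df[col].isna().mean()
        if missing_ratio > max_missing_ratio:            # too many missing values
            drop_cols.append(col)
        elif df[col].nunique(dropna=True) < min_unique:  # value barely changes
            drop_cols.append(col)
    return df.drop(columns=drop_cols), drop_cols
```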

With these features, the data of the first seven days can be used to predict probabilities for the eighth day, and the degree of match between item_property_list and predict_category_property can be calculated. In the actual scenario of the competition, when a user searches, the better the categories and properties predicted from the query match the item, the more likely the user is to purchase.
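A minimal sketch of such a match feature is shown below, using the field formats of the public competition data ("prop1;prop2;..." and "cat1:propA,propB;cat2:-1"); the exact match statistic used by the team is not specified, so this ratio is only illustrative.

```python
def property_match_ratio(item_property_list: str, predict_category_property: str) -> float:
    item_props = set(item_property_list.split(";"))
    predicted_props = set()
    for cat_part in predict_category_property.split(";"):
        if ":" in cat_part:
            _, props = cat_part.split(":", 1)
            predicted_props.update(p for p in props.split(",") if p != "-1")
    if not predicted_props:
        return 0.0
    return len(item_props & predicted_props) / len(predicted_props)

df["prop_match_ratio"] = [
    property_match_ratio(a, b)
    for a, b in zip(df["item_property_list"].fillna(""),
                    df["predict_category_property"].fillna(""))
]
```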

For model selection, a neural network was chosen so that ID features could be embedded and crossed with the continuous features. The team found that the changing features must be considered during the promotion period, that a reasonable feature-extraction framework is the way to win, and that fusing multiple models improves accuracy further.
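The following is a minimal PyTorch sketch of that idea: embed the ID features and concatenate them with the continuous features before the dense layers. The layer widths and embedding sizes are illustrative assumptions, not the team's network.

```python
import torch
import torch.nn as nn

class SimpleCvrNet(nn.Module):
    def __init__(self, id_cardinalities, n_continuous, emb_dim=16):
        super().__init__()
        self.embeddings = nn.ModuleList(
            [nn.Embedding(card, emb_dim) for card in id_cardinalities]
        )
        in_dim = emb_dim * len(id_cardinalities) + n_continuous
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, id_feats, cont_feats):
        # id_feats: LongTensor [batch, n_id_fields]; cont_feats: FloatTensor [batch, n_continuous]
        embs = [emb(id_feats[:, i]) for i, emb in enumerate(self.embeddings)]
        x = torch.cat(embs + [cont_feats], dim=1)
        return torch.sigmoid(self.mlp(x)).squeeze(1)  # predicted pCVR
```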

The judges' evaluation: the team shows very complete systematic thinking, a deep understanding of the business and thorough business analysis.

Innovation Award: the "No Internship, How to Find a Job" team and the Qiangdong team

The original plan was to give two special awards in the final, but the performance of the "No Internship, How to Find a Job" team and the Qiangdong team led the judges to decide on the spot to change the award into an Innovation Award, to encourage the innovative ideas both teams showed along the way.

The "No Internship, How to Find a Job" team is composed of Zhuang Xiaomin of the Chinese Academy of Sciences, Zhang Weimin of the Institute of Computing Technology, Chinese Academy of Sciences, and Li Haoyang of the Hong Kong University of Science and Technology. They first split the data into time intervals and made effective use of historical data with different characteristics, analyzing user behavior through statistical features. This revealed two behavioral patterns: first, users with sparse data appear on only one day; second, users with less data have a higher conversion rate.

Therefore, users with little data are marked with constructed features so the model can judge them as a whole, while for users with more data, constructed features are used directly to express their behavior. Time features include hourly hot spots, trend features, windows and many other strong features. The embedding feature is special: the items clicked by the same user are sorted in chronological order and treated as a document, so the document represents the user's click sequence, and the context of each "word" (item) in the document is then the items that interest the user and are similar to it. Shop and user embeddings can be computed in the same way. Tested on several models, the embedding feature improved the result by 3+ to 6+ (a word2vec-style sketch of the click-sequence embedding follows). In addition, just as a page linked to by many high-quality pages is more likely to be high quality, a PageRank value computed over user clicks is also important.
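The following is a minimal sketch of the click-sequence embedding: each user's clicked items, sorted by time, form a "document", and a word2vec model learns item vectors from these sequences. The gensim API calls are real; the hyperparameters and the some_item_id lookup are illustrative assumptions.

```python
from gensim.models import Word2Vec

clicks = df.sort_values("context_timestamp")
sentences = (
    clicks.groupby("user_id")["item_id"]
          .apply(lambda s: [str(i) for i in s])   # one "doc" per user
          .tolist()
)

w2v = Word2Vec(sentences, vector_size=32, window=5, min_count=2, sg=1, epochs=5)
item_vec = w2v.wv[str(some_item_id)]  # embedding of a given item (hypothetical id)
```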

For the model and algorithm, the combined-feature model concatenates the features of the different single models and applies some screening. For the k-fold average, each single model is trained with 10 folds: train on 9 folds, predict the test set, and average the 10 predictions; this effectively reduces variance and makes the result better and more stable. The final model architecture was illustrated in their slides.
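A minimal sketch of the 10-fold averaging is shown below: train on 9 folds each time, predict the test set, and average the 10 predictions. The data and feature names are illustrative assumptions, and any single model could stand in for LightGBM here.

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import KFold

params = {"objective": "binary", "metric": "binary_logloss"}
kf = KFold(n_splits=10, shuffle=True, random_state=2018)
test_pred = np.zeros(len(X_test))

for train_idx, _ in kf.split(X_train):
    fold_model = lgb.train(params,
                           lgb.Dataset(X_train.iloc[train_idx], y_train.iloc[train_idx]),
                           num_boost_round=500)
    test_pred += fold_model.predict(X_test) / kf.get_n_splits()  # average of the 10 predictions
```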

The judges' evaluation of the "No Internship, How to Find a Job" team: "the team has its own characteristics, fully mining users' sequential behavior information and user representations, which improved the results."

The Qiangdong team is composed of Li Qiang from Jilin University, Shen Dongdong from Shandong University and Jiang Haoran from Central South University. They first analyzed the problem and found that 98 percent of all shopping sessions contain fewer than 10 clicks. They built some hand-crafted features such as the first click, the total number of clicks and the favorite product, but what really mattered in this contest were the features fed to the deep learning model. These come in three main forms: single features encoded directly, continuous features bucketed and then encoded, and multi-value features padded and then weighted with attention.

Multi-value features are padded and fed into the embedding layer, and an attention layer modeled on the DIN network weights them (a sketch follows). Most deep CTR models optimize second-order feature combinations: an LR layer handles the first order, an FM layer the second order, and the FM layer can be optimized to linear complexity. For higher-order features a CIN layer or an MVM layer can be used; given the high complexity of the CIN layer, the team chose the simpler MVM layer to combine features of arbitrarily high order.
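The following is a minimal PyTorch sketch of DIN-style attention pooling over a padded multi-value feature (for example the user's clicked-item history) against the candidate ad embedding; the dimensions and the scoring MLP are illustrative assumptions, not the team's exact layer.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    def __init__(self, emb_dim=16):
        super().__init__()
        # score each history item against the candidate ad
        self.score = nn.Sequential(nn.Linear(emb_dim * 2, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, history_emb, candidate_emb, mask):
        # history_emb: [batch, seq_len, emb_dim]; candidate_emb: [batch, emb_dim]
        # mask: [batch, seq_len], 1 for real items, 0 for padding
        cand = candidate_emb.unsqueeze(1).expand_as(history_emb)
        logits = self.score(torch.cat([history_emb, cand], dim=-1)).squeeze(-1)
        logits = logits.masked_fill(mask == 0, -1e9)          # ignore padded slots
        weights = torch.softmax(logits, dim=1)                 # [batch, seq_len]
        return (weights.unsqueeze(-1) * history_emb).sum(1)    # weighted sum, [batch, emb_dim]
```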

The deep part captures nonlinear relationships between features; its input is the embeddings of discrete features, the embeddings of bucketed continuous features, and the attention-weighted vectors of multi-value features. Feeding LightGBM leaf indices into the deep part as an encoding provides more explicit feature-combination information.
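A minimal sketch of the leaf encoding is shown below: predict the leaf index per tree and one-hot encode it as an extra input for the deep part. The LightGBM and scikit-learn calls are standard; the surrounding data names are illustrative assumptions.

```python
import lightgbm as lgb
from sklearn.preprocessing import OneHotEncoder

gbm = lgb.train({"objective": "binary", "num_leaves": 31},
                lgb.Dataset(X_train, y_train), num_boost_round=100)

# shape [n_samples, n_trees]: the leaf index each sample falls into, per tree
train_leaves = gbm.predict(X_train, pred_leaf=True)
test_leaves = gbm.predict(X_test, pred_leaf=True)

enc = OneHotEncoder(handle_unknown="ignore")
train_leaf_onehot = enc.fit_transform(train_leaves)  # sparse input for the deep model
test_leaf_onehot = enc.transform(test_leaves)
```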

A few practical notes are worth mentioning: use matrix operations as much as possible when building and debugging DL models; applying one-dimensional dropout to embeddings reduces the risk of overfitting; NN results carry considerable randomness, since every training run differs, so multiple runs can be averaged; and the hashing trick greatly reduces resource consumption. The judges praised the team for "end-to-end learning with deep learning methods, close to industrial models, which stood out among all the contestants".