! [](https://cdn.jsdelivr.net/gh/BlackSpaceGZY/cdn/img/tf_0.png)

Preface

This is my first post on this platform. The article featured in this issue introduces my own open-source GitHub project, Recommended System with TF2.0, which uses TensorFlow 2.0 to reproduce classic recommendation papers. Stars are welcome.

Address: github.com/BlackSpaceG…

WeChat article address: portal

Why this project exists

The open-source project Recommended System with TF2.0 mainly reproduces recommendation-system and CTR-prediction papers that I have read. I created it for three reasons:

  1. There is a large gap between theory and practice, especially between academia and industry;
  2. Reproducing papers deepens my understanding of their core ideas and strengthens my engineering skills;
  3. The open-source code for many papers is written in TF1.x, so I wanted to reproduce them with the simpler TF2.0. I have studied well-known open-source projects such as DeepCTR, but at my current level they serve only as references.

Project features

  • Models are implemented with TensorFlow 2.0 (CPU);
  • Each model is self-contained, with no dependencies between models;
  • Each model follows the architecture described in its paper, and experiments use the public dataset given in the paper where possible; if the paper provides GitHub code, I use it as a reference;
  • The experimental datasets are described in detail;
  • Source files use consistent parameter and function naming conventions, with standard comments;
  • Each model is explained in a dedicated document (.md file) or similar.
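To illustrate the single-file, self-contained style described above, here is a minimal sketch of what one model file might look like. The class, its layer sizes, and the dummy input are my own illustration, not the project's actual code:

```python
# A minimal sketch (my own illustration) of a self-contained TF2.0 model file:
# a plain MLP for binary CTR prediction, built by subclassing tf.keras.Model.
import tensorflow as tf

class MLP(tf.keras.Model):
    """Multi-layer perceptron with a sigmoid output for click probability."""
    def __init__(self, hidden_units=(64, 32), dropout=0.2):
        super().__init__()
        self.hidden = [tf.keras.layers.Dense(u, activation='relu')
                       for u in hidden_units]
        self.dropout = tf.keras.layers.Dropout(dropout)
        self.out = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs, training=False):
        x = inputs
        for layer in self.hidden:
            x = layer(x)
            x = self.dropout(x, training=training)  # only active during training
        return self.out(x)

model = MLP()
y = model(tf.random.normal((4, 10)))  # batch of 4 samples, 10 features each
print(y.shape)  # (4, 1)
```

Because every model lives in its own file like this, each one can be run and studied without touching the others.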

Reproduced models

The models reproduced so far, in order of reproduction, are:

  • NCF
  • DIN
  • Wide&Deep
  • DCN
  • PNN
  • Deep Crossing
  • DeepFM

The list is continuously updated.
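As an example of what reproducing one of these models in TF2.0 involves, here is a hedged sketch of NCF's two-branch design (a GMF branch and an MLP branch over user/item IDs), following the usual formulation from the paper; the dimensions and layer sizes here are my own choices, not the project's:

```python
# A hedged sketch of NCF (Neural Collaborative Filtering) in the TF2.0
# functional API. Vocabulary sizes and layer widths are illustrative only.
import tensorflow as tf

n_users, n_items, dim = 1000, 2000, 8

user_in = tf.keras.Input(shape=(1,), dtype='int32')
item_in = tf.keras.Input(shape=(1,), dtype='int32')
flatten = tf.keras.layers.Flatten()

# Separate embedding tables for the GMF and MLP branches, as in the paper.
u_gmf = flatten(tf.keras.layers.Embedding(n_users, dim)(user_in))
i_gmf = flatten(tf.keras.layers.Embedding(n_items, dim)(item_in))
u_mlp = flatten(tf.keras.layers.Embedding(n_users, dim)(user_in))
i_mlp = flatten(tf.keras.layers.Embedding(n_items, dim)(item_in))

gmf = tf.keras.layers.Multiply()([u_gmf, i_gmf])  # element-wise product
mlp = tf.keras.layers.Dense(16, activation='relu')(
    tf.keras.layers.Concatenate()([u_mlp, i_mlp]))
out = tf.keras.layers.Dense(1, activation='sigmoid')(
    tf.keras.layers.Concatenate()([gmf, mlp]))     # fuse the two branches

ncf = tf.keras.Model([user_in, item_in], out)
pred = ncf([tf.constant([[3], [7]]), tf.constant([[5], [9]])])
print(pred.shape)  # (2, 1)
```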

Project contents

The open-source project is organized as follows:

1. Dataset introduction

This part briefly introduces the datasets used during reproduction, analyzes their characteristics, and describes how they are processed.

For example:

Criteo

The Criteo advertising dataset is a classic dataset for predicting ad click-through rates. In 2014, Criteo sponsored the Kaggle Display Advertising Challenge, but the competition is old enough that Kaggle no longer provides the dataset. There are three ways to obtain the dataset or a sample of it:

  • criteo_sample.txt: included in DeepCTR for testing whether a model is correct, but the amount of data is very small;
  • Kaggle Criteo: training set (10.38 GB) and test set (1.35 GB); most of my experiments use this version;
  • Criteo 1TB: the complete log dataset, downloadable as needed.

Introduction and processing of the Criteo dataset: portal [https://github.com/BlackSpaceGZY/Recommended-System-with-TF2.0/blob/master/Dataset%20Introduction.md#3-criteo]
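As a rough illustration of the kind of processing the dataset write-ups describe, here is a sketch of typical Criteo preprocessing: min-max scaling for the 13 dense features (I1-I13) and label encoding for the 26 sparse features (C1-C26). The column names follow the Kaggle file format; the toy in-memory frame below stands in for the real file, which would be loaded with pd.read_csv:

```python
# A sketch of common Criteo preprocessing, not the project's exact pipeline.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

dense_cols = [f'I{i}' for i in range(1, 14)]    # 13 numeric features
sparse_cols = [f'C{i}' for i in range(1, 27)]   # 26 categorical features
names = ['label'] + dense_cols + sparse_cols

# Real data: df = pd.read_csv('train.txt', sep='\t', names=names)
# Toy stand-in with the same schema:
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 5, size=(8, 40)).astype(float), columns=names)
df[sparse_cols] = df[sparse_cols].astype(int).astype(str)

# Fill missing values, scale dense features to [0, 1], integer-encode sparse ones.
df[dense_cols] = MinMaxScaler().fit_transform(df[dense_cols].fillna(0.0))
for c in sparse_cols:
    df[c] = LabelEncoder().fit_transform(df[c].fillna('-1'))
```

After this step, the dense columns feed directly into the model and the encoded sparse columns index into embedding tables.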

2. Paper models

Each model write-up contains:

  • a model structure diagram;
  • the experimental dataset;
  • code parsing: brief documentation of the model's open-source code;
  • the source code address, if available;
  • the original paper's address;
  • my notes on the paper.

For example:

Deep Interest Network for Click-Through Rate Prediction (DIN)

! [](https://cdn.jsdelivr.net/gh/BlackSpaceGZY/cdn/img/tf_2.png)

Dataset: the Electronics subset of the Amazon dataset. Code parsing:

Source code address: github.com/zhougr1993/…

Original address: arxiv.org/pdf/1706.06…

Original notes: mp.weixin.qq.com/s/uIs_Fpeow…
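DIN's core idea is a local activation unit that weights the user's behavior sequence by its relevance to the candidate ad. Below is a simplified sketch of that unit; the function name and shapes are my own, and, unlike the paper (which skips normalization), I apply a softmax over the weights for simplicity:

```python
# A simplified sketch of DIN's local activation unit, not the paper's code.
import tensorflow as tf

def din_attention(query, keys):
    """query: (B, d) candidate-item embedding; keys: (B, T, d) behavior sequence."""
    T = tf.shape(keys)[1]
    q = tf.tile(tf.expand_dims(query, 1), [1, T, 1])       # (B, T, d)
    # The activation unit feeds [q, k, q-k, q*k] through a small MLP.
    x = tf.concat([q, keys, q - keys, q * keys], axis=-1)  # (B, T, 4d)
    x = tf.keras.layers.Dense(16, activation='sigmoid')(x)
    w = tf.keras.layers.Dense(1)(x)                        # (B, T, 1) raw scores
    w = tf.nn.softmax(w, axis=1)                           # simplification: normalize
    return tf.reduce_sum(w * keys, axis=1)                 # (B, d) weighted sum

out = din_attention(tf.random.normal((2, 8)), tf.random.normal((2, 5, 8)))
print(out.shape)  # (2, 8)
```

The weighted sum replaces the plain sum-pooling of base models, so behaviors related to the candidate ad dominate the user representation.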

Next steps

Last night a classmate asked whether I would reproduce a certain paper, and I am not sure yet. The plan for upcoming reproductions is: papers I have already read (in fact, every paper I reproduce gets read at least twice), such as FNN, xDeepFM, and AFM; plus introductions to new 2020 papers with partial reproductions, such as DMR.

Conclusion

If you have any questions about the project, leave a message in the Issues or add me as a friend through my official account. Finally, here is the address again: github.com/BlackSpaceG…