CTR estimation has long been a core problem in computational advertising and recommender systems, and it is an active research topic in both industry and academia. In recent years, a number of related algorithm competitions have been held. This article introduces DeepCTR-Torch, a deep-learning-based CTR prediction library written in PyTorch. It is easy to use, modular, and extensible, which makes it well suited to beginners who want to get started quickly.
(Shen Weichen is an algorithm engineer at Alibaba.)
The click-through rate estimation problem
The click-through rate prediction problem is usually formalized as follows: given a user, an item, and the context, compute the probability that the user clicks on the item: pCTR = p(click = 1 | user, item, context).
Put simply, in advertising pCTR is used to compute the expected revenue of an ad, and in recommendation it is used to rank the list of candidate items.
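To make the two uses concrete, here is a minimal sketch that ranks a few candidates by expected revenue per impression, taken here as pCTR times the bid per click. The candidate names, pCTR values, bids, and the helper rank_by_expected_revenue are all made up for illustration, and a production system would usually add further terms to the score.

```python
from typing import List, Tuple


def rank_by_expected_revenue(
    candidates: List[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """Sort (name, pctr, bid_per_click) candidates by pctr * bid, highest first."""
    scored = [(name, pctr * bid) for name, pctr, bid in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates: (name, predicted CTR, bid per click)
candidates = [
    ("ad_a", 0.020, 1.00),
    ("ad_b", 0.005, 5.00),
    ("ad_c", 0.030, 0.50),
]

ranking = rank_by_expected_revenue(candidates)
for name, value in ranking:
    print(f"{name}: expected revenue per impression = {value:.4f}")
```

Note that the candidate with the highest pCTR does not necessarily come first, because the score also depends on the bid.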
DeepCTR-Torch
Practitioners improve prediction quality by constructing effective feature combinations and by using more complex models to learn patterns in the data. Factorization-based approaches learn feature interactions as products of latent vectors and can generalize to feature combinations that never occur in the training data.
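The idea of learning interactions as vector products can be sketched with the second-order term of a factorization machine: each feature i gets a latent vector v[i], and the weight of the pair (i, j) is the inner product of v[i] and v[j], so a pair that never appeared together still receives a weight. The sketch below uses plain Python and hand-picked numbers; the function names are illustrative and are not part of DeepCTR-Torch.

```python
from itertools import combinations
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_pairwise_naive(x: Sequence[float], v: List[List[float]]) -> float:
    """Sum over all pairs i < j of <v[i], v[j]> * x[i] * x[j]."""
    return sum(
        dot(v[i], v[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


def fm_pairwise_fast(x: Sequence[float], v: List[List[float]]) -> float:
    """Same value via 0.5 * sum_f [(sum_i v[i][f] x[i])^2 - sum_i (v[i][f] x[i])^2]."""
    total = 0.0
    for f in range(len(v[0])):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total


# Four features with 2-dimensional latent vectors (illustrative values)
x = [1.0, 0.0, 1.0, 2.0]
v = [[1.0, 2.0], [3.0, 4.0], [0.5, -1.0], [2.0, 0.0]]

naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
print(naive, fast)
```

The second function gives the same value in time linear in the number of features, which is what makes this kind of model practical on sparse, high-dimensional inputs.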
With the rapid progress of deep neural networks in many fields, researchers have in recent years also proposed a number of deep-learning-based factorization models that learn low-order and high-order feature interactions simultaneously, such as:
PNN, Wide & Deep, DeepFM, Attentional FM, Neural FM, DCN, xDeepFM, AutoInt, FiBiNET,
as well as DIN, DIEN, and DSIN, which model the user's historical behavior sequence.
Readers who are new to this area may not be familiar with the details of these methods. Although many introductions are available online, the code comes in no uniform form, which makes it inconvenient to adapt to your own datasets for experiments. This article introduces DeepCTR-Torch, a deep-learning-based CTR model package implemented in PyTorch, which is easy to learn and use.
DeepCTR-Torch is an easy-to-use, modular, and extensible package of deep-learning-based CTR models. Besides the mainstream models of recent years, it includes many core component layers that you can use to build your own custom models.
You can run training and prediction with these complex models simply by calling model.fit() and model.predict(), and you choose whether to run on CPU or GPU through the device parameter passed when the model is initialized.
Installation and use
- Installation
pip install -U deepctr-torch
- Usage example
Here is a simple example showing how to quickly apply a deep-learning-based CTR model. The code is available at:
github.com/shenweichen…
The Criteo Display Ads dataset is a CTR prediction competition dataset on Kaggle. It contains 13 numerical features, I1-I13, and 26 categorical features, C1-C26.
# -*- coding: utf-8 -*-
import pandas as pd
from sklearn.metrics import log_loss, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from deepctr_torch.models import *
from deepctr_torch.inputs import SparseFeat, DenseFeat, get_fixlen_feature_names
import torch

# Use pandas to read the data described above and simply fill in the missing values.
# The sample data is at: https://github.com/shenweichen/DeepCTR-Torch/blob/master/examples/criteo_sample.txt
data = pd.read_csv('./criteo_sample.txt')

sparse_features = ['C' + str(i) for i in range(1, 27)]
dense_features = ['I' + str(i) for i in range(1, 14)]
data[sparse_features] = data[sparse_features].fillna('1', )
data[dense_features] = data[dense_features].fillna(0, )
target = ['label']

# Categorical features are re-encoded with LabelEncoder (hash encoding is an alternative),
# and numerical features are scaled into the range 0~1 with MinMaxScaler.
for feat in sparse_features:
    lbe = LabelEncoder()
    data[feat] = lbe.fit_transform(data[feat])
mms = MinMaxScaler(feature_range=(0, 1))
data[dense_features] = mms.fit_transform(data[dense_features])

# The model needs to know how many embedding vectors each sparse feature has,
# which we count with the pandas nunique() method.
fixlen_feature_columns = [SparseFeat(feat, data[feat].nunique())
                          for feat in sparse_features] + [DenseFeat(feat, 1)
                                                          for feat in dense_features]
dnn_feature_columns = fixlen_feature_columns
linear_feature_columns = fixlen_feature_columns
fixlen_feature_names = get_fixlen_feature_names(linear_feature_columns + dnn_feature_columns)

# Split into training and test sets and build the model inputs
train, test = train_test_split(data, test_size=0.2)
train_model_input = [train[name] for name in fixlen_feature_names]
test_model_input = [test[name] for name in fixlen_feature_names]

# Check whether a GPU can be used
device = 'cpu'
use_cuda = True
if use_cuda and torch.cuda.is_available():
    print('cuda ready...')
    device = 'cuda:0'

# Initialize the model, then train and predict
model = DeepFM(linear_feature_columns=linear_feature_columns, dnn_feature_columns=dnn_feature_columns, task='binary',
               l2_reg_embedding=1e-5, device=device)
model.compile("adagrad", "binary_crossentropy",
              metrics=["binary_crossentropy", "auc"], )
model.fit(train_model_input, train[target].values,
batch_size=256, epochs=10, validation_split=0.2, verbose=2)
pred_ans = model.predict(test_model_input, 256)
print("")
print("test LogLoss", round(log_loss(test[target].values, pred_ans), 4))
print("test AUC", round(roc_auc_score(test[target].values, pred_ans), 4))
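For readers who want to see what the two preprocessing steps in the example do, the following is a rough sketch in plain Python. It mirrors the idea behind LabelEncoder (map each distinct category to an integer index over the sorted categories) and MinMaxScaler (rescale a numeric column linearly into 0~1); the helper names and sample values are made up, and the real scikit-learn classes handle many more cases.

```python
from typing import List, Sequence, Tuple


def label_encode(values: Sequence[str]) -> Tuple[List[int], int]:
    """Map each category to its index among the sorted distinct categories.

    Returns the encoded column and the vocabulary size, which is the
    number of embedding vectors a sparse feature would need.
    """
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], len(classes)


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Linearly rescale a numeric column so its minimum is 0 and maximum is 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: every value maps to the lower bound
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Illustrative columns: one categorical (missing value filled with "-1"), one numeric
c1 = ["b", "a", "-1", "a"]
i1 = [0.0, 5.0, 10.0, 2.5]

codes, vocab_size = label_encode(c1)
scaled = min_max_scale(i1)
print(codes, vocab_size)
print(scaled)
```

The vocabulary size returned here plays the same role as data[feat].nunique() in the example above.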
Related resources
- DeepCTR-Torch code home page
github.com/shenweichen…
- DeepCTR-Torch documentation
deepctr-torch.readthedocs.io/en/latest/I…
- DeepCTR (TensorFlow) code home page
github.com/shenweichen…
- DeepCTR (TensorFlow) documentation
deepctr-doc.readthedocs.io/en/latest/I…
About the author
Shen Weichen holds a master's degree in computer science from Zhejiang University and is an algorithm engineer at Alibaba Group.
He once took part in compiling the notes for deeplearning.ai.
GitHub home page:
github.com/shenweichen
Zhihu column "shallow dream study notes":
zhuanlan.zhihu.com/weichennote