Differences between DeepFM and other related models
- FNN relies on a pre-trained FM to initialize its embedding layer
- PNN feeds the product (FM-style) results into the DNN, rather than combining a wide part and a deep part in parallel as Wide&Deep does
- Wide&Deep does not use FM; its wide part depends on manually designed cross features, which is time-consuming and laborious
DeepFM highlights
FM layer formula
Main idea: introduce second-order cross features (pairwise feature interactions) to improve model performance.
Background: like an SVM with a polynomial kernel, the degree-2 model assigns a learnable weight $w_{ij}$ to each feature pair:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} w_{ij}\, x_i x_j$$

Obviously the number of weights $w_{ij}$ is of magnitude $n^2$.
Therefore, borrowing the idea of matrix factorization, FM decouples these weights: each feature $i$ is assigned an $m$-dimensional latent vector $v_i$, and the pairwise weight is fitted as $w_{ij} \approx \langle v_i, v_j \rangle$. In practice $m \ll n$.
The FM formula:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$
The second-order term of FM can be optimized from $O(n^2)$ to $O(n)$ (more precisely $O(mn)$, which is linear in $n$ since the latent dimension $m$ is a small constant).
The core is steps two to three of the derivation (written out in full below):

- The dot product expands term by term (the distributive law): $(v_{i,1}v_{j,1} + v_{i,2}v_{j,2})\,x_i x_j = v_{i,1}x_i\,v_{j,1}x_j + v_{i,2}x_i\,v_{j,2}x_j$, so the pairwise sum can be reorganized per latent dimension.
- For each latent dimension $f$, write $a_i = v_{i,f}\,x_i$ and sum over all pairs. The key identity is $aa + ab + ba + bb = (a+b)(a+b)$, i.e. $\sum_i \sum_j a_i a_j = \left(\sum_i a_i\right)^2$, which is what takes us from step two to step three.
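The derivation referred to above is the standard FM reformulation (Rendle, 2010); written out:

$$\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^{n}\langle v_i, v_j\rangle x_i x_j
&= \frac{1}{2}\left(\sum_{i=1}^{n}\sum_{j=1}^{n}\langle v_i, v_j\rangle x_i x_j - \sum_{i=1}^{n}\langle v_i, v_i\rangle x_i^2\right) \\
&= \frac{1}{2}\sum_{f=1}^{m}\left(\left(\sum_{i=1}^{n} v_{i,f}\,x_i\right)^2 - \sum_{i=1}^{n} v_{i,f}^2\,x_i^2\right)
\end{aligned}$$

The last expression is two passes over the $n$ features for each of the $m$ dimensions, hence $O(mn)$.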
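As a quick sanity check of the equivalence (a sketch; the sizes `n`, `m` and the random data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4                    # n features, m-dimensional latent vectors
V = rng.normal(size=(n, m))    # latent vectors v_i
x = rng.normal(size=n)         # feature values x_i

# Naive O(n^2): sum over all pairs i < j of <v_i, v_j> * x_i * x_j
naive = sum(V[i] @ V[j] * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# Reformulated O(mn): 0.5 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i (v_{i,f} x_i)^2]
vx = V * x[:, None]
fast = 0.5 * ((vx.sum(axis=0) ** 2).sum() - (vx ** 2).sum())

assert np.isclose(naive, fast)
```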
Shared Embedding
In DeepFM, the DNN and the FM layer share the same embedding weights, so no separate pre-training stage (as in FNN) is required.
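A minimal sketch of this sharing, assuming PyTorch and one-hot fields where each field contributes one active index; the class name `DeepFMSketch` and the layer sizes are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class DeepFMSketch(nn.Module):
    """Sketch: the FM part and the DNN part read from ONE embedding table."""
    def __init__(self, num_features, num_fields, embed_dim=8, hidden=32):
        super().__init__()
        # Shared embedding: used by both the FM second-order term and the DNN.
        self.embedding = nn.Embedding(num_features, embed_dim)
        self.linear = nn.Embedding(num_features, 1)   # first-order weights w_i
        self.bias = nn.Parameter(torch.zeros(1))
        self.dnn = nn.Sequential(
            nn.Linear(num_fields * embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):                  # x: (batch, num_fields) feature ids
        emb = self.embedding(x)            # (batch, num_fields, embed_dim)
        # FM second-order term via the O(mn) identity derived above.
        square_of_sum = emb.sum(dim=1).pow(2)
        sum_of_square = emb.pow(2).sum(dim=1)
        fm_second = 0.5 * (square_of_sum - sum_of_square).sum(dim=1, keepdim=True)
        fm_first = self.linear(x).sum(dim=1) + self.bias
        deep = self.dnn(emb.flatten(start_dim=1))  # same embeddings feed the DNN
        return torch.sigmoid(fm_first + fm_second + deep)
```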
Implementation details
In a concrete implementation, every discrete (categorical) feature is split into multiple binary features, one per category. For example, gender has the two categories male and female, so it is split into two features, "gender=male" and "gender=female", and for each sample exactly one of them takes the value 1 (i.e. one-hot encoding).
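A small sketch of this splitting (the helper `build_vocab` and the field names are hypothetical):

```python
# Assign one feature id per (field, category) pair, as described above.
def build_vocab(records, fields):
    vocab = {}
    for field in fields:
        for record in records:
            key = (field, record[field])
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab

records = [{"gender": "male"}, {"gender": "female"}]
vocab = build_vocab(records, ["gender"])
# vocab == {("gender", "male"): 0, ("gender", "female"): 1}

for record in records:
    # The active category gets value 1; all other features are implicitly 0.
    encoded = {vocab[("gender", record["gender"])]: 1.0}
    print(encoded)   # {0: 1.0} then {1: 1.0}
```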