2011 Real-time Bidding Algorithms for Performance-Based Display Ad Allocation
The article is not well understood, so there are many omissions in the notes.
background
This paper considers the performance-based distribution of display advertising. The goal is to match advertisers (demand side) with the number of AD displays (supply side) in order to maximize total revenue (aka revenue) from the publisher’s perspective (media, I understand) while meeting constraints imposed primarily by advertising series budgets and supply lists. The main contributions are as follows
- A real-time bidding algorithm is proposed to achieve fine-grained exposure valuation (e.g. using real-time conversion data to locate users) and adjust value-based bidding based on real-time constraint snapshots (e.g. budget consumption levels).
- It is proved theoretically that the simple real-time bidding algorithm with the optimal solution of the dual problem as the input under the linear programming (LP) primal duality formula is indeed an online solution of the primal duality problem.
- Develop and validate two real-time bid adjustment methods to adapt to the non-stationary nature of the market.
There are several reasons why we should study online bidding rather than offline bidding.
- Offline global optimization can only be addressed at a significant level of exposure granularity. For example, in the current performance optimization formula for a large display advertising network, the number of displays is assigned at the AD series placement level, with a decision variable of about 1M and a constraint of about 0.5m. For valuation purposes, presentations within each node of this granularity are considered homogeneous, and therefore more fine-grained presentation data cannot be leveraged. Some of these display-level opportunities are differentiated, such as the historical behavior of users (only the AD space is considered, not the users who clicked on the AD in real time).
- Even if some limitations are inherent, such as budget constraints at the AD level and traffic at the display location level, offline optimization can be tricky as the number of advertisers and publishers increases.
- Since offline optimization can only create static allocation schemes, there is no natural way to adapt to market dynamics. Especially when the distribution of winning bids is changed and the result is that the campaign target is too high or too low, the offline algorithm has no corresponding adjustment mechanism.
Several related concepts in this paper
The exposure value
The price of an advertiser’s (competitor’s) advertisement exposure is calculated by CPC: eCPI=CTR×CPCeCPI =CTR \times CPCeCPI=CTR×CPC, i.e. click rate x click value.
Click rate
That is, the click through rate of the advertisement released by the advertiser when it is exposed. The calculation method is as follows: CTR = p (click ∣ campaign, impression) CTR = p (click | campaign, impression) CTR = p (click ∣ campaign, impression).
Real-time bid adjustment
The bid can be adjusted according to the current competition level and constraints. The adjustment mode is as follows: bid=eCPI−αbid =eCPI – \alphabid=eCPI−α. When the bidding landscape becomes competitive, turn down alphaalphaalpha, or even turn alpha \alpha alpha negative, in order to gain exposure.
Problems in modeling
The purpose of online bidding is to maximize the total return, considering the constraints are:
- Demand-side constraints (such as budget constraints) are given in this article based on the Impression Delivery goal
- Supplier constraints, one exposure can only be assigned to one advertiser
The problem is described as follows
- I represents the I out of n exposures and j represents the J out of m advertisers
- Pijp_ {iJ} PIJ represents the CTR value of the JTH advertiser when he obtained the I exposure, qJQ_jQj represents the CPC bid of the JTH advertiser, so the corresponding eCPI is expressed as VIj = PIJQJv_ {ij} = P_ {iJ}q_jvij= PIjqj
- Gjg_jgj is the Impression Delivery goal of J advertiser
- Xijx_ {ij}xij is the decision variable, xij=1x_{ij} = 1XIj =1 means to allocate the I exposure to advertiser J, xij=0x_{ij} = 0xij=0 means not to allocate the I exposure to advertiser J.
The original expression of the optimization problem is as follows, and the solution parameters are m×nm\times nm×n.
The duality problem is expressed as follows, and the solving parameters are M +nm+nm+ N.
S.t. ∀ I, j, alpha j + beta I acuity vij \ forall I, j \ alpha_ {j} + \ beta_ {I} \ geq v_ {I} j ∀ I, j, alpha j + beta I vij or higher
An intuitive explanation of variables in duality problems
- αj\alpha_jαj the economic value of obtaining additional target exposure for the advertiser J (for example, by increasing the budget, also understood in the text as minimum profit)
- β I \beta_iβ I The economic value of obtaining additional exposure I for the media (e.g., attracting more visitors)
After modeling from above, the question is
- How to derive the optimal solution of the original problem from the optimal solution of the dual in online mode
- How to explain the non-stationary nature of exposure number arrival distribution
Real-time bidding algorithm
The online bidding algorithm is as follows, considering that a real-time bid is required each time exposure I arrives, that is, algorithm XIj,∀ JX_ {ij},\forall JXIJ,∀j. In the original problem, m variables XIj,∀jx_{ij},\forall JXIj,∀ J, the variables satisfy a constraint ∑ JXIj ≤1\sum_j X_ {ij} \leq 1∑ JXIj ≤1. In the dual problem, a variable β I \beta_iβ I is solved, The variables satisfy m constraints αj+β I ≥ VIj ∀ J \alpha_j + \ beta_I \geq V_ {iJ} \forall J αj+β I ≥ VIj ∀j.
The online algorithm uses the complementary relaxation theorem to assign variables to maintain the offline optimal state.
- For exposure I and advertiser J with a value of VIJV_ {ij}vij and thus a bid of VIj −α JV_ {ij} – \alpha_jvij−αj, the advertiser with the highest bid gets exposure, and the advertiser with goal-Achieved Campaigns quits the bid
- β I \beta_iβ I is used to record the economic value of exposure I and is not important to the auction
- Alpha j = UpdateAlpha (alpha j) \ alpha_j = UpdateAlpha (\ alpha_j) alpha j = UpdateAlpha (alpha j) according to the target to complete the degree of environment and offer to adjust alpha j \ alpha_j alpha j. Under the assumption of stationary impression arrival, it is reduced to an identity function.
- From the perspective of bidding, αj\alpha_jαj indicates that the advertiser needs the minimum profit, so it can be understood that the maximum bid can be vij−αjv_{ij} -\ alpha_jvij−αj; β I \beta_iβ I shall be proportional to the base price required by the media selling exposure I
The off-line optimality of the basic online bidding algorithm is established under the assumption of stationary α
Theorem 2 in Text
With the algorithm above, adjusting alpha J,∀ J \alpha_j, \ Forall J αj,∀ J,∀ J, uses the following first-price auction design to ensure offline optimization given the best bid.
- For each exposure I, advertiser J bid vij−α Jv_ {ij} -\ alpha_jvij−αj
- Only advertisers can participate in bidding, which meet the conditions: ∑ I ‘jxi’ j < gj \ sum_ {x_ I ^ {‘} j} {I j ^ {‘}} < g_j ∑ I ‘jxi’ j < gj
- The exposure assignment is only assigned to the highest bid and the highest bid is positive, vij− AJ >0v_{ij} -a_j > 0VIj − AJ >0
Demonstrate the above separately (reference 1 is referred to here)
- If the highest bid vij ∗ – alpha j ∗ > 0 v_ {ij ^ {*}} – \ alpha_ {j ^ {*}} > 0 vij ∗ – alpha j ∗ > 0, exposure, I will be assigned. If there is no allocation, then ∑jxij<1\sum_j x_{ij} <1 ∑jxij<1, according to the linear relaxation theorem, β I =0\beta_i =0 β I =0, By the dual constraints of alpha j ∗ + beta I acuity vij ∗ \ alpha_ {j ^ {*}} + \ beta_i \ geq v_ {ij ^ {*}} alpha j ∗ + beta I acuity vij ∗, Alpha j ∗ acuity vij ∗ \ alpha_ {j ^ {*}} \ geq v_ {ij ^ {*}} alpha j ∗ acuity vij ∗ namely alpha j ∗ – vij ∗ acuity 0 \ alpha_ {j ^ {*}} – v_ {ij ^ {*}} \ geq 0 alpha j ∗ – vij ∗ acuity 0, And vij ∗ – alpha j ∗ > 0 v_ {ij ^ {*}} – \ alpha_ {j ^ {*}} > 0 vij ∗ – alpha j ∗ > 0
- Exposure I will be assigned to advertisers j ∗ j ^ {*} j ∗, if assigned to advertisers’ j j ^ {‘} ‘j, xij’ = 1 x_ {ij ^ {‘}} = 1 xij ‘= 1. According to the complementary slack theorem, get alpha beta I = j ‘+ vij’ \ alpha_ {j ^ {‘}} + \ beta_i = v_ {ij ^ {‘}} alpha beta I = j ‘+ vij’. If J ∗≠ J ‘j^{*}\neq j^{‘}j∗=j’, There are vij ∗ – alpha j ∗ > vij ‘- alpha beta iv_ j’ = {ij ^ {*}} – \ alpha_ {j ^ {*}} > v_ {ij ^ {‘}} – \ alpha_ {j ^ {‘}} = \ beta_ivij ∗ – alpha j ∗ > vij ‘- alpha j’ = beta I, Vij ∗ > alpha beta iv_ j ∗ + {ij ^ {*}} > \ alpha_ {j ^ {*}} + \ beta_ivij ∗ > alpha j ∗ + beta I, And constraint alpha j ∗ + beta I acuity vij ∗ \ alpha_ {j ^ {*}} + \ beta_i \ geq v_ {ij ^ {*}} alpha j ∗ + beta I acuity vij ∗ contradiction
- If the highest bid vij ∗ – alpha j ∗ < 0 v_ {ij ^ {*}} – \ alpha_ {j ^ {*}} < 0 vij ∗ – alpha j ∗ < 0, the exposure of I don’t. When will I assigned to j, xij = 1 x_ {ij} = 1 xij = 1, according to the complementary slack theorem alpha beta I = j ∗ + vij ∗ \ alpha_ {j ^ {*}} + \ beta_i = v_ {ij ^ {*}} alpha beta I = j ∗ + vij ∗, The beta I acuity 0 \ beta_i \ geq 0 beta I acuity 0, all aij vija_ or less {ij} {the ij} \ leq v aij vij, or less and vij ∗ – alpha j ∗ < 0 v_ {ij ^ {*}} – \ alpha_ {j ^ {*}} < 0 vij ∗ – alpha j ∗ < 0
- Vij ∗ – alpha j ∗ = 0 v_ {ij ^ {*}} – \ alpha_ {j ^ {*}} = 0 vij ∗ – alpha j ∗ = 0, there are multiple optimal solutions
Price adjustment
The above algorithm assumes that the distribution of exposure is fixed, that is, the optimal bid adjustment αj\alpha_jαj can be learned with sufficient data. However, the market is dynamic, so the exposure is unstable, αj\alpha_jαj is a time-varying value, and the non-stationary condition violates the complementary relaxation condition, so it is necessary to adjust αj\alpha_jαj. In this paper, we first initialize αj\alpha_jαj with the dual off-line optimal solution derived from the historical data, and then update αj\alpha_jαj online by control theory or statistical methods to fit the supply-side dynamics and demand-side constraint satisfaction levels.
Control theory bid adjustment
PI controller is used for adjustment, and the bid winning probability of expectation and observation at time t is rj(t)r_j(t)rj(t), rj ‘(t)r^{‘}_j(t)rj’ (t), Error for ej (t) = rj (t) – rj ‘e_j (t) (t) = r_j (t) – r ^ {‘} _j ej (t) (t) = rj (t) – rj’ (t), adjust the way as follows: Alpha j (t + 1) please alpha j (t) – k1ej (t) – k2 ∫ 0 tej (tau) d tau \ alpha_ {j} (t + 1) \ leftarrow \ alpha_ {} j (t) – k_ {1} e_ {} j (t) – k_ {2} \ int_ {0} ^ {t} E_ (\ tau) d {j} \ tau alpha j (t + 1) please alpha j (t) – k1ej (t) – k2 ∫ 0 tej (tau) d tau. Where K1K_1K1 is proportional gain, k2k_2K2 is integral gain, t∈[1,2… t]t\in[1, 2… t]t∈[1,2… t]t∈[1,2… t]t is a very small number of time intervals, t is the number of time intervals.
The other is waterlevel-based update, and the adjustment method is as follows: Alpha j (t + 1) please alpha j (t) exp (gamma (xj (t)/gj – 1 / t)), ∀ j \ alpha_ {j} (t + 1) \ leftarrow \ alpha_ {} j (t)/exp/left (\ gamma \ left (x_ (t)/g_ {j} {j} – 1 / T, right), right), and \ forall alpha j j (T + 1) please alpha j (T) exp (gamma (xj (T)/gj – 1 / T)), ∀ j. Where xj(t)x_{j}(t)xj(t) represents the number of exposures obtained by user JJJ in the t interval, γ\gammaγ is used to adjust the speed of response to the error, Error of xj (t)/gj – 1 / Tx_ {} j (t)/g_ {j} – 1 / Txj (t)/gj/t – 1.
Waterlevel -based update has better chain properties, can be written
Statistical method bid adjustment
The method here did not understand, only for record
When bidding for bij = vij – alpha jb_ {ij} = v_ {ij} – \ alpha_jbij = vij – alpha j, winning probability for: p = ∫ bij or less (w) – up bijf (w; Dw = F (theta) bij; Theta) p \ left (w \ leq b_ {I, j} \ right) = \ int_ ^ {- \ infty} {b_ {I, j}} f (w; \theta) d w=F\left(b_{i j} ; , theta, right) p = ∫ bij or less (w) – up bijf (w; Dw = F (theta) bij; Theta). W ~ P (theta) w \ sim \ mathcal {P} (\ theta) w ~ P (theta), theta \ theta theta for the model parameters
The bid adjustment method is: Alpha j (t + 1) please alpha j (t) – gamma (F – 1 (rj (t)) – F – 1 (rj ‘(t))), ∀ j \ alpha_ {j} \ leftarrow (t + 1) \alpha_{j}(t)-\gamma\left(F^{-1}\left(r_{j}(t)\right)-F^{-1}\left(r_{j}^{\prime}(t)\right)\right), \ forall alpha j j (t + 1) please alpha j (t) – gamma (F – 1 (rj (t)) – F – 1 (rj ‘(t))), ∀ j. Where rj(t)r_j(t)rj(t), rj ‘(t)r^{‘}_j(t)rj’ (t) are the expected and observed bid winning probability respectively, and γ\gammaγ is used to adjust the speed of response to error.
A more realistic way to bid
The above bid algorithm is the most basic form of the algorithm and the optimality is established under the assumption of smooth exposure arrival. In the basic formula of LP, constraints are encoded as the Impression delivery goal, and exposures are individually evaluated and assigned. This section establishes a more general optimization goal and constraint form, as shown below
- I represents the ith of n exposure groups, which are considered indistinguishable, i.e., CTR consistent, e.g. same display position
- Gjg_jgj represents the budget of advertiser J
- Hih_ihi Indicates the number of times that media group I can be displayed
- Xijx_ {ij}xij assigns media group I to advertiser J
- Wiw_iwi represents the cost per display (traffic acquisition) for group I, for example, the second price in the Vickrey auction
The original linear programming problem is expressed as follows
The dual problem is expressed as follows
S.t. ∀ I, j, vij alpha j + beta I acuity vij – wi \ forall I, j \ quad v_ {I, j} \ alpha_ {j} + \ beta_ {I} \ geq v_ {I} j – w_ {I} ∀ I, j, vij alpha j + beta I acuity vij – wi
In line with the basic form, the solution is as follows
- Each advertiser J bids based on the exposure from Group I for vij(1−αj)v_{iJ}(1-\alpha_j) VIj (1−αj), and the bid adjustment αj\alpha_jαj is the minimum margin (profit divided by revenue) required for the advertising campaign
- Only active advertisers can participate in the bidding, meet the conditions ∑ IVIjXIj
- The display is allocated only if the highest bid is greater than the cost WIW_iWI, and the cost item WI is constant
The experiment
Three questions were answered
- Given the optimal bidding adjustment αj\alpha_jα J, can the online bidding algorithm obtain the optimal offline returns?
- How is the effect of different bid adjustment methods? Can we approximate the optimal value? What about the control properties?
- What is the significance of selecting the initial value of αj\alpha_jαj?
The specific implementation of the method in the experiment suggested reading the original text, a little more details, not to record here
Evaluation indicators include
Revenue lift = y ‘y = ∑ t, I, jxij’ (t) pijqj ∑ t, I, jcij qj (t) = \ frac {y ^ {\ prime}} {} y = \ frac {\ sum_ {t, I, X_ j} {I} j ^ (t) p_ {\ prime} {I} j q_ {j}} {\ sum_ {t, I, j} c_ {I} j (t) q_ {j}} = yy ‘= ∑ t, I, jcij qj ∑ t (t), I, jxij’ pijqj (t)
Where PIJP_ {ij} PIJ is empirical CTR, qJQ_JQj is bid, so Revenue lift represents the increase of Revenue relative to the actual, and CTR lift represents the increase of click rate relative to the actual data.
Online and offline comparison
Offline is better than online for the following reasons
- A batch exposure is assigned to share the same compound key (t; i; J) to save computation, which may violate the inventory constraint
- The online algorithm only assigns positive winning bids to exposures and may ignore non-profitable exposures. The offline algorithm does not take this constraint adjustment into account, so there is a negative bid
Control bidding strategy comparison
Conclusion there are
- All bid adjustment strategies are close to the off-line optimal, above 90% of the off-line effect
- Model -based adjustment is better than waterlevel-based adjustment
Define the following metrics
- Hourly delivery ratio: a percentage of hourly mentions of potential targets per day
- Hourly reference ratio: indicates the percentage of hourly exposures to daily exposures
The stable control strategy should make the input ratio close to the reference ratio. The experimental results are shown in the figure. Waterlevel-based has excellent stability because it can directly target the reference value measurement error, while Model-based shows greater oscillation. On the other hand, using the static optimal αj\alpha_jαj does not respond to the reference.
The initial
select
Compare the three selection strategies
- The initial αj\alpha_jαj is 0
- The initial αj\alpha_jαj is historically optimal αj\alpha_jαj
- The initial αj\alpha_jαj is off-line optimal αj\alpha_jαj
The experimental results are as follows: when the initial αj\alpha_jαj sampling strategy is different, the effect is similar.
We argue that this may be an artifact of offline simulation, where only offline data pre-filtered by the current production system could be obtained.The actual impression allocation in the data has already gone through constraint enforcement by the current delivery system that produces the data; And as a consequence, the delivery goal for the online bidding simulation will not be constraining enough to make α J
Using different budget constraints, the experimental results are as follows: with budget decreasing, the effect of initializing αj\alpha_jαj with 0 becomes worse, while the effect of initializing αj\alpha_jαj with history or offline remains better.
discuss
At present, I read the most uncomfortable paper ORZ. There are many questions when I read the full text, so I make a record here
-
Statistical method bid adjustment specific how to achieve?
-
In more recent forms of bidding, the granularity of exposure is changed to media group. What is the significance of this?
And the answer seems to be
As one wishes to valuate impressions at a very fine-grained level, the primal will soon become intractable (recall the cream-skimming problem). We propose a dual-based approach as follows.
- Online bidding is not optimal, but it is better than offline. Can we add constraints to get better offline results
The resources
- [Advertising – Mechanism Strategy – Traffic Allocation] A traffic allocation strategy that maximizes revenue