2. Loss function¶

Given the data generated above (true parameter $w=[3,2]^T$), the prediction of the model is $f(x)=w^T x^*$, and the least-squares loss measures the squared error between $f(x_i)$ and $y_i$:

$$L(w)=\frac{1}{2m}\sum_{i=1}^{m}\big(f(x_i)-y_i\big)^2=\frac{1}{2m}\|X^*w-Y\|_2^2$$

Here $m$ denotes the sample size (in this case $m = 100$), $x_i, y_i$ denote the $i$-th sample, $X^* \in \mathbb{R}^{m\times(n+1)}$ and $Y \in \mathbb{R}^{m\times 1}$. The loss $L(w)$ is essentially a function of $w$, so the optimal $w$ can be obtained by minimizing $L(w)$:

$$\frac{dL}{dw}=0$$

Setting $\frac{dL}{dw}=0$ yields $w^* = ({X^*}^T X^*)^{-1}{X^*}^T Y$. In real scenarios, however, the data do not guarantee that ${X^*}^T X^*$ is full rank (for example, when $m < n$ there are infinitely many solutions for $w$), so the inverse cannot be taken directly. Instead, we can consider solving it in the following way:

$${X^*}^+=\lim_{\alpha\rightarrow 0}\left({X^*}^T X^*+\alpha I\right)^{-1}{X^*}^T$$

The formula above is the definition of the Moore-Penrose pseudo-inverse, but in practice it is usually computed via the SVD:

$${X^*}^+ = V D^+ U^T$$

where $X^* = U D V^T$ is the SVD of $X^*$, and $D^+$ is obtained by taking the reciprocal of each non-zero singular value in $D$ and transposing the result.
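As a concrete check (a minimal sketch: the synthetic data below, with true parameter $w=[3,2]^T$ and Gaussian noise, is only an assumed stand-in for the data generated earlier), we can build ${X^*}^+$ directly from the SVD and compare it with NumPy's `np.linalg.pinv`:

```python
import numpy as np

# Assumed synthetic data: y = 3*x + 2 + noise, standing in for the data
# generated earlier (true parameter w = [3, 2]^T, m = 100 samples).
rng = np.random.RandomState(0)
m = 100
x = rng.uniform(-3, 3, size=m)
y = 3 * x + 2 + rng.normal(0, 0.5, size=m)

# Design matrix X* with a column of ones appended for the bias term.
X_star = np.column_stack([x, np.ones(m)])            # shape (m, n+1)
Y = y.reshape(-1, 1)                                 # shape (m, 1)

# Pseudo-inverse via SVD: X* = U D V^T  =>  X*^+ = V D^+ U^T,
# where D^+ inverts the non-zero singular values (all non-zero here).
U, d, Vt = np.linalg.svd(X_star, full_matrices=False)
X_pinv = Vt.T @ np.diag(1.0 / d) @ U.T

w_star = X_pinv @ Y
print(w_star.ravel())                                # close to [3, 2]
print(np.allclose(X_pinv, np.linalg.pinv(X_star)))   # matches NumPy's pinv
```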

(1) When $X^* w = Y$ has solutions, $w^* = {X^*}^+ Y$ is the one among all solutions with the smallest Euclidean norm $\|w\|_2$;

(2) When $X^* w = Y$ has no solution, the $w^*$ obtained from the pseudo-inverse is the one that minimizes the Euclidean distance $\|X^* w^* - Y\|_2$.
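A quick numerical illustration of property (1) (again only a sketch with made-up numbers): in an underdetermined system with fewer equations than unknowns, every exact solution equals the pseudo-inverse solution plus a null-space component, and the pseudo-inverse solution has the smallest $\|w\|_2$:

```python
import numpy as np

rng = np.random.RandomState(1)

# Underdetermined toy system: 3 equations, 5 unknowns,
# so X w = Y has infinitely many exact solutions.
X = rng.normal(size=(3, 5))
Y = rng.normal(size=(3, 1))

w_pinv = np.linalg.pinv(X) @ Y                     # pseudo-inverse solution
print(np.allclose(X @ w_pinv, Y))                  # True: solves the system exactly

# Any other solution = w_pinv + a vector from the null space of X.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1].reshape(-1, 1)                   # a basis vector of the null space
w_other = w_pinv + 0.7 * null_vec
print(np.allclose(X @ w_other, Y))                 # True: still an exact solution
print(np.linalg.norm(w_pinv) < np.linalg.norm(w_other))  # True: w_pinv has smaller norm
```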

More generally, however, we can update $w$ by stochastic gradient descent (SGD). We first initialize $w$ randomly and then update it iteratively with the following rule:

$$w := w - \eta\frac{dL}{dw}$$
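A minimal sketch of this update (the learning rate, batch size and epoch count below are illustrative assumptions; the gradient is that of the squared-error loss above, $\frac{dL}{dw}=\frac{1}{m}{X^*}^T(X^*w-Y)$, evaluated on a mini-batch):

```python
import numpy as np

def sgd_fit(X_star, Y, lr=0.05, epochs=200, batch_size=10, seed=0):
    """Mini-batch SGD for the squared-error loss L(w) = ||X* w - Y||^2 / (2m)."""
    rng = np.random.RandomState(seed)
    m, d = X_star.shape
    w = rng.normal(size=(d, 1))                        # random initialization of w
    for _ in range(epochs):
        idx = rng.permutation(m)                       # reshuffle samples each epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, Yb = X_star[batch], Y[batch]
            grad = Xb.T @ (Xb @ w - Yb) / len(batch)   # dL/dw on the mini-batch
            w -= lr * grad                             # w := w - eta * dL/dw
    return w
```

With the synthetic `X_star`, `Y` from the earlier sketch, `sgd_fit(X_star, Y)` converges to roughly the same $w^*$ as the pseudo-inverse solution.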

3. Model training¶

So far we have derived the update rule for $w$; the code for the training process is as follows: