2.4.3 Estimation and approximation errors

For any hypothesis h ∈ H, the difference between its error and the Bayes error R* can be decomposed as follows:


$$R(h) - R^* = \underbrace{(R(h) - R(h^*))}_{\text{estimation}} + \underbrace{(R(h^*) - R^*)}_{\text{approximation}}. \qquad (2.25)$$

Here, h* denotes a hypothesis in H with minimal error, that is, a best-in-class hypothesis. The second term, the approximation error, measures how well the Bayes error can be approximated using H. It is a property of the hypothesis set H and a measure of its richness. The approximation error is not accessible in general, since the underlying distribution D is typically unknown, and it is difficult to estimate even under various noise assumptions. The first term, the estimation error, depends on the hypothesis h selected and measures the quality of h relative to the best-in-class hypothesis. The definition of agnostic PAC-learning is also based on the estimation error.

The estimation error of an algorithm A, that is, of the hypothesis h_S it returns after training on a sample S, can sometimes be bounded in terms of a generalization bound. For example, let h_S^ERM denote the hypothesis returned by empirical risk minimization (ERM), the algorithm that returns a hypothesis with minimal empirical error R̂. Then, Theorem 2.2, or any other bound on sup_{h ∈ H} |R(h) − R̂(h)|, can be used to bound the estimation error of ERM. Indeed, rewriting the estimation error so that R̂(h_S^ERM) appears, and using the inequality R̂(h_S^ERM) ≤ R̂(h*), which holds by definition of the algorithm, we can write:


$$\begin{aligned}
R(h_S^{ERM}) - R(h^*) &= R(h_S^{ERM}) - \widehat{R}(h_S^{ERM}) + \widehat{R}(h_S^{ERM}) - R(h^*) \\
&\leq R(h_S^{ERM}) - \widehat{R}(h_S^{ERM}) + \widehat{R}(h^*) - R(h^*) \\
&\leq 2 \sup_{h \in H} \left| R(h) - \widehat{R}(h) \right|. \qquad (2.26)
\end{aligned}$$
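The bound (2.26) can be checked empirically. The following sketch (not from the text; the distribution, threshold hypothesis class, and noise rate are illustrative assumptions) uses a small finite hypothesis set of threshold classifiers on [0, 1], for which the true error of every hypothesis can be computed in closed form, and verifies that the estimation error of ERM never exceeds twice the supremum deviation between true and empirical errors.

```python
import random

random.seed(0)

# Illustrative setup (assumed, not from the text): x ~ Uniform[0, 1],
# true label y = 1 iff x >= 0.3, labels flipped with probability NOISE.
NOISE = 0.1
TRUE_THRESHOLD = 0.3

def sample(m):
    """Draw m labeled examples from the noisy distribution."""
    data = []
    for _ in range(m):
        x = random.random()
        y = 1 if x >= TRUE_THRESHOLD else 0
        if random.random() < NOISE:  # flip label with probability NOISE
            y = 1 - y
        data.append((x, y))
    return data

# Finite hypothesis set H: threshold classifiers h_t(x) = 1 iff x >= t.
H = [i / 10 for i in range(11)]

def emp_error(t, data):
    """Empirical error R_hat(h_t) on the sample."""
    return sum(1 for x, y in data if (x >= t) != (y == 1)) / len(data)

def true_error(t):
    """Exact generalization error R(h_t) under this distribution:
    h_t disagrees with the noiseless target on a region of measure
    |t - TRUE_THRESHOLD|; error is 1 - NOISE there and NOISE elsewhere."""
    disagree = abs(t - TRUE_THRESHOLD)
    return (1 - NOISE) * disagree + NOISE * (1 - disagree)

S = sample(200)
h_erm = min(H, key=lambda t: emp_error(t, S))  # ERM hypothesis h_S^ERM
h_star = min(H, key=true_error)                # best-in-class hypothesis h*
sup_dev = max(abs(true_error(t) - emp_error(t, S)) for t in H)

estimation_error = true_error(h_erm) - true_error(h_star)
print(f"estimation error   = {estimation_error:.3f}")
print(f"2 * sup deviation  = {2 * sup_dev:.3f}")
assert estimation_error <= 2 * sup_dev  # the bound (2.26)
```

Note that the chain of inequalities in (2.26) holds deterministically for any sample, so the final assertion succeeds regardless of the random seed; generalization bounds such as Theorem 2.2 then control the right-hand side with high probability.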

3. When H is a finite hypothesis set, h* necessarily exists; otherwise, R(h*) can be replaced in this discussion by inf_{h ∈ H} R(h).