1.2.2 Expectation and covariance

One of the most important operations involving probabilities is that of finding weighted averages of functions. The average value of some function $f(x)$ under a probability distribution $p(x)$ is called the expectation of $f(x)$ and will be denoted $\mathbb{E}[f]$. For a discrete distribution, it is given by


$$\mathbb{E}[f] = \sum_x p(x)\,f(x) \tag{1.33}$$

so that the average is weighted by the relative probabilities of the different values of $x$. In the case of continuous variables, expectations are expressed as an integration with respect to the corresponding probability density


$$\mathbb{E}[f] = \int p(x)\,f(x)\,\mathrm{d}x \tag{1.34}$$

In either case, if we are given a finite number $N$ of points drawn from the probability distribution or probability density, then the expectation can be approximated as a finite sum over these points


$$\mathbb{E}[f] \simeq \frac{1}{N}\sum_{n=1}^{N} f(x_n) \tag{1.35}$$

We will make extensive use of this result when we discuss sampling methods in Chapter 11. The approximation in (1.35) becomes exact in the limit $N \rightarrow \infty$.
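To make (1.35) concrete, the following sketch estimates an expectation from samples. The choices of $p(x)$ (a standard Gaussian) and $f(x) = x^2$ are illustrative assumptions, not taken from the text; with them the exact value is $\mathbb{E}[f] = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative choices (not from the text): p(x) is a standard Gaussian
# and f(x) = x**2, so the exact expectation E[f] equals 1.
f = lambda x: x**2

for N in (10, 1_000, 100_000):
    x = rng.standard_normal(N)     # N points drawn from p(x)
    estimate = f(x).mean()         # (1/N) * sum_n f(x_n), as in (1.35)
    print(f"N = {N:>7}: E[f] ~ {estimate:.4f}")
```

Running this shows the estimate wandering for small $N$ and settling near 1 as $N$ grows, which is exactly the limiting behaviour claimed above.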

Sometimes we will be considering expectations of functions of several variables, in which case we can use a subscript to indicate which variable is being averaged over, so that for instance


$$\mathbb{E}_x[f(x, y)] \tag{1.36}$$

denotes the average of the function $f(x, y)$ with respect to the distribution of $x$. Note that $\mathbb{E}_x[f(x, y)]$ will be a function of $y$.

We can also consider a conditional expectation with respect to a conditional distribution, so that


$$\mathbb{E}_x[f \mid y] = \sum_x p(x \mid y)\,f(x) \tag{1.37}$$

with an analogous definition for continuous variables.
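In the discrete case, (1.37) can be read directly off a table of joint probabilities. The sketch below assumes a small, made-up joint distribution $p(x, y)$ and an arbitrary $f$; it forms $p(x \mid y)$ from the joint and then takes the weighted average over $x$, giving one number for each value of $y$, which echoes the remark above that such expectations are functions of $y$.

```python
import numpy as np

# Hypothetical joint distribution p(x, y): rows index x in {0, 1, 2},
# columns index y in {0, 1}; the entries sum to 1.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.10],
                 [0.10, 0.20]])
x_vals = np.array([0.0, 1.0, 2.0])
f = lambda x: x**2                 # an arbitrary function of x

p_y = p_xy.sum(axis=0)             # marginal p(y), by the sum rule
p_x_given_y = p_xy / p_y           # p(x|y) = p(x, y) / p(y), column-wise

# E_x[f|y] = sum_x p(x|y) f(x), as in (1.37): one value per y
cond_expect = f(x_vals) @ p_x_given_y
print(cond_expect)                 # [E_x[f|y=0], E_x[f|y=1]] = [1.4, 1.8]
```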

The variance of $f(x)$ is defined by


$$\mathrm{var}[f] = \mathbb{E}\big[(f(x) - \mathbb{E}[f(x)])^2\big] \tag{1.38}$$

and provides a measure of how much variability there is in $f(x)$ around its mean value $\mathbb{E}[f(x)]$. Expanding out the square, we see that the variance can also be written in terms of the expectations of $f(x)$ and $f(x)^2$


$$\mathrm{var}[f] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2 \tag{1.39}$$

In particular, we can consider the variance of the variable $x$ itself, which is given by


$$\mathrm{var}[x] = \mathbb{E}[x^2] - \mathbb{E}[x]^2 \tag{1.40}$$
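The equivalence of the defining form (1.38) and the expanded form (1.40) is easy to check numerically. In this sketch the exponential distribution is an arbitrary choice (not from the text); both estimates should agree with each other, and with the known variance of that distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
# Samples from an illustrative p(x): exponential with scale 2,
# whose true variance is scale**2 = 4.
x = rng.exponential(scale=2.0, size=1_000_000)

mean = x.mean()
var_direct = ((x - mean) ** 2).mean()       # E[(x - E[x])^2], as in (1.38)
var_identity = (x ** 2).mean() - mean ** 2  # E[x^2] - E[x]^2, as in (1.40)

print(var_direct, var_identity)             # both close to 4
```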

For two random variables $x$ and $y$, the covariance is defined by


$$\mathrm{cov}[x, y] = \mathbb{E}_{x,y}\big[\{x - \mathbb{E}[x]\}\{y - \mathbb{E}[y]\}\big] = \mathbb{E}_{x,y}[xy] - \mathbb{E}[x]\,\mathbb{E}[y] \tag{1.41}$$

which expresses the extent to which $x$ and $y$ vary together. If $x$ and $y$ are independent, then their covariance vanishes.
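A minimal numerical illustration of (1.41), using illustrative distributions of my own choosing: the sample covariance of two independently drawn variables is close to zero, while a variable constructed to move with $x$ has a clearly nonzero covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000

x = rng.standard_normal(N)
y_indep = rng.standard_normal(N)           # independent of x
y_dep = x + 0.5 * rng.standard_normal(N)   # varies together with x

def cov(a, b):
    # E_{x,y}[xy] - E[x]E[y], as in (1.41)
    return (a * b).mean() - a.mean() * b.mean()

print(cov(x, y_indep))   # close to 0: independence implies zero covariance
print(cov(x, y_dep))     # close to 1 (= var[x]): the variables co-vary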

In the case of two vectors of random variables $\mathbf{x}$ and $\mathbf{y}$, the covariance is a matrix


$$\mathrm{cov}[\mathbf{x}, \mathbf{y}] = \mathbb{E}_{\mathbf{x},\mathbf{y}}\big[\{\mathbf{x} - \mathbb{E}[\mathbf{x}]\}\{\mathbf{y}^{\mathrm{T}} - \mathbb{E}[\mathbf{y}^{\mathrm{T}}]\}\big] = \mathbb{E}_{\mathbf{x},\mathbf{y}}[\mathbf{x}\mathbf{y}^{\mathrm{T}}] - \mathbb{E}[\mathbf{x}]\,\mathbb{E}[\mathbf{y}^{\mathrm{T}}] \tag{1.42}$$

If we consider the covariance of the components of a vector $\mathbf{x}$ with each other, then we use a slightly simpler notation $\mathrm{cov}[\mathbf{x}] \equiv \mathrm{cov}[\mathbf{x}, \mathbf{x}]$.
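The matrix form (1.42) can likewise be checked by simulation. This sketch assumes, purely for illustration, a random vector built as $\mathbf{x} = A\mathbf{z}$ with $\mathbf{z}$ standard normal, for which $\mathrm{cov}[\mathbf{x}] = AA^{\mathrm{T}}$, and compares that to the sample estimate of $\mathbb{E}[\mathbf{x}\mathbf{x}^{\mathrm{T}}] - \mathbb{E}[\mathbf{x}]\,\mathbb{E}[\mathbf{x}]^{\mathrm{T}}$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 1_000_000, 3

# x = A z with z standard normal, so cov[x] = A A^T
# (A is an arbitrary illustrative choice).
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.2, 1.0]])
z = rng.standard_normal((N, D))
x = z @ A.T                         # each row is one sample of the vector x

# cov[x] = E[x x^T] - E[x] E[x]^T, the special case of (1.42) with y = x
mean = x.mean(axis=0)
cov_x = (x.T @ x) / N - np.outer(mean, mean)
print(np.round(cov_x, 3))           # close to A @ A.T
```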