If the weights of a deep learning model are initialized too small, the signal shrinks as it passes through each layer until it is too weak to have any effect.
If the weights are initialized too large, the signal is amplified as it passes through each layer, leading to divergence and failure.
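The two failure modes above can be seen in a small experiment. The following is a minimal sketch, not a real training setup: it assumes a stack of purely linear layers with no activation function, and the function name, layer count, width and weight scales are illustrative choices that do not come from the original text.

```python
import numpy as np

def final_activation_std(weight_std, n_layers=20, width=256, seed=0):
    """Push a random batch through n_layers linear layers and return
    the standard deviation of the final activations."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((64, width))  # batch of 64 random inputs
    for _ in range(n_layers):
        # Zero-mean Gaussian weights with the chosen standard deviation.
        w = rng.normal(0.0, weight_std, size=(width, width))
        x = x @ w  # linear layer only (assumption: no activation)
    return float(x.std())

small = final_activation_std(0.01)                    # too small: signal vanishes
large = final_activation_std(0.2)                     # too large: signal explodes
balanced = final_activation_std(1.0 / np.sqrt(256))   # scale tied to layer width

print(f"small={small:.3e} large={large:.3e} balanced={balanced:.3e}")
```

Each layer multiplies the signal's standard deviation by roughly `weight_std * sqrt(width)`, so only a scale close to `1 / sqrt(width)` keeps the signal at a stable magnitude across many layers.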
The Xavier initializer is designed to choose a moderate weight scale when initializing deep networks, so that the signal neither vanishes nor explodes as it passes through the layers.
Its derivation relies on the variance of the uniform distribution: a weight drawn from U(-a, a) has variance a^2 / 3, which is how the sampling bound is chosen.
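As an illustration, here is a minimal NumPy sketch of the uniform variant of Xavier (Glorot) initialization. It assumes the commonly used form in which weights are drawn from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)), which gives a weight variance of 2 / (fan_in + fan_out). The function name `xavier_uniform` and the layer sizes are hypothetical and chosen only for this example.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    """Sample a (fan_in, fan_out) weight matrix from U(-limit, limit),
    where limit = sqrt(6 / (fan_in + fan_out))."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

fan_in, fan_out = 300, 500          # illustrative layer sizes
w = xavier_uniform(fan_in, fan_out)

limit = np.sqrt(6.0 / (fan_in + fan_out))
target_var = 2.0 / (fan_in + fan_out)  # equals limit**2 / 3 for a uniform distribution

print("empirical variance:", w.var())
print("target variance:   ", target_var)
```

The empirical variance of the sampled matrix should land close to the target, since the variance of U(-a, a) is a^2 / 3 and a^2 = 6 / (fan_in + fan_out).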