Introduction
The first two posts introduced the definition of Bayesian networks and part of the structure learning material. Today we continue with Bayesian network structure learning, to give you a deeper understanding of Bayesian networks.
Bayesian network
Structure Learning Based on Score Search:
The score-based search algorithm can be stated as follows: given a data set D = {D1, D2, …, Dn}, find a structure G* in the search space such that

$$G^{*} = \arg\max_{G \in \mathbb{G}_n} \mathrm{Score}(G, D)$$
Here Gn denotes the search space of all possible DAG structures over the variable set V = {X1, X2, X3, …, Xn}.
In score-based structure learning, the set of all possible structures is taken as the domain, a criterion measuring how good a particular structure is serves as the scoring function, and finding the best structure becomes the problem of maximizing the scoring function over that domain. From an algorithmic perspective, two things have to be chosen: a search strategy, and a scoring function that measures how well a DAG fits the data. The search space comes in three flavors: the space of DAGs, the space of equivalence classes, and the space of node orderings. Whichever space is used, learning divides into two steps, parent-set identification and structure optimization; so far, most studies have focused on structure optimization.
The score-based approach treats structure learning as an optimization problem, so in principle any optimization algorithm can be applied, whatever its learning quality and efficiency; when a single algorithm performs poorly on its own, two algorithms can be combined, as in the hybrid methods below.
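As a concrete illustration, here is a minimal sketch of a score-based search: greedy hill climbing over edge additions with a BIC-style decomposable score, for discrete data. The function names (`local_bic`, `hill_climb`, `creates_cycle`) are illustrative, not from any particular library, and a full hill climber would also try edge deletions and reversals.

```python
import itertools
import math

import numpy as np

def local_bic(data, child, parents):
    """BIC contribution of one node given a candidate parent set.

    Decomposability of the score is what makes local search cheap:
    changing one node's parents only changes this one term.
    """
    n = data.shape[0]
    counts = {}
    for row in data:
        cfg = tuple(int(row[p]) for p in sorted(parents))
        counts.setdefault(cfg, {}).setdefault(int(row[child]), 0)
        counts[cfg][int(row[child])] += 1
    loglik = 0.0
    for child_counts in counts.values():
        total = sum(child_counts.values())
        loglik += sum(c * math.log(c / total) for c in child_counts.values())
    r = len(np.unique(data[:, child]))                    # child cardinality
    penalty = 0.5 * math.log(n) * len(counts) * (r - 1)   # complexity term
    return loglik - penalty

def creates_cycle(parents, child, new_parent):
    """Would adding the edge new_parent -> child close a directed cycle?"""
    stack, seen = [new_parent], set()
    while stack:
        node = stack.pop()
        if node == child:           # child is an ancestor of new_parent
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def hill_climb(data):
    """Greedy search over edge additions, starting from the empty graph."""
    d = data.shape[1]
    parents = {i: set() for i in range(d)}
    score = {i: local_bic(data, i, parents[i]) for i in range(d)}
    improved = True
    while improved:
        improved = False
        for child, parent in itertools.permutations(range(d), 2):
            if parent in parents[child] or creates_cycle(parents, child, parent):
                continue
            new = local_bic(data, child, parents[child] | {parent})
            if new > score[child]:
                parents[child], score[child] = parents[child] | {parent}, new
                improved = True
    return parents

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 500)
b = np.where(rng.random(500) < 0.9, a, 1 - a)    # b usually copies a
print(hill_climb(np.column_stack([a, b])))       # expect an edge between 0 and 1
```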
Structure Learning Based on Hybrid Search:
Hybrid-search structure learning combines the advantages of the two approaches above: it first exploits the high efficiency of statistical (constraint-based) analysis to shrink the space of candidate network structures, then searches the reduced space with a scoring function, and finally returns an optimal network structure.
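Here is a minimal sketch of the restriction phase of such a hybrid learner, assuming discrete data: a cheap pairwise mutual-information screen (using scikit-learn's `mutual_info_score`) keeps only plausible parents for each node, and a score-based search such as the hill climber above is then confined to those candidates. The 0.01 threshold is an arbitrary illustrative value, not a recommendation.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def candidate_parents(data, threshold=0.01):
    """Prune the search space: keep j as a candidate parent of i only if
    the pairwise mutual information I(Xi; Xj) clears the threshold."""
    d = data.shape[1]
    cand = {i: set() for i in range(d)}
    for i in range(d):
        for j in range(d):
            if i != j and mutual_info_score(data[:, i], data[:, j]) > threshold:
                cand[i].add(j)
    return cand
```

In the search phase, the hill-climbing loop would then simply skip any (child, parent) pair with parent not in cand[child], which is where the efficiency gain comes from.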
Exact algorithms
Approximate structure learning algorithms can only find a locally optimal solution, or a network structure close to the optimum, and cannot guarantee the globally optimal one. Because the score decomposes under a fixed ordering of the n variables, all DAGs consistent with that ordering can be optimized efficiently, and exact structure learning methods grew out of this observation. Exact algorithms fall into three families: integer linear programming (ILP), A* search, and dynamic programming (DP).
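As an illustration of the dynamic-programming family, here is a minimal sketch of the subset recurrence. It assumes a precomputed table `best_parent_score[i][S]` holding the best local score of node i with parents drawn from the variable subset S (subsets encoded as bitmasks); computing that table is the parent-set identification step. The whole approach is exponential in the number of variables, so it is only feasible for small networks.

```python
from functools import lru_cache

def exact_dp_score(n, best_parent_score):
    """Optimal total score over all DAGs on n variables.

    Relies on the fact that every DAG has a topological ordering: some
    node is last, and its parents must come from the remaining subset.
    """
    @lru_cache(maxsize=None)
    def best(subset):
        if subset == 0:
            return 0.0
        return max(
            best(subset & ~(1 << i)) + best_parent_score[i][subset & ~(1 << i)]
            for i in range(n)
            if subset & (1 << i)
        )
    return best((1 << n) - 1)
```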
BN Structure Learning under Incomplete Data:
The BN structure learning methods above all assume a complete data set. Incomplete data causes two problems: 1. the scoring function no longer has a decomposable form, so local search cannot be applied; 2. some sufficient statistics do not exist, so network structures cannot be scored directly.
For incomplete data sets, the most classical BN structure learning algorithm is the Structural Expectation Maximization algorithm (SEM). The method iterates two steps until convergence: 1. E-step: compute the statistics of the current data set, completing the missing values with their expected values under the current Bayesian network; 2. M-step: run a complete-data BN structure learning algorithm on the completed data set.
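A minimal sketch of that loop follows, with the E-step and M-step passed in as caller-supplied functions since their details depend on the model: `e_step(data, network)` is assumed to return a completed data set, and `m_step(completed)` to run any complete-data structure learner, such as the hill climber above.

```python
def structural_em(data, network, e_step, m_step, max_iter=20):
    """Sketch of the SEM iteration: complete the data, then relearn."""
    for _ in range(max_iter):
        completed = e_step(data, network)   # E-step: fill in missing values
        new_network = m_step(completed)     # M-step: complete-data learning
        if new_network == network:          # structure stopped changing
            break
        network = new_network
    return network
```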
Bayesian parameter learning
Maximum likelihood estimation
Maximum likelihood estimation is a statistical method built on the maximum likelihood principle, and an application of probability theory to statistics. It provides a way to estimate the parameters of a model from observed data, in the spirit of “the model is determined, the parameters are unknown”: from the results of a number of experiments, maximum likelihood estimation picks the parameter value that maximizes the probability of the observed sample.
In other words, the goal is to start from a given sample outcome and work backwards to the parameter value most likely to have produced it.
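For the conditional probability tables of a discrete Bayesian network this reduces to counting: the maximum likelihood estimate of each CPT entry is just the observed relative frequency. A minimal sketch (the function name `mle_cpt` is illustrative):

```python
from collections import Counter, defaultdict

import numpy as np

def mle_cpt(data, child, parents):
    """P(child | parents) estimated by relative frequency (the ML estimate)."""
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(int(row[p]) for p in parents)][int(row[child])] += 1
    return {cfg: {x: c / sum(cc.values()) for x, c in cc.items()}
            for cfg, cc in counts.items()}

data = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
print(mle_cpt(data, child=1, parents=[0]))
# {(0,): {0: 0.5, 1: 0.5}, (1,): {1: 1.0}}
```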
Features:
1. Simpler than other estimation methods;
2. Consistency: the estimate is unbiased or asymptotically unbiased, and converges to the true value as the sample size grows;
3. If the assumed probability model is correct, it usually gives good results; if the assumed model is wrong, however, the estimate can be very poor.
Bayesian estimation
Bayesian estimation starts from prior knowledge about the parameters together with the samples. Unlike maximum likelihood estimation, it no longer treats the parameter θ as a fixed but unknown quantity; instead θ is regarded as an unknown random variable. By observing the samples Di of the i-th class, the distribution P(Di | θ) is converted into the posterior probability P(θ | Di), from which the Bayesian estimate is obtained.
The essence of Bayesian estimation is that the optimal estimate of the parameter θ obtained through Bayesian decision theory minimizes the total expected risk.
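A minimal sketch of the Bayesian counterpart of the counting estimator above, assuming a symmetric Dirichlet(α, …, α) prior over each CPT row: the posterior mean smooths the counts, so configurations never seen in the data no longer get probability zero (α = 1 gives Laplace smoothing). Under squared-error loss, this posterior mean is exactly the risk-minimizing estimate just mentioned.

```python
from collections import Counter, defaultdict

def bayes_cpt(data, child, parents, n_states, alpha=1.0):
    """Posterior-mean estimate of P(child | parents) under a symmetric
    Dirichlet prior: observed counts are smoothed by the pseudo-count alpha."""
    counts = defaultdict(Counter)
    for row in data:
        counts[tuple(int(row[p]) for p in parents)][int(row[child])] += 1
    return {cfg: {x: (cc[x] + alpha) / (sum(cc.values()) + alpha * n_states)
                  for x in range(n_states)}
            for cfg, cc in counts.items()}
```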
Remarks
This post introduced the structure learning and parameter learning of Bayesian networks; the content may be somewhat dense and redundant in places. Criticism and corrections from readers are welcome.
Previous posts
- Bayes review (1) – Juejin (juejin.cn)
- Bayes review (2) – Juejin (juejin.cn)