Source: Man-machine and Cognition Laboratory
Abstract
With the rapid development of big data, machine learning based on probability and statistics has attracted great attention from industry and academia in recent years and has achieved many important successful applications in fields such as vision, speech, natural language, and biology. Bayesian methods in particular have developed rapidly over the past 20 years and have become a very important class of machine learning methods. This paper surveys the latest progress of Bayesian methods in machine learning, including the basic theory and methods of Bayesian machine learning, nonparametric Bayesian methods and common inference methods, and regularized Bayesian methods. Finally, the problem of large-scale Bayesian learning is briefly introduced and its development trend is summarized and discussed.
Keywords
Bayesian machine learning; nonparametric methods; regularization methods; big data learning; big data Bayesian learning
Machine learning is a central research focus in artificial intelligence and pattern recognition, and its theories and methods have been widely used to solve complex problems in engineering applications and scientific fields. The 2010 Turing Award was given to Leslie Valiant of Harvard University for his work in developing the theory of probably approximately correct (PAC) learning; the 2011 Turing Award was given to Professor Judea Pearl of the University of California, Los Angeles, for his contributions to establishing artificial intelligence methods based on probability and statistics. These research achievements have contributed to the development and prosperity of machine learning.
An important branch of machine learning is Bayesian machine learning, which originated from a special case of Bayes' theorem proved by the British mathematician Thomas Bayes in 1763 [1-2]. Through the joint efforts of many statisticians, Bayesian statistics was gradually established after the 1950s and became an important part of statistics [2-3]. Bayes' theorem is known for its distinctive interpretation of probability as a subjective degree of belief [4]. Since then, Bayesian statistics has been widely and deeply applied in many areas of statistical machine learning, such as posterior inference, parameter estimation, model checking, and latent variable probability models [5-6]. In the more than 250 years since 1763, Bayesian statistical methods have made great progress [7]. In the 21st century, as knowledge from different fields is integrated, Bayesian machine learning will have broader application scenarios and play an even greater role.
1. Bayesian learning basics
This section gives a brief introduction to Bayesian statistical methods [5], mainly covering Bayes' theorem, inference methods for Bayesian models, and some classical concepts of Bayesian statistics.
1.1 Bayes’ theorem
Let θ denote the parameters of a probability model and D denote the given data set. Given the prior distribution p₀(θ) and the likelihood function p(D|θ) of the model, the posterior distribution can be obtained by Bayes' theorem (also known as Bayes' formula) [2]:
$$ p(\theta \mid D) = \frac{p_0(\theta)\, p(D \mid \theta)}{p(D)} \qquad (1) $$
where p(D) = ∫ p₀(θ) p(D|θ) dθ is the marginal likelihood (evidence) of the model.
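As a concrete illustration of Equation (1), the following minimal sketch computes the posterior of a coin's head probability numerically on a grid for a Beta prior and Bernoulli likelihood, and checks it against the known conjugate result; the prior parameters and the data are made-up values for illustration.

```python
import numpy as np
from scipy import stats

# Minimal illustration of Eq. (1): Beta prior + Bernoulli likelihood.
# The prior parameters and the data below are made-up values for illustration.
a0, b0 = 2.0, 2.0                       # Beta prior p0(theta)
data = np.array([1, 0, 1, 1, 0, 1, 1])  # observed coin flips D

theta = np.linspace(1e-3, 1 - 1e-3, 1000)
prior = stats.beta.pdf(theta, a0, b0)
likelihood = theta ** data.sum() * (1 - theta) ** (len(data) - data.sum())

# Posterior via Bayes' theorem: prior * likelihood, normalized by the evidence p(D),
# which is approximated here by a simple Riemann sum over the grid.
unnormalized = prior * likelihood
dtheta = theta[1] - theta[0]
evidence = np.sum(unnormalized) * dtheta       # marginal likelihood p(D)
posterior = unnormalized / evidence

# For this conjugate pair the posterior is Beta(a0 + #heads, b0 + #tails),
# which matches the numerical result above up to grid error.
exact = stats.beta.pdf(theta, a0 + data.sum(), b0 + len(data) - data.sum())
print(np.max(np.abs(posterior - exact)))
```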
Bayes' theorem is well known. Here is an equivalent but less well-known formulation, namely a variational formulation based on optimization:
$$ \min_{q(\theta) \in \mathcal{P}} \; \mathrm{KL}\big(q(\theta) \,\|\, p_0(\theta)\big) - \mathbb{E}_{q(\theta)}\big[\log p(D \mid \theta)\big] \qquad (2) $$
where 𝒫 is the space of normalized probability distributions. It can be proved that the optimal solution of the variational problem in Equation (2) is exactly the posterior distribution in Equation (1) [8]. This variational formulation of Bayes' theorem is significant in two respects: 1) it provides a theoretical basis for variational Bayesian methods [9]; 2) it provides a good framework for introducing posterior constraints and enriching the flexibility of Bayesian models [10]. Both points are elaborated in later sections.
1.2 Bayesian machine learning
Bayesian methods have many applications in machine learning, from univariate classification and regression to multivariate structured output prediction, and from supervised learning to unsupervised and semi-supervised learning; Bayesian methods can be used in almost any learning task. Below is a brief introduction to the more basic common tasks.
1) Prediction. Given training data D, the prediction for future data x can be obtained through the Bayesian method [5]:
$$ p(x \mid D) = \int p(x \mid \theta, D)\, p(\theta \mid D)\, d\theta \qquad (3) $$
It should be pointed out that when the model is given, the data are assumed to be drawn independently and identically distributed, so p(x | θ, D) is usually simplified to p(x | θ).
2) Model selection. Another very important application of the Bayesian method is model selection [11], which is a fundamental problem in statistics and machine learning. Let M denote a family of models (such as linear models), where each element θ is a specific model. Bayesian model selection chooses the optimal family by comparing the marginal likelihoods of different model families:
$$ p(D \mid \mathcal{M}) = \int p(D \mid \theta)\, p(\theta \mid \mathcal{M})\, d\theta \qquad (4) $$
When there is no strong prior knowledge, p(θ | M) is usually taken to be uniform. Through the integration in Equation (4), Bayesian model selection can avoid overfitting.
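As a small illustration of Equation (4), the following sketch compares the marginal likelihoods of two hypothetical model families for coin-flip data: a fixed fair-coin model and a model with a uniform prior over the coin's bias. The data and model choices are made up for illustration.

```python
import numpy as np
from scipy import stats, integrate

# Sketch of Bayesian model selection (Eq. (4)) for coin-flip data.
# Model M1: theta fixed at 0.5 (a "fair coin" family with a point prior).
# Model M2: theta ~ Uniform(0, 1), i.e., a Beta(1, 1) prior. Data are illustrative.
data = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])
heads, n = data.sum(), len(data)

def likelihood(theta):
    return theta ** heads * (1 - theta) ** (n - heads)

# Marginal likelihood p(D | M1): the prior puts all mass on theta = 0.5.
evidence_m1 = likelihood(0.5)

# Marginal likelihood p(D | M2): integrate the likelihood against the uniform prior.
evidence_m2, _ = integrate.quad(lambda t: likelihood(t) * stats.uniform.pdf(t), 0, 1)

print("p(D|M1) =", evidence_m1)
print("p(D|M2) =", evidence_m2)
print("Bayes factor M2/M1 =", evidence_m2 / evidence_m1)
```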
Some papers and textbooks treat Bayesian statistics and Bayesian learning in more detail.
2. Nonparametric Bayesian methods
In classical parametric models, the number of model parameters is fixed and does not change with the data. Taking unsupervised clustering as an example, if the number of cluster centers can be learned automatically from the data themselves, this is much better than setting it by experience, as parametric models (such as K-means and Gaussian mixture models) require. This is an important advantage of nonparametric models. Compared with parametric Bayesian methods, nonparametric Bayesian methods have a stronger ability to describe data owing to the nonparametric nature of their prior distributions [13]. For this reason, nonparametric Bayesian methods have received increasing attention since 2000 [14]; examples include latent mixture models with an unknown number of components [15], latent feature models with unknown dimensionality [16], and Gaussian processes for describing continuous functions [17]. It is important to emphasize that "nonparametric" does not mean the model has no parameters; rather, the model can have infinitely many parameters, and the number of parameters can adapt as the data change. This property is particularly important for solving complex problems in big data environments, because one characteristic of big data is that it is dynamic and changing. Some of the more important models and inference methods are briefly introduced below.
2.1 Dirichlet process
The Dirichlet process (DP), proposed by the statistician Ferguson in 1973, is a stochastic process whose samples are probability measures [18]. Its parameters are a concentration parameter α > 0 and a base probability distribution G₀, and it is usually written as G ~ DP(α, G₀). A draw from a Dirichlet process is a discrete distribution, which makes it very suitable for building mixture models. For example, Antoniak constructed the Dirichlet process mixture (DPM) model in 1974 by adding a generating distribution for each data point [15], i.e.
$$ G \sim \mathrm{DP}(\alpha, G_0), \qquad \theta_n \mid G \sim G, \qquad x_n \mid \theta_n \sim p(x \mid \theta_n), \quad n = 1, \dots, N \qquad (5) $$
where θ_n is the parameter of the distribution that generates data point x_n (for example, the mean and covariance of a Gaussian distribution), and N is the number of data points.
A stochastic process equivalent to the Dirichlet process is the Chinese restaurant process (CRP) [19]. The Chinese restaurant process is a stochastic process with a clustering property and is often used because of its intuitive representation. As shown in Figure 1, imagine a Chinese restaurant with infinitely many tables and a sequence of guests. The first customer sits at the first table; each subsequent customer chooses a table according to a multinomial distribution in which the probability of choosing an occupied table is proportional to the number of people already sitting there, while an empty table is chosen with probability proportional to the parameter α. When all guests have chosen tables, the guests are partitioned by table: each table represents a cluster and each guest represents a data point.
It can be proved that the cluster parameters θ of all data points can be generated by Equation (6):
$$ \theta_n \mid \theta_{1:n-1} \sim \sum_{i=1}^{n-1} \frac{1}{n-1+\alpha}\, \delta_{\theta_i} + \frac{\alpha}{n-1+\alpha}\, G_0 \qquad (6) $$
The Chinese restaurant process can be obtained by integrating out G in the Dirichlet process mixture model, which also shows the relationship between the two stochastic processes. This concise representation is also convenient for sampling with Markov chain Monte Carlo methods [20].
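The following is a minimal simulation sketch of the Chinese restaurant process described above; the number of customers and the concentration parameter are illustrative values.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha, rng=None):
    """Simulate table assignments for the Chinese restaurant process.

    Customer i joins an existing table with probability proportional to the
    number of customers already seated there, and a new table with
    probability proportional to alpha.
    """
    rng = np.random.default_rng(rng)
    assignments = [0]          # the first customer sits at the first table
    table_counts = [1]
    for _ in range(1, n_customers):
        probs = np.array(table_counts + [alpha], dtype=float)
        probs /= probs.sum()
        table = rng.choice(len(probs), p=probs)
        if table == len(table_counts):   # a new, previously empty table
            table_counts.append(1)
        else:
            table_counts[table] += 1
        assignments.append(table)
    return assignments, table_counts

tables, counts = chinese_restaurant_process(100, alpha=2.0, rng=0)
print("number of clusters:", len(counts), "cluster sizes:", counts)
```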
Another constructive representation of the Dirichlet process is the stick-breaking construction [21]. Specifically, take a stick of unit length and, at the k-th break, cut off a proportion of the remaining length given by a Beta-distributed random variable:
$$ v_k \sim \mathrm{Beta}(1, \alpha), \qquad \pi_k = v_k \prod_{j=1}^{k-1} (1 - v_j), \qquad G = \sum_{k=1}^{\infty} \pi_k\, \delta_{\theta_k}, \quad \theta_k \sim G_0 \qquad (7) $$
As shown in Figure 2, for a stick of length 1, the first break removes a proportion v₁ of the stick; each subsequent break removes a proportion v_k of the remaining length. The stick-breaking representation of the Dirichlet process is the basis of variational inference for DP models [22].
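Below is a minimal sketch of the stick-breaking construction in Equation (7), truncated to a finite number of sticks as variational methods typically do; the concentration parameter, base distribution, and truncation level are illustrative choices.

```python
import numpy as np

def stick_breaking_dp(alpha, base_sampler, truncation=50, rng=None):
    """Truncated stick-breaking construction of G ~ DP(alpha, G0).

    Returns atom locations theta_k (drawn from the base distribution G0)
    and weights pi_k built by repeatedly breaking the remaining stick.
    """
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=truncation)            # v_k ~ Beta(1, alpha)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    weights = v * remaining                              # pi_k = v_k * prod_{j<k}(1 - v_j)
    atoms = base_sampler(truncation, rng)                # theta_k ~ G0
    return atoms, weights

# Example with a standard normal base distribution G0 (an illustrative choice).
atoms, weights = stick_breaking_dp(
    alpha=3.0,
    base_sampler=lambda k, rng: rng.normal(0.0, 1.0, size=k),
    truncation=100,
    rng=0,
)
print("total weight captured by the truncation:", weights.sum())
```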
2.2 Indian Buffet process
Unlike mixture models, where each data point belongs to only one cluster, in feature models each data point can possess multiple features, which together constitute the process of generating the data. This matches the practical situation in which sample data points have multiple attributes. Classic feature models include factor analysis and principal component analysis [24-25]. In traditional feature models the number of features is fixed, which limits the performance of the model. The Indian Buffet Process (IBP), proposed in 2005 [26], is nonparametric, so the number of features can be learned from the data and the model can explain the data better. It has been applied to factor analysis, social network link prediction, and other important problems [27-29].
Taking binary ("0" or "1") features as an example, assume there are N data points whose feature vectors form a feature matrix. The generative process of the IBP can be vividly described as N customers choosing dishes in a buffet with an infinite number of dishes, where "1" indicates a dish is chosen and "0" indicates it is not. The process is described as follows (see Figure 3; a simulation sketch follows the list below):
1) The first customer chooses the first K₁ dishes, where K₁ ~ Poisson(α);
2) Each subsequent customer i (i ≥ 2) does two things: for each dish that has already been chosen, the customer chooses it with probability proportional to the number of previous customers who chose it (i.e., with probability m_k / i, where m_k is that count); the customer then chooses K_i ~ Poisson(α/i) previously unchosen dishes.
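The following sketch simulates the customer process just described to draw a binary feature matrix Z; the number of customers and the parameter α are illustrative values.

```python
import numpy as np

def indian_buffet_process(n_customers, alpha, rng=None):
    """Simulate a binary feature matrix Z from the Indian buffet process.

    The first customer tries Poisson(alpha) dishes; customer i tries each
    previously sampled dish k with probability m_k / i (m_k = number of
    earlier customers who chose it) and then Poisson(alpha / i) new dishes.
    """
    rng = np.random.default_rng(rng)
    rows = []
    dish_counts = []                           # m_k for each dish seen so far
    for i in range(1, n_customers + 1):
        row = [rng.random() < m / i for m in dish_counts]
        new_dishes = rng.poisson(alpha / i)
        row.extend([True] * new_dishes)
        dish_counts = [m + r for m, r in zip(dish_counts, row)] + [1] * new_dishes
        rows.append(row)
    n_features = len(dish_counts)
    Z = np.zeros((n_customers, n_features), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = indian_buffet_process(10, alpha=2.0, rng=0)
print(Z)
```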
Similar to the Chinese restaurant process, the Indian buffet process also has a corresponding stick-breaking construction [30]. We do not repeat the derivation here and only list the construction:
$$ v_k \sim \mathrm{Beta}(\alpha, 1), \qquad \pi_k = \prod_{j=1}^{k} v_j, \qquad z_{nk} \sim \mathrm{Bernoulli}(\pi_k) \qquad (8) $$
However, unlike the Chinese restaurant process, the stick lengths here do not sum to 1. The Indian buffet process also has corresponding sampling methods and variational inference methods [16, 30-31].
2.3 Applications and extensions
Bayesian methods, especially the recently popular nonparametric Bayesian methods, have been widely applied in various fields of machine learning with good results [32]. Several applications and extensions are briefly described here. Applications of large-scale Bayesian learning are introduced in Section 5, and further details can be found in the literature [13-14, 33].
Classical nonparametric Bayesian methods usually assume that the data have simple properties such as exchangeability or conditional independence. However, real-world data often exhibit different structures and dependencies. To meet different needs, stochastic processes with various kinds of dependence have attracted wide attention. For example, in topic mining of text data, the data often come from different domains or types, and we usually hope that the learned topics have some hierarchical structure; for this reason, the hierarchical Dirichlet process (HDP) [34] was proposed, which can automatically learn multi-level topic representations and determine the number of topics. In addition, a multi-layer IBP has been proposed [35] for learning the structure of deep belief networks, including the number of layers, the number of neurons in each layer, and the connection structure between layers. Other examples include the infinite hidden Markov model with Markovian dynamics [36] and the Dirichlet process with spatial dependence [37].
In addition, nonparametric Bayesian models for supervised learning problems have recently received extensive attention. For example, modeling and prediction of social network data is an important problem: recently proposed IBP-based nonparametric Bayesian models [27, 29] can automatically learn hidden features and determine their number, achieving good predictive performance. Good results have also been obtained by using DP mixture models to perform clustering and classification tasks simultaneously [38].
3. Inference methods for Bayesian models
Inference methods are an important part of Bayesian learning, and their quality directly affects the performance of the model. Specifically, a key difficulty of Bayesian models is that the posterior distribution is usually intractable, which makes the Bayesian integrals in Equations (3) and (4) intractable as well. Effective inference methods are therefore needed. Generally speaking, there are two types: variational inference and Monte Carlo methods. Both are widely used in Bayesian learning and are introduced below.
3.1 Variational inference
The variational method is a widely used approximation and optimization technique [39-40] that has solved many problems in physics, statistics, financial analysis, and control science. In machine learning, variational methods are also widely used: through variational analysis, non-optimization problems can be transformed into optimization problems, and some intractable problems can be solved by variational approximation [41].
In variational Bayesian methods, given a data set D and the posterior distribution to be solved for, the variational method introduces a distribution q(θ) that approximates the posterior. Using Jensen's inequality, the evidence lower bound (ELBO) of the log-likelihood can be obtained:
$$ \log p(D) \ge \mathbb{E}_{q(\theta)}\big[\log p(\theta, D)\big] - \mathbb{E}_{q(\theta)}\big[\log q(\theta)\big] \triangleq \mathcal{L}(q) \qquad (9) $$
The optimization can be completed by maximizing this lower bound on the log-likelihood:
$$ q^{*}(\theta) = \arg\max_{q(\theta) \in \mathcal{P}} \; \mathcal{L}(q) \qquad (10) $$
or, equivalently, by minimizing the KL divergence KL(q(θ) ‖ p(θ|D)) between q(θ) and the true posterior. Thus the basic idea of variational inference is to transform the original problem into an optimization problem over an approximating distribution and to complete Bayesian inference with an effective optimization algorithm [22, 42-43].
In many cases a model has both parameters θ and latent variables H. In such cases the variational EM algorithm can be used: by introducing the mean-field assumption q(θ, H) = q(θ) q(H), the updates for q(θ) and q(H) can be performed iteratively as in the EM algorithm [44].
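As a concrete illustration of mean-field variational inference, the following sketch applies coordinate-ascent updates to the classic conjugate example of a Gaussian with unknown mean and precision under the factorization q(μ)q(τ). The priors, data, and number of iterations are illustrative; the update formulas follow the standard mean-field derivation for this Normal-Gamma model.

```python
import numpy as np

# Mean-field variational inference sketch for x_i ~ N(mu, 1/tau) with
# conjugate priors mu | tau ~ N(mu0, 1/(lambda0*tau)) and tau ~ Gamma(a0, b0),
# using the factorization q(mu, tau) = q(mu) q(tau). All values are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)
N, xbar = len(x), x.mean()

mu0, lambda0, a0, b0 = 0.0, 1.0, 1.0, 1.0
E_tau = a0 / b0                                   # initial guess for E_q[tau]

for _ in range(50):                               # coordinate ascent on the ELBO
    # Update q(mu) = N(mu_N, 1/lambda_N)
    mu_N = (lambda0 * mu0 + N * xbar) / (lambda0 + N)
    lambda_N = (lambda0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N), using expectations under q(mu)
    a_N = a0 + (N + 1) / 2.0
    E_sq = np.sum((x - mu_N) ** 2) + N / lambda_N
    b_N = b0 + 0.5 * (E_sq + lambda0 * ((mu_N - mu0) ** 2 + 1.0 / lambda_N))
    E_tau = a_N / b_N

print("E_q[mu] =", mu_N, " E_q[tau] =", E_tau, " (true precision:", 1 / 1.5 ** 2, ")")
```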
3.2 Monte Carlo method
Monte Carlo methods estimate quantities of an unknown probability distribution using simulated random numbers. When the distribution is difficult to handle directly, or the search space is too large and the computation too complex, Monte Carlo methods become an important tool for inference and computation [45-46]. For example, Bayesian machine learning usually involves computing the expectation of a function under some distribution (prior or posterior), and such computations usually have no analytic solution. Suppose p(x) is a probability distribution and the goal is to compute the integral:
$$ I = \mathbb{E}_{p}\big[f(x)\big] = \int f(x)\, p(x)\, dx \qquad (11) $$
The basic idea of the Monte Carlo method is to approximate I using the following estimate:
$$ \hat{I} = \frac{1}{N} \sum_{i=1}^{N} f\big(x^{(i)}\big), \qquad x^{(i)} \sim p(x) \qquad (12) $$
where x^(1), …, x^(N) are samples drawn from p. By the law of large numbers, when the number of samples is large enough, the Monte Carlo estimate approximates the true expectation well.
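A minimal sketch of the Monte Carlo estimate in Equations (11)-(12), using a case with a known answer so that convergence with the number of samples is visible; the target distribution and the function f are illustrative choices.

```python
import numpy as np

# Monte Carlo estimate of I = E_p[f(x)] (Eqs. (11)-(12)) for a case with a
# known answer: p = N(0, 1) and f(x) = x^2, so the true value is 1.
rng = np.random.default_rng(0)

def f(x):
    return x ** 2

for n_samples in (10, 1_000, 100_000):
    samples = rng.standard_normal(n_samples)      # x^(i) ~ p(x)
    estimate = f(samples).mean()                  # (1/N) sum_i f(x^(i))
    print(f"N = {n_samples:>7d}: estimate = {estimate:.4f}")
```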
The above describes the basic principle of Monte Carlo methods, but sampling directly from p is often not easy in practice, so other techniques must be adopted. Commonly used methods include importance sampling, rejection sampling, and Markov chain Monte Carlo (MCMC). The first two are effective when the distribution is relatively simple, but for complex distributions in high-dimensional spaces they often work poorly and suffer from the curse of dimensionality. The following focuses on MCMC methods, which remain effective in high-dimensional spaces.
The basic idea of MCMC is to construct a Markov chain that converges to a specified probability distribution, thereby achieving the goal of inference [47]. A commonly used MCMC method is the Metropolis-Hastings (MH) algorithm [48]. The MH algorithm constructs a transition rule from state x to state x′ as follows (a code sketch follows the steps below):
1) Draw a new state sample x′ from the proposal distribution q(x′ | x) given the old state sample x;
2) Compute the acceptance probability:
$$ A(x', x) = \min\left(1,\; \frac{p(x')\, q(x \mid x')}{p(x)\, q(x' \mid x)}\right) \qquad (13) $$
3) Draw u from the uniform distribution on [0, 1]; if u ≤ A(x′, x), accept the new sample x′, otherwise reject it and keep x.
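Below is a minimal random-walk Metropolis-Hastings sketch following the three steps above. The target density, proposal width, and chain length are illustrative; with a symmetric Gaussian proposal, the acceptance ratio in Equation (13) simplifies as noted in the comments.

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_samples, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings targeting a density p known up to
    normalization (log_p returns log p up to an additive constant).

    With a symmetric Gaussian proposal q(x'|x), the acceptance probability
    in Eq. (13) reduces to min(1, p(x')/p(x)).
    """
    rng = np.random.default_rng(rng)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_new = x + step * rng.standard_normal()        # 1) propose from q(x'|x)
        log_accept = log_p(x_new) - log_p(x)            # 2) acceptance probability
        if np.log(rng.random()) < log_accept:           # 3) accept or reject
            x = x_new
        samples.append(x)
    return np.array(samples)

# Example target: a standard normal, defined only up to its normalizer.
samples = metropolis_hastings(lambda x: -0.5 * x ** 2, x0=5.0, n_samples=20_000, rng=0)
print("mean ~", samples[5_000:].mean(), " std ~", samples[5_000:].std())
```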
Another commonly used MCMC method is Gibbs sampling [46, 49], a special case of the MH algorithm that has been widely used for inference in Bayesian analysis. Gibbs sampling samples each variable of a multivariate distribution conditioned on the current values of the observed and other sampled variables, updating the variables in turn until the chain converges to the target posterior distribution. Suppose the multivariate distribution to be sampled is p(x₁, …, x_d); at each step a dimension j (1 ≤ j ≤ d) is selected, where d is the dimension of the distribution, and x_j is then drawn from the conditional distribution p(x_j | x_{−j}).
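As a concrete illustration of Gibbs sampling, the following sketch alternately samples the two coordinates of a correlated bivariate Gaussian from their conditional distributions; the correlation and chain length are illustrative values.

```python
import numpy as np

# Gibbs sampling sketch for a bivariate standard Gaussian with correlation rho:
# each step samples one coordinate from its conditional given the other,
# x1 | x2 ~ N(rho * x2, 1 - rho^2) and symmetrically for x2 | x1.
rho = 0.8
rng = np.random.default_rng(0)

x1, x2 = 0.0, 0.0
samples = []
for _ in range(20_000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # sample x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # sample x2 | x1
    samples.append((x1, x2))

samples = np.array(samples)[5_000:]                    # drop burn-in
print("empirical correlation:", np.corrcoef(samples.T)[0, 1])   # close to rho
```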
Many Bayesian models use MCMC methods for inference and have achieved good results [20, 30, 50]. In addition, there is a class of non-random-walk MCMC methods, such as Langevin MCMC [51] and hybrid (Hamiltonian) Monte Carlo [52]. These methods usually converge faster, but their formulation is more complex, so they are not as popular as Gibbs sampling. However, the recently developed sampling methods based on stochastic gradients are very effective in the big data setting and will be briefly introduced later.
4. Regularized Bayesian theory and application examples
Section 1.1 presented two equivalent formulations of the Bayesian method: one is posterior inference, the other is an optimization formulation based on variational analysis; the second has developed greatly in recent years. Based on this equivalence, we proposed the theory of regularized Bayesian inference (RegBayes) in recent years [10]. As shown in Figure 4, in classical Bayesian inference the posterior distribution can be influenced along only two dimensions, namely the prior distribution and the likelihood function. In regularized Bayesian inference, posterior inference is transformed into a variational optimization problem, and introducing posterior regularization provides a third degree of freedom, which greatly enriches the flexibility of Bayesian models. Under the guidance of RegBayes theory, we have systematically studied discriminative Bayesian learning based on the maximum margin criterion and Bayesian learning that incorporates domain knowledge, and obtained a series of results.
The basic framework of regularized Bayesian inference can be summarized as follows. On the basis of Equation (2), a posterior regularization term is introduced to encode domain knowledge or desired model properties:
$$ \min_{q(\theta) \in \mathcal{P}} \; \mathrm{KL}\big(q(\theta) \,\|\, p_0(\theta)\big) - \mathbb{E}_{q(\theta)}\big[\log p(D \mid \theta)\big] + \Omega\big(q(\theta)\big) \qquad (14) $$
where Ω(q(θ)) is a convex function. Three questions must be answered when using RegBayes to solve a specific problem.
Question 1. Where does the posterior regularization come from? Posterior regularization is a general concept that can encompass any information expected to influence the posterior distribution. For example, in supervised learning tasks (such as image or text classification) where we expect the posterior distribution to give accurate predictions, we can take the classification error rate (or an upper bound on it) as part of the optimization objective and introduce it into learning through posterior regularization. Typical examples include the infinite support vector machine [38], the infinite latent support vector machine [56], and the maximum margin supervised topic model MedLDA [57]; all of these methods use the maximum margin principle to directly minimize an upper bound on the classification error rate (namely the hinge loss) during Bayesian learning, and achieve significant performance improvements on test data.
In addition, in some learning tasks, domain knowledge (such as expert knowledge or public knowledge gathered through crowdsourcing) can provide information beyond the data, which is of great help in improving model performance. In this case, domain knowledge can be added to the model together with the data as posterior constraints to achieve effective Bayesian learning. It should be pointed out that public knowledge often contains a lot of noise; how to adopt effective strategies to filter the noise and learn effectively is the key to the problem. In this regard, we proposed to robustly introduce domain knowledge expressed as logical formulas into Bayesian topic models, achieving better model performance [58].
Question 2. What is the relationship among the prior distribution, the likelihood function, and posterior regularization? The prior distribution is independent of the data; it is a probability distribution based on prior knowledge and cannot reflect the statistical characteristics of the data. The likelihood function describes how the data are generated, reflects the basic properties of the data, and is usually defined as a normalized probability distribution with a convenient analytic form. The posterior regularization term is also defined using the data, but in a more flexible way that is not bound by normalization; it can therefore describe desired properties or domain knowledge more accurately and conveniently, as in the maximum margin learning and domain-knowledge examples in Question 1. It can even be proved that some posterior distributions cannot be obtained by Bayes' theorem but can be obtained through posterior regularization [10]. RegBayes is therefore a more flexible and powerful framework than the classic Bayesian approach.
Question 3. How is the optimization problem solved? Although regularized Bayesian inference is very flexible, its learning algorithms can still be derived using variational methods or Monte Carlo methods; see the related papers for specific solution methods. The big data Bayesian learning theory and algorithms introduced below can also be applied to solve regularized Bayesian models quickly [55], which is a current research hotspot.
5. Big data Bayesian learning
With the development of Internet technology, research on the theory, algorithms, and applications of machine learning for big data has become a hotspot [59] and has received wide attention from academia and industry. Bayesian models have good adaptability and scalability to data and have achieved good results on many classical problems. However, a big problem of traditional Bayesian models is that their inference methods are usually slow, and in the big data setting it is difficult for them to meet the new requirements. How to perform large-scale Bayesian learning is therefore one of the important challenges for the field. Fortunately, significant progress has been made recently in big data Bayesian learning (BigBayes). The following briefly introduces the progress on stochastic algorithms and distributed algorithms, with some of our own research results as examples. Table 1 provides a brief summary of current advances on several fronts.
5.1 Stochastic gradient and online learning methods
When the amount of data is large, exact algorithms often take too long to meet practical needs. A common solution is to use stochastic approximation algorithms [60-61]. By repeatedly subsampling a large-scale data set at random, such algorithms can converge to good results in a relatively short time. This idea has been widely used in variational inference and Monte Carlo algorithms, as briefly described below.
As mentioned above, the core of variational inference is solving an optimization problem, so stochastic gradient methods based on repeated random subsampling are a natural choice. Specifically, stochastic gradient descent (SGD) [62] randomly selects a subset of the data at each step, uses the gradient computed on this subset to estimate the gradient on the entire data set, and updates the parameters being solved for:
$$ \theta_{t+1} = \theta_t + \epsilon_t\, \nabla_{\theta} \tilde{Q}(\theta_t; D_t) \qquad (15) $$
where Q is the objective function to be optimized, D_t is the t-th random subset of the data, and ∇_θQ̃(θ_t; D_t) is the gradient estimated on D_t (rescaled so that it is an unbiased estimate of the full-data gradient). It is worth noting that the gradient in Euclidean space is not the best direction for optimizing a variational distribution; for optimization over probability distributions, the natural gradient tends to achieve a faster convergence rate [63]. Recent major advances include stochastic variational Bayesian methods [61] and a variety of faster algorithms that exploit model structure [64].
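The following sketch illustrates the mini-batch gradient estimate in Equation (15) on a toy problem: optimizing the Gaussian log-likelihood of a mean parameter over a large data set. The data, batch size, and step-size schedule are illustrative choices.

```python
import numpy as np

# Stochastic gradient optimization sketch for Eq. (15): fit the mean of a
# Gaussian log-likelihood over a large data set using mini-batch gradient estimates.
rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=100_000)    # illustrative "big" data set
N, batch_size = len(data), 100

theta = 0.0
for t in range(1, 2_001):
    batch = rng.choice(data, size=batch_size, replace=False)   # t-th random subset
    # Gradient of the full-data log-likelihood in the mean, estimated from the
    # mini-batch and rescaled by N / batch_size so the estimate is unbiased.
    grad = (N / batch_size) * np.sum(batch - theta)
    eps_t = 1.0 / (N * (1 + t))                                 # decreasing step size
    theta = theta + eps_t * grad

print("estimated mean:", theta)
```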
In Monte Carlo algorithms, the stochastic gradient idea can likewise be used to improve gradient-based sampling algorithms, leading to stochastic gradient Langevin dynamics (SGLD) [65] and stochastic Hamiltonian Monte Carlo (SHM) [66]. These algorithms speed up Monte Carlo sampling and give good results.
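Below is a minimal SGLD sketch in the spirit of [65]: the usual stochastic gradient step is scaled by half the step size and Gaussian noise with variance equal to the step size is injected, so the iterates approximately sample the posterior rather than converge to a point. The model, priors, data, and step-size schedule are all illustrative.

```python
import numpy as np

# Stochastic gradient Langevin dynamics (SGLD) sketch: sample the posterior of a
# Gaussian mean (prior N(0, 10^2), unit-variance likelihood) using mini-batch
# gradients plus injected Gaussian noise. Priors and data are illustrative.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.0, scale=1.0, size=10_000)
N, batch_size = len(data), 100
prior_var = 10.0 ** 2

theta, samples = 0.0, []
for t in range(1, 5_001):
    batch = rng.choice(data, size=batch_size, replace=False)
    grad_log_prior = -theta / prior_var
    grad_log_lik = (N / batch_size) * np.sum(batch - theta)   # mini-batch estimate
    eps_t = 1e-4 * (1 + t) ** -0.55                           # decreasing step size
    noise = rng.normal(0.0, np.sqrt(eps_t))
    theta = theta + 0.5 * eps_t * (grad_log_prior + grad_log_lik) + noise
    samples.append(theta)

samples = np.array(samples)[1_000:]                           # drop burn-in
print("posterior mean ~", samples.mean(), " posterior std ~", samples.std())
```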
Case 1. To meet the processing requirements of dynamic streaming data, large-scale Bayesian inference based on online learning has also become a recent research hotspot; representative work includes streaming variational Bayesian inference [67]. Recently, we proposed the online Bayesian maximum margin learning (Online BayesPA) framework, which significantly improves the learning efficiency of regularized Bayesian models and provides theoretical bounds on the online regret [55]. As shown in Figure 5, experiments on more than 1 million Wikipedia pages show that the online learning algorithm is about 100 times faster than the batch algorithm without losing classification accuracy.
5.2 Distributed inference algorithms
Another family of algorithms suitable for large-scale Bayesian learning is based on distributed computing [68], that is, Bayesian inference algorithms deployed on distributed systems. Such algorithms need to consider the actual application scenario carefully, weigh the computation and communication costs, and be designed for the specific distributed system at hand.
Some algorithms do not need to exchange information among certain parameters and only need to aggregate the final results; for such problems, the original algorithm only needs to be properly tuned and deployed on the system to work well. However, many algorithms are not naturally parallelizable, which means the algorithm itself must be modified so that it can be computed in a distributed fashion. This is one of the highlights of current research on large-scale Bayesian learning, and many important advances have been made, including distributed variational inference and distributed Monte Carlo methods [67, 69].
Case 2. Take topic models as an example. The classical model can learn large-scale topic structure using a conjugate Dirichlet prior [70], but cannot learn the correlations between topics. For this reason, the correlated topic model (CTM), which uses a non-conjugate logistic-normal prior, was proposed [71]. The disadvantage of CTM is that inference is difficult, and existing algorithms could only handle graph structure learning for dozens of topics. The authors' research group therefore recently proposed a distributed inference algorithm for CTM [72], which can process large-scale data sets and learn the graph structure among thousands of topics. Partial results are shown in Table 2, where D denotes the size of the data set and K denotes the number of topics. As Table 2 shows, the distributed inference algorithm (gCTM) greatly increases the amount of data the model can handle (e.g., 6 million Wikipedia pages) and the number of topics (e.g., 1000). The code and more information about this project are available to interested readers [73].
On the basis of learning the above large-scale topic graph structure, the visual analytics tool TopicPanorama was further developed; it can fuse multiple topic graphs and display them in a single interface in a user-friendly way, as shown in Figure 6. Each node represents a topic, an edge between nodes represents an association, and the length of the edge represents the strength of the association. The data set used consists of news web pages related to three IT companies: Microsoft, Google, and Yahoo. The visualization tool supports a variety of interactions: users can zoom in or out on parts of the topic graph, and can also modify the graph structure and feed the changes back to the underlying algorithm for online adjustment. Several domain experts agree that the tool makes it easy to analyze social media data. See reference [74] for a more detailed description.
5.3 Hardware-based acceleration
With the development of hardware, accelerating Bayesian learning with hardware resources such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) has also become a recent hot topic. For example, some researchers use GPUs to accelerate the variational methods [75] and MCMC algorithms [76-77] of topic models, and others use FPGAs to accelerate Monte Carlo algorithms [78]. Powerful hardware, coupled with appropriate models and algorithmic architecture, can achieve twice the result with half the effort.
6. Summary and Outlook
Bayesian statistical methods and their applications in machine learning are an important research topic in Bayesian learning. Bayesian learning is widely used because of its adaptability and extensibility. Nonparametric Bayesian methods and regularized Bayesian methods have greatly advanced Bayesian theory and given it more powerful vitality.
In recent years, Bayesian learning for big data has become a focus of attention. How to increase the flexibility of Bayesian learning and how to speed up Bayesian inference so that it can better meet the challenges of the big data era are questions under active consideration. Many new methods and theories will be proposed, and Bayesian learning will be combined with knowledge from many other areas, such as parallel computing and data science, producing many new results. It can be expected that Bayesian learning will deliver newer and better results and will be more widely applied in the future.
Zhu Jun, born in 1983. Associate professor and PhD supervisor at Tsinghua University. His current research interests include machine learning, Bayesian statistics, and large-scale learning algorithms and applications.
Hu Wenbo, born in 1992. PhD candidate at Tsinghua University. His current research interests include machine learning and scalable Bayesian learning methods ([email protected]).