
Logistic regression, also known as logistic regression analysis, is a generalized linear model often used in data mining, economic forecasting, and other fields. For linear regression, the output $w^Tx_i$ ranges over all real numbers, but for a classification problem we want the output to be the probability of a certain category.

We introduced maximum likelihood estimation (MLE) and maximum a posteriori estimation (MAP) before; MLE comes from the frequentist school and MAP from the Bayesian school, and we know that MAP adds a prior on top of MLE. So first, let's think about how to map a real-valued output into a probability space of values between 0 and 1, which brings us to the sigmoid function:


$$\sigma(z) = \frac{1}{1+e^{-z}}$$

Let's look at the graph of the function: it is an S-shaped curve that passes through $\sigma(0) = 0.5$, approaching 0 as $z \to -\infty$ and 1 as $z \to +\infty$.
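If you want to reproduce the curve yourself, here is a minimal sketch, assuming `numpy` and `matplotlib` are installed:

```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-10, 10, 200)
plt.plot(z, sigmoid(z))
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.grid(True)
plt.show()
```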

This function takes any real number as input and produces an output between 0 and 1, which we can interpret as a probability. Consider a binary classification problem in terms of conditional probabilities: the two categories are $y=1$ and $y=0$, so $y$ follows a Bernoulli distribution, a 0/1 problem.


$$P_1(y=1|x) = \sigma(w^Tx) = \frac{1}{1 + e^{-w^Tx}}$$

$$P_0(y=0|x) = 1 - p_1 = 1 - \sigma(w^Tx) = \frac{e^{-w^Tx}}{1 + e^{-w^Tx}}$$

These two cases can be combined into a single Bernoulli distribution:


$$p(y|x) = p_1^y \, p_0^{1-y}$$
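As a quick sanity check, here is a minimal sketch of these two conditional probabilities (assuming `numpy`; the function names and toy numbers are my own, purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bernoulli_pmf(w, x, y):
    """p(y|x) = p1^y * p0^(1-y), where p1 = sigmoid(w^T x)."""
    p1 = sigmoid(w @ x)
    return p1**y * (1.0 - p1)**(1 - y)

w = np.array([0.5, -0.25])
x = np.array([1.0, 2.0])
print(bernoulli_pmf(w, x, 1), bernoulli_pmf(w, x, 0))  # the two values sum to 1
```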

The data can be described by the conditional probability $P(Y|X)$, where $X$ is the collection of samples and $Y$ the labels, because given the data $X$ we model the conditional probability of $Y$. Maximizing the log-likelihood of the labels gives:


$$\hat{w} = \argmax_w \log P(Y|X) = \argmax_w \log \prod_{i=1}^N P(y_i|x_i)$$

The joint probability $P(Y|X)$ factorizes because the samples are independent, so it can be written as the product $\prod_{i=1}^N P(y_i|x_i)$. Taking the log turns the product into a sum, and substituting the Bernoulli form $p(y_i|x_i) = p_1^{y_i} p_0^{1-y_i}$ gives:


$$\hat{w} = \argmax_w \sum_{i=1}^N \left( y_i \log p_1 + (1-y_i)\log p_0 \right)$$

$$f(x_i; w) = \frac{1}{1 + e^{-w^Tx_i}}$$

$$\hat{w} = \argmax_w \sum_{i=1}^N \left( y_i \log f(x_i; w) + (1-y_i)\log \left(1 - f(x_i; w)\right) \right)$$
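Translated directly into code, this objective is just a sum over the data. Here is a sketch assuming `numpy`; the small `eps` guard against `log(0)` is my own addition:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y):
    """sum_i [ y_i * log(p1_i) + (1 - y_i) * log(p0_i) ] over the whole sample."""
    p1 = sigmoid(X @ w)   # p1_i = sigmoid(w^T x_i), shape (N,)
    eps = 1e-12           # avoid log(0)
    return np.sum(y * np.log(p1 + eps) + (1 - y) * np.log(1 - p1 + eps))
```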

Adding a minus sign turns this maximization into minimizing the cross-entropy loss, which is exactly the objective we optimize when using logistic regression for classification.
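To close the loop, here is a minimal gradient-descent sketch that minimizes this cross-entropy loss (assuming `numpy`; there is no bias term, matching the plain $w^Tx$ form above, and the function names, synthetic data, and hyperparameters are my own, for illustration only):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Minimize the averaged cross-entropy loss by plain gradient descent.
    The gradient of the loss w.r.t. w is X^T (sigmoid(Xw) - y) / N."""
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p1 = sigmoid(X @ w)        # predicted P(y=1|x_i) for every sample
        grad = X.T @ (p1 - y) / N  # gradient of the averaged loss
        w -= lr * grad
    return w

# toy usage on synthetic, linearly separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
w_hat = fit_logistic_regression(X, y)
acc = ((sigmoid(X @ w_hat) > 0.5) == y).mean()
print("training accuracy:", acc)
```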