
Download the materials from Coursera.

I'll skip the irrelevant parts and translate only the useful information.

Files included in this exercise:

  • ex2.m – Octave/MATLAB script that steps you through the exercise
  • ex2_reg.m – Octave/MATLAB script for the later part of the exercise
  • ex2data1.txt – Training set for the first half of the exercise
  • ex2data2.txt – Training set for the second half of the exercise
  • submit.m – Submission script for your assignments
  • mapFeature.m – Function to generate polynomial features
  • plotDecisionBoundary.m – Function to plot the classifier's decision boundary
  • plotData.m – Function to plot 2D classification data
  • sigmoid.m – Sigmoid function
  • costFunction.m – Logistic regression cost function
  • predict.m – Logistic regression prediction function
  • costFunctionReg.m – Regularized logistic regression cost function

Among them, the files marked with an asterisk in the original exercise sheet are the ones you need to complete.

1 Logistic Regression

In this part of the exercise, you will build a logistic regression model to predict whether a student will be admitted to college.

Suppose you are a college administrator and you want to predict each applicant’s chances of admission based on the results of two tests. You now have data from previous applicants as a training set for logistic regression. For each sample, you have the applicant’s two test scores and the admission results. Your task is to build a classification model to estimate the applicant’s probability of admission based on the scores from these two tests.

1.1 Visualizing the data

Before starting to implement any learning algorithm, it is usually a good idea to visualize the data. In the first part of ex2.m, the function plotData is called to generate a two-dimensional plot of the loaded data. Now complete the code in plotData.m so that it displays the result shown in Figure 1.

To give you some practice, the exercise asks you to complete plotData.m yourself. It is an optional (ungraded) problem, though, and the answer is given below, so you can simply copy and paste it.

% Find Indices of Positive and Negative Examples
pos = find(y==1); neg = find(y == 0);
% Plot Examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);

If you copy it in directly, it looks like this. Once you're done, start Octave in the exercise directory, run ex2, and you'll see an image like Figure 1.
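Note that these two plot calls go inside the plotData.m template, which already opens a figure and wraps your code in hold on / hold off. So the completed file ends up looking roughly like this (a sketch based on the course template; the comments are mine):

function plotData(X, y)
%PLOTDATA Plots the data points X and y into a new figure
%   Plots positive examples with 'k+' and negative examples with 'ko'.

figure; hold on;   % the template already opens the figure for you

% Find indices of positive and negative examples
pos = find(y == 1); neg = find(y == 0);

% Plot examples
plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);

hold off;
end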

1.2 Implementation

1.2.1 Warmup exercise: sigmoid function

Before we start with the cost function, recall that the logistic regression hypothesis is defined as follows:


$$h_\theta(x) = g(\theta^T x),$$

The function g is the sigmoid function, defined as follows:


$$g(z) = \frac{1}{1 + e^{-z}}$$

The first step is to implement this function in sigmoid.m so that it can be called by the rest of the program. Once done, try testing a few values by calling sigmoid(x) in Octave. For large positive values of x, the sigmoid should be close to 1; for very negative values, it should be close to 0. sigmoid(0) should be exactly 0.5. Your code should also work on vectors and matrices: for a matrix, the sigmoid function should operate on every element. To submit, you'll need your Coursera submission token; if you don't remember how that works, see Programming Exercise 1: Linear Regression 1.1.

The answer is: g = 1./ (1 + exp(-1 * z));

Let me analyze it. The problem asks you to complete the sigmoid function mentioned above,

$$g(z) = \frac{1}{1 + e^{-z}}$$

In other words, given an input z, compute g(z), where z may be a number, a vector, or a matrix. Now look at the provided function: sigmoid returns g, and the first line of the function body initializes the return value g to a zero matrix with the same dimensions as z. So if z is a matrix, g is a matrix; if z is a number, g is a number. You can then simply plug z into the formula above.

g = 1 ./ ( 1 + exp( -1 * z ));


  • -1 * z: multiplying a matrix by a scalar uses *, just like multiplying two scalars.
  • exp(): $e^n$ is written exp(n).
  • 1 + exp(-1 * z): adding a scalar to a matrix uses +, just like adding two scalars.
  • 1 ./ (1 + exp(-1 * z)): division involving a matrix must be element-wise, using ./, so be careful here!
  • The trailing ; keeps the value of g from being printed while the script runs. In Octave, a statement ending in a semicolon does not display its result; a statement without one does.
  • If you don't remember any of this, I suggest reviewing the Octave Brief Tutorial.

The file is as follows.
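Since the screenshot isn't reproduced here, this is roughly what the completed sigmoid.m looks like, based on the template structure described above (comments mine):

function g = sigmoid(z)
%SIGMOID Compute the sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z element-wise.

g = zeros(size(z));           % template line: g has the same dimensions as z

g = 1 ./ (1 + exp(-1 * z));   % works for scalars, vectors, and matrices

end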

Run it from Octave and it does exactly what the problem statement says:

For large positive x, the sigmoid should be close to 1; for very negative x, it should be close to 0. sigmoid(0) should be 0.5.
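A quick sanity check in Octave might look like this (the approximate values in the comments follow from the formula; they are not copied from the exercise):

sigmoid(0)          % 0.5 exactly
sigmoid(10)         % about 0.99995, close to 1
sigmoid(-10)        % about 4.5e-05, close to 0
sigmoid([-1 0 1])   % element-wise: about [0.2689 0.5000 0.7311]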

Submit it.

(For the submission steps, see Programming Exercise 1: Linear Regression 1.1.)

The output means I got all five points for this part. Done.

1.2.2 Cost function and gradient

Now let's implement the cost function and gradient for logistic regression. Complete the code in costFunction.m. The cost function is:


$$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)\right]$$

And the gradient (the partial derivative used by gradient descent) is:


$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The answer is:

hx = sigmoid(X*theta);
add = y .* log(hx) + (1 - y) .* log(1 - hx);
J = - 1 / m .* sum(add);

n = length(theta);
for j = 1 : n
  grad(j) = 1 / m .* sum((hx - y) .* X(:,j:j));
endfor

Let's first analyze the provided function: you are asked to write costFunction, which returns two values, J and grad. J is the cost, and grad is the vector of partial derivatives used by gradient descent. Then you simply follow the formulas above.

hx = sigmoid(X*theta);
add = y .* log(hx) + (1 - y) .* log(1 - hx);
J = - 1 / m .* sum(add);
  • hx is $h_\theta(x)$. But remember this is logistic regression; the hypothesis is not the same as the linear regression formula! Just use the sigmoid function from the previous step: $h_\theta(x) = g(\theta^T x)$.

I wasted a good while here by using the wrong formula: no matter how I checked my code, the submission got no credit. It turned out my $h_\theta(x)$ was still using the linear regression formula…

  • add is the summand, i.e. the part inside the summation: $y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)$
  • J is the final cost.

Re-examine this piece of code:

n = length(theta);
for j = 1 : n
  grad(j) = 1 / m .* sum((hx - y) .* X(:,j:j));
endfor
  • grad holds the partial derivative with respect to each $\theta_j$. There are as many gradient components as there are elements of θ, so first take the length of the θ vector.
  • A for loop then fills in grad one element at a time.
  • X(:,j:j) takes the j-th column of X over all rows, i.e. all of the $x_j^{(i)}$ values (X(:,j) would do the same thing).
  • sum((hx - y) .* X(:,j:j)) computes $\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$.
  • 1 / m .* sum((hx - y) .* X(:,j:j)) is then the full gradient term $\frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$.
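As an aside, the same cost and gradient can be computed without the for loop, using matrix operations. This vectorized form is my own rewrite, not part of the exercise answer above, but it produces the same J and grad:

hx = sigmoid(X * theta);                                  % m x 1 vector of predictions
J  = -(1 / m) * (y' * log(hx) + (1 - y)' * log(1 - hx));  % scalar cost
grad = (1 / m) * X' * (hx - y);                           % n x 1 gradient vector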

When I finished, it was like this:

Submit to see:

I got the points for both parts.

1.2.3 Learning parameters using fminunc

Learn the parameters using fminunc.

Remember fminunc? For linear regression we had gradient descent and the normal equation; for logistic regression we have gradient descent plus other, more advanced optimization methods. One such advanced method is to use fminunc directly in Octave.

In the previous assignment, you found the best parameters θ for a linear regression model by implementing gradient descent: you wrote a cost function, calculated its gradient, and then performed the gradient descent steps yourself.

In the previous section (1.2.2 above) you likewise wrote the cost function and gradient for logistic regression by hand.

But this time, you're going to use fminunc, a built-in Octave/MATLAB function that performs advanced optimization to find the optimal solution.

Octave/MATLAB's fminunc is an optimization solver that finds the minimum of an unconstrained function.

Specifically, given a fixed dataset (of X and y values), you use fminunc to find the optimal parameters θ for the logistic regression cost function. You need to pass the following inputs to fminunc:

  • The initial values of the parameters we are trying to optimize.
  • A function that, given the training set (X, y) and a particular θ, computes the logistic regression cost and the gradient with respect to θ.

In ex2.m, the code that calls fminunc with the correct arguments is already written for you.

Note: this code already calls fminunc correctly; you don't need to write it yourself.

In this code, we first define the options to use with fminunc. Specifically:

  • Setting the GradObj option to 'on' tells fminunc that our function returns both the cost and the gradient. This allows fminunc to use the gradient while minimizing the function.
  • Setting the MaxIter option to 400 means fminunc will run at most 400 iterations before it terminates.

To specify the actual function being minimized, we use a "shorthand" (an anonymous function): @(t) (costFunction(t, X, y)). This creates a function of a single argument t that calls your cost function, which is exactly the form fminunc expects.
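The relevant part of ex2.m looks roughly like this (reproduced from memory of the course script, so treat it as a sketch rather than a verbatim copy):

% Set options for fminunc
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Run fminunc to obtain the optimal theta; it returns theta and the final cost
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);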

With fminunc, you don't write any loops or set a learning rate as you did for gradient descent; the built-in function handles all of that. You only need to supply a correct costFunction, which you already did in Section 1.2.2.

After fminunc finishes, ex2.m calls your costFunction with the optimal parameters θ. You should see that the cost is about 0.203. This θ is also used to plot the decision boundary on the training data, producing a figure similar to Figure 2. I recommend reading the code in plotDecisionBoundary.m to see how such a boundary is drawn from the θ values.

1.2.4 Evaluation of logistic regression

Now you can use the model to predict whether a particular student will be admitted. For a student with a score of 45 on Exam 1 and 85 on Exam 2, you should expect an admission probability of about 0.776. Another way to evaluate the model is to see how well it predicts on our training set. Your task is to complete the code in predict.m: given the dataset and the learned parameter vector θ, predict should output a "1" or "0" for each example.

After completing the code in predict.m, ex2.m will display the training accuracy of the classifier as a percentage.

Then submit your homework.

Answer: p = sigmoid(X * theta) >= 0.5;

Let’s start with the function file:

P = PREDICT(theta, X) computes the predictions for X using a threshold at 0.5 (i.e., if sigmoid(theta’* X) >= 0.5, predict 1)

This sentence is the key: use 0.5 as the threshold on the sigmoid output to turn the computed probability into a prediction.
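Putting that together, the completed predict.m is essentially the one-liner above dropped into the template (a sketch; the function signature comes from the course files, the comments are mine):

function p = predict(theta, X)
%PREDICT Predict whether the label is 0 or 1 using learned parameters theta

m = size(X, 1);                  % number of examples
p = zeros(m, 1);                 % template line: initialize predictions

p = sigmoid(X * theta) >= 0.5;   % logical vector: 1 where probability >= 0.5

end

ex2.m also performs the single-student check mentioned above; it amounts to something like prob = sigmoid([1 45 85] * theta), which should come out to about 0.776.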

Okay, so now I'll submit it. Full marks.

2 Regularized logistic regression

In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it functions correctly. Suppose you are the product manager of the factory and you have the results of two different tests on some microchips. From these two tests, you want to determine whether each microchip should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model. Use the other script, ex2_reg.m, to complete this portion of the exercise.

In other words, this is a different exercise from the previous one. Now use the ex2_reg.m script.

2.1 Visualizing the data

As in the earlier parts, plotData is used to generate a figure like Figure 3, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.

Figure 3 shows that our data set cannot be divided into positive and negative examples using a straight line. Therefore, the direct application of logistic regression on this data set does not perform well because logistic regression can only find linear decision boundaries.

2.2 Feature Mapping

One way to fit the data better is to create more features from each data point. In the provided function mapFeature.m, we map the features into all polynomial terms of x1 and x2 up to the sixth power.


$$\text{mapFeature}(x) = \begin{bmatrix} 1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_1 x_2 \\ x_2^2 \\ x_1^3 \\ \vdots \\ x_1 x_2^5 \\ x_2^6 \end{bmatrix}$$

As a result of this mapping, our vector of two features (the scores from the two QA tests) is transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary, which appears nonlinear when drawn in our two-dimensional plot. While feature mapping allows us to build a more expressive classifier, it is also more prone to overfitting. In the next part of the exercise, you will implement regularized logistic regression to fit the data, and see for yourself how regularization helps combat overfitting.
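For intuition, the provided mapFeature.m builds these polynomial terms with two nested loops; the idea is roughly this (a sketch of the provided function, which may differ in minor details):

function out = mapFeature(X1, X2)
%MAPFEATURE Map the two input features to polynomial features up to degree 6
degree = 6;
out = ones(size(X1(:,1)));              % first column: the intercept term (all ones)
for i = 1:degree
  for j = 0:i
    out(:, end + 1) = (X1 .^ (i - j)) .* (X2 .^ j);   % append the term x1^(i-j) * x2^j
  end
end
end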

2.3 Cost function and gradient

Now, implement code to compute the cost function and gradient for regularized logistic regression. Complete the code in costFunctionReg.m to return the cost and gradient. Recall that the regularized cost function in logistic regression is


$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)} \log\left(h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

$$= -\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

Note that the $\theta_0$ parameter should not be regularized. In Octave/MATLAB, recall that indexing starts at 1, so you should not regularize the theta(1) parameter (which corresponds to $\theta_0$) in your code. The gradient of the cost function is a vector whose $j^{\text{th}}$ element is defined as follows:


$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} \quad \text{for } j = 0$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \left(\frac{1}{m} \sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}\right) + \frac{\lambda}{m}\theta_j \quad \text{for } j \geq 1$$

After writing the code, run ex2_reg.m; with θ initialized to all zeros, the cost should be about 0.693.

The answer:

n = length(theta);

hx = sigmoid(X * theta);
sum1 = sum(y .* log(hx) + (1 - y) .* log(1 - hx));
sum2 = sum(theta(2:n) .^ 2);            % regularization skips theta(1)
J = - 1 / m * sum1 + lambda / (2 * m) * sum2;

grad(1) = 1 / m .* sum((hx - y) .* X(:,1));   % first column of X (the intercept feature)

for j = 2:n
  grad(j) = 1 / m .* sum((hx - y) .* X(:,j:j)) + lambda / m .* theta(j);
endfor

Because this is almost exactly the same as the code from the first part, except for the regularization term added at the end, I won't explain it line by line; I'm sure you can follow it just as well.
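As with the unregularized cost, the loop can be replaced by matrix operations. This is my own rewrite rather than part of the answer above, but it computes the same J and grad:

hx = sigmoid(X * theta);
reg = (lambda / (2 * m)) * sum(theta(2:end) .^ 2);            % do not regularize theta(1)
J = -(1 / m) * (y' * log(hx) + (1 - y)' * log(1 - hx)) + reg;

grad = (1 / m) * X' * (hx - y);                               % unregularized gradient
grad(2:end) = grad(2:end) + (lambda / m) * theta(2:end);      % add regularization for j >= 1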

Run it and the cost comes out to about 0.693, as expected.

Submit it for full marks. That's it. I'll translate the rest when I have time.