
I recently translated the quiz bank for Andrew Ng's Machine Learning course. The course series is so famous and classic that it needs no introduction.

My main motivation is that most of the lecture notes and programming assignments available online have already been translated, but few people have translated the quiz questions. Having worked through them myself recently, I found that this question set really has substance: the test points are quite subtle. Beginners can use it alongside the lectures to learn the material more thoroughly, while veterans can use it to review the theory (and getting enough of them wrong may make you doubt yourself). So I have collected them here.

In addition, since some of the questions are filler, I have included only a selection.

Question 16

Let’s say m=4 students take a class with a midterm and a final exam. You have collected a dataset of their scores on both tests, as shown below:

Midterm score   (Midterm score)^2   Final score
89 7921 96
72 5184 74
94 8836 87
69 4761 78

You would like to use polynomial regression to predict a student's final exam score from their midterm score. Concretely, suppose you want to fit a model where x1 is the midterm score and x2 is (midterm score)^2. Furthermore, you plan to use both feature scaling (dividing by the "max minus min", i.e., the range, of a feature) and mean normalization.

What are the normalized feature values? (Hint: midterm = 89, final = 96 is training example 1.)
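To check the arithmetic, the scaling can be computed directly. A minimal Python sketch (NumPy standing in for the course's Octave; the numbers come from the table above):

```python
import numpy as np

# Midterm scores x1 and their squares x2, taken from the table above
x1 = np.array([89.0, 72.0, 94.0, 69.0])
x2 = x1 ** 2  # 7921, 5184, 8836, 4761

def scale(x):
    # Mean normalization, then feature scaling by the range (max - min)
    return (x - x.mean()) / (x.max() - x.min())

print(scale(x1)[0])  # training example 1: (89 - 81) / 25 = 0.32
print(scale(x2)[0])  # (7921 - 6675.5) / 4075 ≈ 0.3056
```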

Question 17

You run gradient descent for 15 iterations with some learning rate α, computing the cost J(θ) after each iteration. You find that the value of J(θ) decreases slowly and is still decreasing after 15 iterations. Based on this, which of the following conclusions seems most plausible?

A. The current α is an effective choice of learning rate.

B. Rather than using the current value of α, it would be more promising to try a smaller value.

C. Rather than using the current value of α, it would be more promising to try a larger value.

Question 18

Suppose you have m = 14 training examples and n = 3 features (not counting the intercept term, the constant 1 that needs to be added), and the normal equation is θ = (XᵀX)⁻¹Xᵀy. For the given values of m and n, what are the dimensions of X, y, and θ in this equation?

A. 14 x 3, 14 x 1, 3 x 3 B. 14 x 4, 14 x 1, 4 x 1 C. 14 x 3, 14 x 1, 3 x 1 D. 14 x 4, 14 x 4, 4 x 4
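The answer can be sanity-checked by building the matrices and solving; a Python sketch with random stand-in data (the question supplies no actual values):

```python
import numpy as np

m, n = 14, 3
rng = np.random.default_rng(0)

X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])  # design matrix with intercept: 14 x 4
y = rng.normal(size=(m, 1))                                # labels: 14 x 1
theta = np.linalg.solve(X.T @ X, X.T @ y)                  # normal equation solution: 4 x 1

print(X.shape, y.shape, theta.shape)  # (14, 4) (14, 1) (4, 1)
```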

Question 19

Suppose you have a dataset with m = 1,000,000 examples and n = 200,000 features per example. You want to use multivariate linear regression to fit the parameters θ to the data. Should you use gradient descent or the normal equation?

A. Gradient descent, since computing the inverse in the normal equation would be very slow

B. The normal equation, since it provides an efficient way to solve for θ directly

C. Gradient descent, since it will always converge to the optimal θ

D. The normal equation, since gradient descent might fail to find the optimal θ

Question 20

Which of the following are reasons to use feature scaling?

A. It prevents gradient descent from falling into local optimum

B. It accelerates gradient descent by reducing the computational cost of each iteration of gradient descent

C. It speeds up the gradient descent by reducing the number of iterations to obtain a good solution

D. It prevents the matrix XᵀX (used in the normal equation) from being non-invertible (singular/degenerate)

Question 27

Suppose you have the following training set and fit a logistic regression classifier to it.

Which of the following is true? Select all the correct items

A. Adding polynomial features could increase how well we can fit the training data.

B. At the optimal value of θ (e.g., found by fminunc), we have J(θ) ≥ 0.

C. Adding polynomial features would increase J(θ), because we are now summing over more terms.

D. If we run gradient descent for enough iterations, then for some examples x^(i) in the training set it is possible to obtain h_θ(x^(i)) > 1.

Question 28

For logistic regression, the gradient is given by ∂J(θ)/∂θ_j = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i). Which of the following is a correct gradient descent update for logistic regression with learning rate α? Select all the correct items

A. 

B. (update all simultaneously)

C. (update all simultaneously)

D. (update all simultaneously)
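The option formulas did not survive translation, so as a reference point, here is the standard simultaneous update the correct options describe, sketched in Python (NumPy in place of Octave; the tiny dataset is made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(theta, X, y, alpha):
    # theta := theta - (alpha/m) * X'(h - y), updating all theta_j simultaneously
    m = len(y)
    h = sigmoid(X @ theta)
    return theta - (alpha / m) * (X.T @ (h - y))

# Tiny made-up illustration: two examples, two parameters
X = np.array([[1.0, 2.0], [1.0, -1.0]])
y = np.array([1.0, 0.0])
theta = np.zeros(2)
theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)  # one step away from the zero vector
```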

Question 29

Which of the following statements is true? Select all the correct items

A. For logistic regression, gradient descent sometimes converges to a local minimum (and fails to find the global minimum). This is why we prefer more advanced optimization algorithms such as fminunc (conjugate gradient/BFGS/L-BFGS/etc.)

B. The value of the sigmoid function is never greater than 1

C. The cost function J(θ) for logistic regression is always greater than or equal to zero

D. Using linear regression plus a threshold is always an effective way to make classification predictions

Question 31

You are training a classification model with logistic regression. Which of the following statements is true? Select all the correct items

A. Introducing regularization into the model always results in the same or better performance on the training set

B. Adding many new features to the model helps prevent over-fitting of the training set

C. By introducing regularization into the model, the same or better performance can always be achieved for examples not in the training set

D. Adding new features to the model always results in equal or better performance on the training set

Question 32

Suppose you ran logistic regression twice, once with λ = 0 and once with λ = 1. One of the times you obtained one parameter vector θ, and the other time you obtained a different one. However, you forgot which value of λ corresponds to which θ. Which do you think corresponds to which?

A. 

B. 

Question 33

Which of the following statements about regularization is true? Select all the correct items

A. Using too large a value of λ can cause your hypothesis to overfit the data; this can be avoided by reducing λ

B. Using a very large value of λ does not hurt the performance of the hypothesis; the only reason we do not set λ too large is to avoid numerical problems

C. Consider a classification problem. Adding regularization may cause the classifier to incorrectly classify some training examples (which it classified correctly when regularization was not used, i.e., when λ = 0)

D. Because logistic regression outputs values 0 ≤ h_θ(x) ≤ 1, its range of output values can only be "shrunk" slightly by regularization anyway, so regularization is generally not helpful for it

Question 36

Which of the following statements is true? Select all correct items

A. The activation of a hidden unit in a neural network, after the sigmoid function is applied, is always in the range (0, 1)

B. Any logical function over binary-valued (0 or 1) inputs can be (approximately) represented by some neural network

C. A two-layer neural network (one input layer, one output layer; no hidden layer) can represent the XOR function

D. Suppose we have a multi-class classification problem with three classes, trained with a three-layer network. Let a_1^(3) be the activation of the first output unit, and similarly a_2^(3) and a_3^(3). Then for any input x, it must be the case that a_1^(3) + a_2^(3) + a_3^(3) = 1

Question 37

Consider the following neural network, which takes two binary-valued inputs and produces one output. Which of the following logical functions does it (approximately) compute?

A. OR

B. AND

C. NAND (NOT AND)

D. XOR

Question 38

Consider the neural network given below. Which of the following equations correctly computes the activation a_1^(3)? Note: g(z) is the sigmoid activation function

A. 

B. 

C. 

D. The activation a_1^(3) does not exist in this network

Question 39

You have the following neural network:

If you want to compute the activation of the hidden layer a^(2), one way is to use the following Octave code (a for-loop over the hidden units):

You need a vectorized implementation (that is, one that does not loop). Which of the following implementations computes correctly? Select all the correct items

A. z = Theta1 * x; a2 = sigmoid(z)

B. a2 = sigmoid(x * Theta1)

C. a2 = sigmoid(Theta2 * x)

D. z = sigmoid(x); a2 = sigmoid(Theta1 * z)
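The equivalence between the loop and option A's vectorized form can be checked numerically; a Python sketch with hypothetical 3×3 weights (the actual network values are not given in this excerpt):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

Theta1 = np.array([[0.5, -1.0, 2.0],
                   [1.0,  0.0, 0.5],
                   [-2.0, 1.5, 1.0]])  # hypothetical 3 x 3 weight matrix
x = np.array([1.0, 0.5, -0.5])         # input including the bias unit x0 = 1

# Option A, vectorized: z = Theta1 * x, then element-wise sigmoid
z = Theta1 @ x
a2 = sigmoid(z)

# The loop version, one hidden unit at a time, for comparison
a2_loop = np.array([sigmoid(Theta1[i, :] @ x) for i in range(3)])
print(np.allclose(a2, a2_loop))  # True
```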

Question 40

You are using the neural network shown in the figure below, and have learned the parameters Θ^(1) (used to compute a^(2)) and Θ^(2) (used to compute a^(3) from a^(2)).

Suppose you swap the parameters of the two units of the first hidden layer, and also swap the corresponding parameters of the output layer. How does this change the value of the output h_Θ(x)?

A. It will stay the same B. It will get larger C. It will get smaller D. Insufficient information to tell

Question 41

You are training a three-layer neural network and would like to use backpropagation to compute the gradient of the cost function. In the backpropagation algorithm, one of the steps is to update Δ_ij^(2) := Δ_ij^(2) + δ_i^(3) · (a^(2))_j for every i, j. Which of the following is a correct vectorization of this step?

A.   

B.   

C.   

D. 

Question 43

Let J(θ) = 2θ³ + 2, with θ = 1 and ε = 0.01. Use the formula (J(θ + ε) − J(θ − ε)) / (2ε) to numerically compute an approximation to the derivative at θ = 1. What value do you get? (When θ = 1, the true, exact derivative is 6.)

A. 8 B. 6 C. 5.9998 D. 6.0002
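Here is the two-sided difference computed directly, assuming J(θ) = 2θ³ + 2 (an assumption consistent with the stated exact derivative of 6 at θ = 1):

```python
# Two-sided difference used for gradient checking.
# J(theta) = 2*theta**3 + 2 is an assumption consistent with the stated
# exact derivative at theta = 1: dJ/dtheta = 6*theta**2 = 6.
def J(theta):
    return 2.0 * theta ** 3 + 2.0

theta, eps = 1.0, 0.01
approx = (J(theta + eps) - J(theta - eps)) / (2 * eps)
print(round(approx, 4))  # 6.0002, i.e., answer D
```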

Question 44

Which of the following statements is true? Select all correct items

A. Using a large value of λ will not hurt the performance of the neural network; the only reason we do not set λ too large is to avoid numerical problems

B. Gradient checking is useful if we are using gradient descent as our optimization algorithm. However, it serves little purpose if we are using one of the advanced optimization methods (such as fminunc)

C. Using gradient checking can help verify that the implementation of backpropagation is bug-free

D. If our neural network overfits the training set, one reasonable step is to increase the regularization parameter λ

Question 45

Which of the following statements is true? Select all correct items

A. Suppose the parameter matrix Θ^(1) is a square matrix (i.e., the number of rows equals the number of columns). If we replace it with its transpose, then we have not changed what the network is computing.

B. Suppose we have a correct implementation of backpropagation and are training a neural network using gradient descent. Suppose we plot J(Θ) as a function of the number of iterations and find that it is increasing rather than decreasing. One likely cause is that the learning rate α is too high.

C. Suppose we are using gradient descent with learning rate α. For logistic regression and linear regression, J(θ) is a convex optimization problem, so we do not want to choose a learning rate that is too large. For a neural network, however, J(Θ) may not be convex, so choosing a very large value of α can only speed up convergence.

D. If we are using gradient descent to train a neural network, a reasonable debugging step is to plot J(Θ) as a function of the number of iterations and ensure that it is decreasing (or at least not increasing) after each iteration.

Question 46

You train a learning algorithm and find that it has a high error on the test set. Plotting the learning curve gives the figure below. Is the algorithm suffering from high bias, high variance, or neither?

A. High bias B. High variance C. Neither

Question 47

Suppose you have implemented regularized logistic regression to classify what objects are in an image (i.e., to do object recognition). However, when you test your model on a new set of images, you find that it makes unacceptably large errors in its predictions on the new images. Your hypothesis does, however, fit the training set well. Which of the following are promising steps to take? Select all the correct items

A. Try adding polynomial features

B. Obtain more training examples

C. Try to use fewer features

D. Use fewer training examples

Question 48

Suppose you have implemented regularized logistic regression to predict what customers will buy on a shopping website. However, when you test your model on a new set of customers, you find that it makes large errors in its predictions. Furthermore, the model performs poorly on the training set. Which of the following are promising steps to take? Select all the correct items

A. Try to capture and use other features

B. Try to add polynomial features

C. Try to use fewer features

D. Try increasing the regularization parameter λ

Question 49

Which of the following statements is true? Select all the correct items

A. Suppose you are training a regularized linear regression model. The recommended way to choose the value of the regularization parameter λ is to choose the value that gives the lowest cross-validation error.

B. Suppose you are training a regularized linear regression model. The recommended way to choose the value of the regularization parameter λ is to choose the value that gives the lowest test set error.

C. Suppose you are training a regularized linear regression model. The recommended way to choose the value of the regularization parameter λ is to choose the value that gives the lowest training set error.

D. Learning algorithms generally perform better on training sets than on test sets.

Question 50

Which of the following statements is true? Select all the correct items

A. When debugging a learning algorithm, plotting the learning curve is helpful for figuring out whether the algorithm suffers from high bias or high variance.

B. If a learning algorithm is affected by high variance, adding more training instances may improve the test error.

C. We always prefer high-variance models (rather than high-bias models) because they fit the training set better.

D. If a learning algorithm has a high bias, simply adding more training instances may not significantly improve the test error.

Question 53

Suppose you have trained a logistic regression classifier that outputs h_θ(x). Currently, you predict 1 if h_θ(x) ≥ threshold and predict 0 if h_θ(x) < threshold, where currently the threshold is set to 0.5.

Suppose you increase the threshold to 0.9. Which of the following is true? Select all the correct items

A. The classifier is likely to now have lower accuracy.

B. The classifier's precision and recall are likely unchanged, but its accuracy is lower.

C. The classifier's precision and recall are likely unchanged, but its accuracy is higher.

D. The classifier is likely to now have lower recall.

Suppose you lower the threshold to 0.3. Which of the following is true? Select all the correct items

A. The classifier is likely to now have higher recall.

B. The classifier's precision and recall are likely unchanged, but its accuracy is higher.

C. The classifier is likely to now have higher accuracy.

D. The classifier's precision and recall are likely unchanged, but its accuracy is lower.
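To build intuition for both parts of this question, precision and recall can be computed at several thresholds on a toy set of made-up classifier outputs:

```python
def precision_recall(probs, labels, threshold):
    # Predict 1 when the classifier output meets the threshold
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall

probs  = [0.95, 0.85, 0.6, 0.4, 0.2]  # hypothetical h(x) outputs
labels = [1,    1,    0,   1,   0]    # made-up ground truth

# Raising the threshold tends to raise precision and lower recall; lowering it does the opposite
for t in (0.3, 0.5, 0.9):
    print(t, precision_recall(probs, labels, t))
```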

Question 54

Suppose you are working on a spam classifier, where spam is the positive class (y = 1) and non-spam is the negative class (y = 0). You have a training set of emails in which 99% of the emails are non-spam and 1% are spam. Which of the following statements is true? Select all the correct items

A. A good classifier should have both high precision and high recall on the cross validation set.

B. If you always predict non-spam (output y=0), your classifier will be 99% accurate on the training set, and its performance may be similar on the cross-validation set.

C. If you always predict non-spam (y=0), your classifier will be 99% accurate.

D. If you always predict non-spam (output y=0), your classifier will be 99% accurate on the training set, but worse on the cross-validation set because it overfits the training data.

Question 55

Which of the following statements is true? Select all the correct items

A. It is a good idea to spend a lot of time gathering a lot of data before building the first version of a learning algorithm.

B. On skewed data sets (for example, when there are many more positive examples than negative ones), accuracy is not a good measure of performance; you should instead use the F1 score, which is based on precision and recall.

C. After training a logistic regression classifier, you must use 0.5 as the threshold for predicting whether an example is positive or negative.

D. Using a very large training set makes the model less likely to overfit the training data.

E. If your model underfits the training set, obtaining more data is likely to help.

Question 56

Suppose you train a support vector machine with a Gaussian kernel, and it learns the following decision boundary on the training set:

You suspect the support vector machine is underfitting. Should you try increasing or decreasing C? Increasing or decreasing σ²?

[A]

[B]

[C]

[D]

Question 58

The support vector machine solves min_θ C Σ_i [y^(i) cost1(θᵀx^(i)) + (1 − y^(i)) cost0(θᵀx^(i))] + (1/2) Σ_j θ_j², where the functions cost0(z) and cost1(z) look like this:

The first term in the objective is equal to zero if two of the following four conditions hold true. Which are the two conditions that make this term equal to zero?

A. For every example with y^(i) = 1, we have θᵀx^(i) ≥ 1.

B. For every example with y^(i) = 0, we have θᵀx^(i) ≤ −1.

C. For every example with y^(i) = 1, we have θᵀx^(i) ≥ 0.

D. For every example with y^(i) = 0, we have θᵀx^(i) ≤ 0.

Question 59

Suppose you have a dataset with n = 10 features and m = 5000 examples. After training a logistic regression classifier with gradient descent, you find that it underfits the training set and does not achieve the desired performance on the training set or the cross-validation set. Which of the following steps are promising ways to improve performance? Select all the correct items

A. Try using a neural network with a large number of hidden units.

B. Reduce the number of examples in the training set.

C. Use a different optimization method, since training logistic regression with gradient descent might result in a local minimum.

D. Create/add new polynomial features.

Question 60

Which of the following statements is true? Select all the correct items

A. Suppose you are using support vector machines for multi-class classification and want to use the "one-vs-all" approach. If you have K different classes, you will train K different support vector machines.

B. If the data are linearly separable, a linear-kernel support vector machine will return the same parameters θ regardless of the chosen value of C (i.e., the resulting θ is independent of C).

C. The maximum value of a Gaussian kernel is 1.

D. It is important to perform feature normalization before using a Gaussian kernel.

Question 63

K-means is an iterative algorithm that repeats the following two steps in its internal loop. Which two?

A. The "move centroids" step, where the cluster centroids μ_k are updated.

B. The cluster assignment step, where the parameters c^(i) are updated.

C. The "move centroids" step, where each cluster centroid is set equal to the closest training example.

D. The cluster centroid assignment step, where each cluster centroid is assigned (by setting c^(i)) to the closest training example.

Question 64

Suppose you have an unlabeled dataset. You run k-means with 50 different random initializations and obtain 50 different clusterings of the data. What is the recommended way to choose which of these 50 clusterings to use?

A. The only way to do so is if we also have labels for the data.

B. For each clustering, compute the distortion cost function J(c^(1), …, c^(m), μ_1, …, μ_K), and choose the one with the lowest value.

C. The answer is ambiguous; there is no good way to choose.

D. Always choose the last (50th) clustering found, since it is more likely to have converged to a good solution.
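The procedure behind option B can be sketched as follows: run k-means many times from random initializations and keep the clustering with the lowest distortion. A minimal 1-D implementation (not library code; the data are made up):

```python
import random

def kmeans(points, k, iters=20):
    # Random initialization: pick k distinct training examples as centroids
    centroids = random.sample(points, k)
    for _ in range(iters):
        # Cluster assignment step: each point goes to its closest centroid
        assign = [min(range(k), key=lambda j: (p - centroids[j]) ** 2) for p in points]
        # Move centroids step: each centroid becomes the mean of its members
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    distortion = sum((p - centroids[a]) ** 2 for p, a in zip(points, assign)) / len(points)
    return centroids, distortion

points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]  # two obvious 1-D clusters
# Run 50 random initializations and keep the clustering with the lowest distortion
best = min((kmeans(points, k=2) for _ in range(50)), key=lambda r: r[1])
print(sorted(best[0]))  # centroids near 1.0 and 8.07
```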

Question 65

Which of the following statements is true? Select all the correct items

A. If we are worried about k-means getting stuck in bad local optima, one way to ameliorate (reduce) this problem is to try multiple random initializations.

B. The standard way to initialize k-means is to set the centroids μ_k equal to a vector of zeros.

C. Since k-means is an unsupervised learning algorithm, it cannot overfit the data, so it is always better to have as many clusters as is computationally feasible.

D. For some data sets, the “correct” value for K (number of clusters) may be ambiguous and difficult to decide even for human experts who scrutinize the data.

E. K-means will give the same result regardless of the initialization of the cluster centroids.

F. A good way to initialize K-means is to select K (different) examples from the training set and set cluster centroids equal to those selected examples.

G. In every iteration of k-means, the cost function (the distortion function) either stays the same or decreases; in particular, it should not increase.

H. Once an example is assigned to a particular cluster center, it will never be reassigned to a different cluster center.

Question 67

Which of the following is a reasonable way to choose the number of principal components k? (n is the dimensionality of the input data and m is the number of input examples.)

A. Choose k to be the smallest value so that at least 99% of the variance is retained.

B. Choose the value of k that minimizes the approximation error (1/m) Σ ||x^(i) − x_approx^(i)||².

C. Choose k to be the smallest value so that at least 1% of the variance is retained.

D. Choose k to be 99% of n (i.e., k = 0.99 * n, rounded to the nearest integer).

Question 68

Suppose someone tells you that the way they ran principal component analysis was such that "95% of the variance was retained." What is an equivalent statement to this?

A.  

B.  

C.  

D. 
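Since the option formulas were lost, here is the standard criterion the question refers to, sketched in Python: "95% of the variance is retained" means the top-k singular values of the covariance matrix account for at least 95% of their total (the toy data are made up):

```python
import numpy as np

def smallest_k(X, retain=0.95):
    # Center the data, form the covariance matrix, and take its SVD
    Xc = X - X.mean(axis=0)
    Sigma = (Xc.T @ Xc) / len(X)
    S = np.linalg.svd(Sigma, compute_uv=False)
    ratio = np.cumsum(S) / np.sum(S)  # variance retained by the top k components
    return int(np.argmax(ratio >= retain)) + 1

# Toy data: 3 features, but the third is nearly a copy of the first
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base, base[:, 0] + 1e-3 * rng.normal(size=200)])
print(smallest_k(X))  # 2 components already retain >= 95% of the variance
```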

Question 69

Which of the following statements is true? Select all correct items

A. Given only z^(i) and U_reduce, there is no way to reconstruct any reasonable approximation to x^(i).

B. Even though all the input features are on a very similar scale, we should still perform mean normalization (such that the mean of each feature is zero) before running PCA.

C. PCA is susceptible to local optima; trying multiple random initializations may help.

D. Given input data x ∈ Rⁿ, it makes sense to run PCA only with values of k satisfying k ≤ n (in particular, running it with k = n is possible but not helpful, and k > n does not make sense).

Question 70

Which of the following is a recommended application of PCA? Select all correct items

A. As an alternative to linear regression: for most learning applications, PCA and linear regression give substantially similar results.

B. Data compression: Reduces the data dimension to reduce the memory/disk space occupied.

C. Data visualization: reduce the data to two dimensions (k = 2) so that it can be plotted.

D. Data compression: Reduce the dimension of input data that will be used for supervised learning algorithms (i.e., PCA is used to make supervised learning algorithms run faster).

Question 72

Suppose you have trained an anomaly detection system that flags anomalies when p(x) is below ε, and you find on the cross-validation set that it has too many false positives (it flags too many things as anomalies). What should you do?

A. Try decreasing ε B. Try increasing ε

Question 73

Suppose you are developing an anomaly detection system to catch manufacturing defects in aircraft engines. You model p(x) using two features: x1 = vibration intensity and x2 = heat generated. Both take on values between 0 and 1 (and are strictly greater than 0). For most "normal" engines, you expect x1 ≈ x2. One of the suspected anomalies is that a flawed engine may vibrate very intensely even without generating much heat (large x1, small x2), even though the particular values of x1 and x2 may not fall outside their typical ranges. What feature x3 should you construct to catch these types of anomalies?

A.   B.   C.   D. 

Question 74

Which of the following is true? Select all correct items

A. If you have no labeled data at all (or if all your data is unlabeled), it is still possible to learn p(x), but it may be harder to evaluate the system or choose a good value of ε.

B. If you have a training set with many positive examples and many negative examples, then an outlier detection algorithm may perform as well as a supervised learning algorithm such as a support vector machine.

C. If you are developing an anomaly detection system, there is no way to make use of labeled data to improve your system.

D. When choosing features for an anomaly detection system, it is a good idea to look for features that take on unusually large or small values for the anomalous examples.

Question 75

You have a one-dimensional dataset and want to detect outliers in it. You first plot the dataset, and it looks like this:

Suppose you fit the Gaussian distribution parameters μ1 and σ1² to this dataset. Which of the following values might you obtain?

A.   

B.   

C.   

D. 
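For reference, fitting the Gaussian parameters named in the question is just computing the sample mean and (maximum-likelihood) variance; a sketch with made-up 1-D data:

```python
import math

x = [2.0, 2.5, 3.0, 3.5, 4.0]  # hypothetical 1-D dataset

mu = sum(x) / len(x)                             # mu_1: the sample mean
sigma2 = sum((v - mu) ** 2 for v in x) / len(x)  # sigma_1^2: ML estimate, divide by m

def p(v):
    # Gaussian density p(x; mu, sigma^2)
    return math.exp(-(v - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(mu, sigma2)  # 3.0 0.5
print(p(mu))       # the density is highest at the mean
```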

Question 76

Let's say you own a bookstore and have ratings (1 to 5 stars) for books. Your collaborative filtering algorithm has learned a parameter vector θ^(j) for each user j and a feature vector x^(i) for each book. You would like to compute the "training error," meaning the average squared error of your system's predictions on all the ratings you have obtained from your users. Which of the following are correct (select all correct items)? For this problem, let m be the total number of ratings you have obtained from your users.

A.  

B.  

C.  

D. 

Question 77

In which of the following situations is a collaborative filtering system the most appropriate learning algorithm (as opposed to linear or logistic regression)?

A. You run an online bookstore and collect reviews from many users. You want to use it to identify which books are “similar” to each other (i.e., if a user likes a book, which books are they likely to like?).

B. You manage an online bookstore and you have reviews from many users. You want to predict expected sales (the number of books sold) based on the average score of a book.

C. You’re an artist, drawing portraits of your clients by hand. Each client gets a different portrait (themselves) and gives you 1-5 star feedback, with a maximum purchase of 1 portrait per client. You want to predict what your next client’s rating will be.

D. You own a clothing store that sells many styles and brands of jeans. You’ve collected reviews from regular shoppers about different styles and brands, and you want to use those reviews to offer those shoppers discounts on the jeans you think they’re most likely to buy

Question 78

You run a movie company and want to build a movie recommendation system based on collaborative filtering. There are three popular review sites (call them A, B, and C) where users can go and rate movies. You have just acquired the three companies that operate these sites and want to merge the data from all three to build a single, unified system. On site A, users rate a movie from one to five stars. On site B, users rate movies on a 1-10 scale, and decimal values (e.g., 7.5) are allowed. On site C, ratings range from 1 to 100. You also have enough information to match users/movies on one site with users/movies on the other sites. Which of the following statements is true?

A. You can combine all three datasets into one, but you should first normalize each dataset's ratings (e.g., rescale each dataset's ratings to a 0-1 range).

B. All three training sets can be combined into one as long as average normalization and feature scaling are performed after merging the data.

C. Assuming that at least one movie/user in one database does not appear in the second database, there is no reasonable way to merge these datasets because of the lack of data.

D. Data from these sites cannot be combined. You have to set up three separate referral systems.

Question 79

Which of the following is the right choice for a collaborative filtering system? Select all the correct items

A. Recall the cost function for the content-based recommendation algorithm. Suppose there is only one user and they have rated every movie in the training set; this implies n_u = 1 and r(i, j) = 1 for every i, j. In this case, the cost function is equivalent to the one used for regularized linear regression.

B. When using gradient descent to train a collaborative filtering system, it is fine to initialize all the parameters (the x^(i) and θ^(j)) to zero.

C. If you have a dataset of a user's ratings on certain products, you can use this data to predict their preferences for products they have not rated.

D. To use collaborative filtering, you need to manually design a feature vector for each item in the dataset (for example, a movie) that describes the most important attributes of that item.

Question 80

Let's say you have two matrices A and B, where A is 5×3 and B is 3×5. Their product C = AB is a 5×5 matrix. Furthermore, you have a 5×5 matrix R, where every entry is either a 0 or a 1. You want to find the sum of all elements C(i, j) for which the corresponding entry R(i, j) is 1, ignoring all elements C(i, j) where R(i, j) = 0. One way to do so is with the following code:

Which of the following also correctly calculates this total? Select all the correct items

A. total = sum(sum((A * B) .* R))

B. C = A * B; total = sum(sum(C(R == 1)));

C. C = (A * B) * R; total = sum(C(:));

D. total = sum(sum(A(R == 1) * B(R == 1)));
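Options A and B can be checked against the straightforward double loop; a Python/NumPy sketch (the element-wise `.*` of Octave becomes `*` on NumPy arrays, and logical indexing carries over):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
B = rng.normal(size=(3, 5))
R = rng.integers(0, 2, size=(5, 5))  # random 0/1 mask

C = A @ B
# Reference: the explicit double loop over all 25 entries
total_loop = sum(C[i, j] for i in range(5) for j in range(5) if R[i, j] == 1)

total_a = np.sum((A @ B) * R)  # option A: mask with an element-wise product
total_b = np.sum(C[R == 1])    # option B: logical indexing

print(np.isclose(total_loop, total_a), np.isclose(total_loop, total_b))  # True True
```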

Question 81

Suppose you are training a logistic regression classifier using stochastic gradient descent. You find that the cost (averaged over the last 500 examples), plotted as a function of the number of iterations, is slowly increasing over time. Which of the following changes might help?

A. Try to average costs with fewer examples in the diagram (say 250 examples instead of 500).

B. This is not possible in the case of stochastic gradient descent because it guarantees convergence to the optimal parameters.

C. Try halving (reducing) the learning rate α and see if that causes the cost to consistently decrease; if not, keep halving it until the cost consistently decreases.

D. Take fewer examples from the training set

Question 82

Which of the following statements about stochastic gradient descent is true? Select all the correct items

A. You can use numerical gradient checking to verify that your implementation of stochastic gradient descent is correct (one step of stochastic gradient descent computes a partial derivative).

B. Before running stochastic gradient descent, you should randomly shuffle (reorder) the training set.

C. Suppose you use stochastic gradient descent to train a linear regression classifier. The cost function must decrease with each iteration.

D. To make sure stochastic gradient descent is converging, we typically compute the cost after each iteration (and plot it) to verify that the cost function is generally decreasing.

Question 83

Which of the following statements about online learning is true? Select all the correct items

A. If we have a continuous/uninterrupted stream of data, an online learning algorithm is usually the best fit.

B. An online learning algorithm is most appropriate when we have a fixed training set of size m that we wish to train on.

C. With online learning, you must save every training example you get, because you will need to reuse past examples to retrain the model, even after you get new training examples in the future.

D. An advantage of online learning is that if the functionality we are modeling changes over time (for example, if we are modeling the probability of a user clicking on a different URL and the user’s taste/preference changes over time), the online learning algorithm will automatically adapt to those changes.

Question 84

Assuming you have a very large training set, which of the following algorithms do you think can be parallelized using Map-reduce and splitting the training set across different machines? Select all the correct items

A. Train logistic regression with stochastic gradient descent

B. Train linear regression with stochastic gradient descent

C. Train logistic regression with batch gradient descent

D. Calculate the average of all features in the training set (for example, to perform average normalization).
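Option D illustrates why all of these parallelize: each is a sum over the training set, so the set can be split into chunks, each chunk summed on its own machine, and the partial sums combined. A minimal sketch with chunks standing in for machines:

```python
import numpy as np

X = np.arange(24, dtype=float).reshape(12, 2)  # 12 examples, 2 features

# "Map": each of 4 machines sums its own chunk of the training set
partial_sums = [chunk.sum(axis=0) for chunk in np.array_split(X, 4)]

# "Reduce": one machine combines the partial sums and divides by m
mean = np.sum(partial_sums, axis=0) / len(X)

print(np.allclose(mean, X.mean(axis=0)))  # True
```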

Question 85

Which of the following statements about map-reduce are true? Select all the correct items

A. Because of network latency and other overhead associated with Map-Reduce, if we run Map-Reduce on N machines, we may get less than an N-fold speedup compared to running on one machine.

B. If you only have a computer with a single computing core, Map-Reduce is unlikely to help.

C. When using Map-Reduce with gradient descent, we typically use a single machine to accumulate the gradients from each of the Map-Reduce machines in order to compute the parameter update for that iteration.

D. Linear regression and logistic regression can be parallelized by Map-reduce, but neural network training cannot.

Question 88

What are the benefits of performing a ceiling analysis? Select all the correct items

A. It is a way of providing additional training data to the algorithm.

B. A ceiling analysis helps us decide which part of the pipeline, if improved, would most improve the overall system.

C. A ceiling analysis can tell us whether a component is worth significant effort to improve, because even raising that component's accuracy to 100% might not improve the overall system's accuracy.

D. A ceiling analysis does not help us figure out which component suffers from high bias and which from high variance.

Question 89

Suppose you are building an object classifier that takes an image as input and recognizes whether the image contains a car (y=1) or not (y=0). For example, here are a positive example and a negative example:

After carefully analyzing the performance of your algorithm, you conclude that you need more positive (y=1) examples. Which of the following might be a good way to get more positive examples?

A. Apply translation, distortion and rotation to the images in the existing training set.

B. Select two car images and average them to generate a third example.

C. Take some images from the training set and add random Gaussian noise to each pixel.

D. Make two copies of each image in the training set; This immediately doubles the training set size.

Question 90

Suppose you have a photo OCR system for recognizing handwritten characters in images, with the following pipeline:

You have decided to perform a ceiling analysis on this system and have obtained the following results:

Which of the following statements is true?

A. It is possible to improve the performance of the character recognition system.

B. To perform the ceiling analysis here, we would need ground-truth labels for the outputs of the other pipeline stages.

C. The least promising component to work on is the character recognition system, since it has already achieved 100% accuracy.

D. The most promising component to work on is the text detection system, since it has the lowest performance (72%) and therefore the greatest potential gain.
