Zhou Zhihua’s *Machine Learning* (the “watermelon book”) does not include answers to its exercises, so we have collected answers to each exercise from the Internet for your reference.

Reference Answers: Chapter 1, Introduction

1.1. If Table 1.1 contains only the two samples numbered 1 and 4, give the corresponding version space.

Table 1.1 Watermelon data set (simplified)

Serial number  Color  Root             Knock sound  Good melon
1              green  curled           muffled      yes
4              black  slightly curled  dull         no

CSDN blogger “Four to six into one”

Original link: blog.csdn.net/icefire_tyh…

A:

The hypothesis space is the space composed of all the hypotheses for the problem. We can regard the learning process as a search through the hypothesis space, where the goal is to find a hypothesis that “matches” the training set.

In general, if a data set has d attributes and the i-th attribute has n_i possible values, then adding the wildcard value * for each attribute gives (n_1 + 1) × … × (n_d + 1) possible hypotheses; counting also the empty set ∅, which asserts that there are no positive examples at all, gives (n_1 + 1) × … × (n_d + 1) + 1 hypotheses. Realistic problems often face a very large hypothesis space, from which we can find the set of hypotheses consistent with the training set, called the version space. The version space is obtained by removing from the hypothesis space every hypothesis that is inconsistent with a positive example or consistent with a negative example, and can be regarded as the set of generalizations of the positive examples. Obtaining the version space by searching requires traversing the entire hypothesis space; if the data set contains positive examples, one can instead first carry out the maximum generalization of a positive example to obtain candidate hypotheses, then eliminate those that cover a negative example, which simplifies the calculation. Here the data set has 3 attributes, each with 2 values; with the wildcard * each attribute has 3 choices, so there are 3 × 3 × 3 = 27 non-empty hypotheses (28 including ∅), namely:

  • 1. Color = green, Root = curled, Knock = muffled

  • 2. Color = green, Root = curled, Knock = dull

  • 3. Color = green, Root = slightly curled, Knock = muffled

  • 4. Color = green, Root = slightly curled, Knock = dull

  • 5. Color = black, Root = curled, Knock = muffled

  • 6. Color = black, Root = curled, Knock = dull

  • 7. Color = black, Root = slightly curled, Knock = muffled

  • 8. Color = black, Root = slightly curled, Knock = dull

  • 9. Color = green, Root = curled, Knock = *

  • 10. Color = green, Root = slightly curled, Knock = *

  • 11. Color = black, Root = curled, Knock = *

  • 12. Color = black, Root = slightly curled, Knock = *

  • 13. Color = green, Root = *, Knock = muffled

  • 14. Color = green, Root = *, Knock = dull

  • 15. Color = black, Root = *, Knock = muffled

  • 16. Color = black, Root = *, Knock = dull

  • 17. Color = *, Root = curled, Knock = muffled

  • 18. Color = *, Root = curled, Knock = dull

  • 19. Color = *, Root = slightly curled, Knock = muffled

  • 20. Color = *, Root = slightly curled, Knock = dull

  • 21. Color = green, Root = *, Knock = *

  • 22. Color = black, Root = *, Knock = *

  • 23. Color = *, Root = curled, Knock = *

  • 24. Color = *, Root = slightly curled, Knock = *

  • 25. Color = *, Root = *, Knock = muffled

  • 26. Color = *, Root = *, Knock = dull

  • 27. Color = *, Root = *, Knock = *

  • 28. The empty set ∅

Every hypothesis that does not cover the positive sample (No. 1) can be deleted, and every hypothesis that covers the negative sample (No. 4) can be deleted. The remaining hypotheses form the version space:

  • 1. Color = green, Root = curled, Knock = muffled

  • 9. Color = green, Root = curled, Knock = *

  • 13. Color = green, Root = *, Knock = muffled

  • 17. Color = *, Root = curled, Knock = muffled

  • 21. Color = green, Root = *, Knock = *

  • 23. Color = *, Root = curled, Knock = *

  • 25. Color = *, Root = *, Knock = muffled

    In general, the version space consists of generalizations of the positive examples, but because there is only one positive example in this data set, the hypothesis identical to that sample is also included in the version space (hypothesis 1).
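The construction above can be checked mechanically. The sketch below (attribute and value names follow the translation used in this answer) enumerates all 27 non-empty hypotheses and keeps those that cover the positive sample and exclude the negative one:

```python
from itertools import product

# A minimal sketch of Exercise 1.1: enumerate the 3*3*3 = 27 non-empty
# hypotheses and keep the ones consistent with both training samples.
colors = ["green", "black", "*"]
roots = ["curled", "slightly curled", "*"]
knocks = ["muffled", "dull", "*"]

samples = [
    (("green", "curled", "muffled"), True),         # sample 1: good melon
    (("black", "slightly curled", "dull"), False),  # sample 4: not good
]

def covers(hyp, x):
    """A hypothesis covers a sample iff every attribute matches or is '*'."""
    return all(h == v or h == "*" for h, v in zip(hyp, x))

# Version space: hypotheses that cover the positive sample and not the
# negative one.
version_space = [
    h for h in product(colors, roots, knocks)
    if all(covers(h, x) == label for x, label in samples)
]

for h in version_space:
    print(h)
print(len(version_space))  # 7, matching the seven hypotheses listed above
```

Running it prints exactly the seven hypotheses numbered 1, 9, 13, 17, 21, 23 and 25 above.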


1.2. Compared with using a single conjunction, using a “disjunctive normal form” for hypothesis representation gives the hypothesis space stronger representational power. For example, the disjunction

((Color = green) ∧ (Root = curled) ∧ (Knock = crisp)) ∨ ((Color = black) ∧ (Root = stiff) ∧ (Knock = dull))

classifies both of those attribute combinations as “good melon”. If a disjunctive normal form containing at most k conjunctions is used to express the hypothesis space of the watermelon classification problem of Table 1.1, estimate how many possible hypotheses there are.

CSDN blogger “Weixin_41587767”

Original link: blog.csdn.net/weixin_4158…

A:

Table 1.1 contains 4 samples with 3 attributes: Color has 2 values, while Root and Knock each have 3. Adding the wildcard *, the hypothesis space contains 3 ∗ 4 ∗ 4 + 1 = 49 hypotheses (including the empty set). If the disjunctive normal form contains at most k conjunctions, the maximum value of k is 49.

Leaving out the empty set, there are 48 possibilities:

No generalization (specific hypotheses): 2 ∗ 3 ∗ 3 = 18 hypotheses

One attribute generalized: 3 ∗ 3 + 2 ∗ 3 + 2 ∗ 3 = 21 hypotheses

Two attributes generalized: 2 + 3 + 3 = 8 hypotheses

Three attributes generalized: 1 hypothesis
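These counts are easy to verify by brute force. The sketch below assumes the full Table 1.1 has Color ∈ {green, black}, Root ∈ {curled, stiff, slightly curled} and Knock ∈ {muffled, crisp, dull} (2, 3 and 3 values, matching the 3 ∗ 4 ∗ 4 count above), and tallies hypotheses by how many attributes are generalized to *:

```python
from itertools import product

# A small check of the counts above: each attribute's values plus '*'.
values = [
    ["green", "black", "*"],
    ["curled", "stiff", "slightly curled", "*"],
    ["muffled", "crisp", "dull", "*"],
]

# Count non-empty hypotheses grouped by the number of '*' wildcards.
by_wildcards = {0: 0, 1: 0, 2: 0, 3: 0}
for hyp in product(*values):
    by_wildcards[hyp.count("*")] += 1

print(by_wildcards)                 # {0: 18, 1: 21, 2: 8, 3: 1}
print(sum(by_wildcards.values()))   # 48 non-empty hypotheses
```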

The numbers of ways of choosing among these 48 hypotheses to form a disjunctive normal form are given by the binomial coefficients (that is, a row of the Yang Hui, or Pascal, triangle):

1, 48, 1128, 17296, …, 17296, 1128, 48, 1 — 49 numbers in total, where the 1 on the left represents choosing none (“empty”) and the 1 on the right represents choosing all.

If k = 48, at most 48 conjunctions can be used to form the disjunctive normal form; excluding the case where none are chosen gives 2^48 − 1. The total of 2^48 follows from the binomial theorem.

If 0 < k < 48, add up the first k + 1 terms of the expansion sequence (since the sequence is indexed from 0) and subtract 1.

If exactly k conjunctions are required, the count is the (k + 1)-th term of the expansion sequence (again because the sequence is indexed from 0).

However, this result contains duplicates, because a generalized hypothesis is shorthand for several specific hypotheses rather than a new hypothesis in itself. When the * in a generalization is expanded, it becomes a number of specific hypotheses, so if k is taken up to 48, expanding the * must produce repetitions in the hypothesis set, with some specific hypotheses repeated more than once.

After de-duplication, the problem should be calculated using only the 18 specific hypotheses, namely: 2^18 − 1.
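The de-duplication argument can be sanity-checked by brute force on a smaller attribute space (an assumption made for tractability, since enumerating all subsets of 48 hypotheses is infeasible): with 2 attributes of 2 values each there are 2 ∗ 2 = 4 specific hypotheses and 3 ∗ 3 = 9 non-empty hypotheses once * is allowed, and the number of distinct concepts expressible as disjunctions should be 2^4 − 1 = 15, not 2^9 − 1.

```python
from itertools import product, combinations

# Toy space: 2 attributes, 2 values each, plus the wildcard '*'.
values = [["a", "b", "*"], ["c", "d", "*"]]
hyps = list(product(*values))                    # 9 non-empty hypotheses
samples = list(product(["a", "b"], ["c", "d"]))  # 4 specific samples

def covered(h):
    """Set of samples a hypothesis covers ('*' matches anything)."""
    return frozenset(s for s in samples
                     if all(hv == sv or hv == "*" for hv, sv in zip(h, s)))

# Collect the distinct sample sets covered by every non-empty disjunction.
distinct = set()
for r in range(1, len(hyps) + 1):
    for subset in combinations(hyps, r):
        distinct.add(frozenset().union(*(covered(h) for h in subset)))

print(len(distinct))  # 15 == 2**4 - 1, counting specific hypotheses only
```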

The following Python code does not subtract the all-empty case (the −1) and does not take de-duplication into account.

# -*- coding: utf-8 -*-
def strige(max_row):
    """Return row `max_row` of the Yang Hui (Pascal) triangle."""
    S = [1]
    while max_row:
        N = S[:]
        N.append(0)
        # N[-1] is the appended 0, so the edges stay 1.
        S = [N[i - 1] + N[i] for i in range(len(N))]
        max_row -= 1
    return S

def cal_Permutations(total_num=0, select_num=0, most_num=0):
    re_total_num = 0
    re_select_num = 0
    re_most_num = 0
    if total_num == 0:
        raise ValueError('pls indicate total numbers')

    if select_num > total_num or most_num > total_num:
        raise ValueError('select_num or most_num can not be bigger than total_num')

    s = strige(total_num)

    # Sum of the whole row: 2**total_num combinations in total.
    for x in s:
        re_total_num += x

    # Exactly select_num conjunctions: a single binomial coefficient.
    re_select_num = s[select_num]

    # At most most_num conjunctions: partial sum of the row.
    for y in range(0, most_num + 1):
        re_most_num += s[y]

    return {'input_parameter': {'total_num': total_num, 'most_num': most_num, 'select_num': select_num},
            'output_permutations': s,
            'output_usual_count': {'total_num': re_total_num, 'most_num': re_most_num, 'select_num': re_select_num}}

result = cal_Permutations(48, 40, 48)
print(result)

1.3. If the data contain noise, there may be no hypothesis in the hypothesis space that is consistent with all training samples. Try to design an inductive preference for hypothesis selection in this case.

CSDN blogger “Four to six into one”

Original link: blog.csdn.net/icefire_tyh…

A:

It is generally assumed that the more similar the attributes of two samples are, the more likely they are to belong to the same class. If samples with identical attributes carry two different labels, the label of the nearest (most similar) sample can be adopted. Alternatively, all samples that share the same attributes but have different labels can be removed at once; the remaining data then contain no contradictions, but some information may be lost.
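The two options above can be sketched concretely. The data below are illustrative (not from the book): samples are grouped by their attribute vector, then conflicts are resolved either by majority vote or by dropping every conflicting group:

```python
from collections import defaultdict, Counter

# A minimal sketch of the preferences described above, on made-up data:
# identical attribute vectors with different labels indicate noise.
data = [
    (("green", "curled", "muffled"), "yes"),
    (("green", "curled", "muffled"), "no"),   # noisy duplicate
    (("green", "curled", "muffled"), "yes"),
    (("black", "stiff", "crisp"), "no"),
]

groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)

# Option 1: keep the majority label within each attribute group.
majority = {x: Counter(ys).most_common(1)[0][0] for x, ys in groups.items()}

# Option 2: drop every group with conflicting labels (loses information).
clean = [(x, ys[0]) for x, ys in groups.items() if len(set(ys)) == 1]

print(majority)
print(clean)
```

Option 1 keeps more data but may keep a wrong label when the noise happens to be in the majority; option 2 is safer but discards the whole conflicting group.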


1.4. When discussing the “No Free Lunch” theorem in Section 1.4 of this chapter, the “classification error rate” is used by default as the performance measure to evaluate classifiers. If another performance measure ℓ is used instead, equation (1.1) becomes:

E_ote(𝔏_a | X, f) = Σ_h Σ_{x ∈ 𝒳−X} P(x) ℓ(h(x), f(x)) P(h | X, 𝔏_a)

Try to prove that the “No Free Lunch” theorem still holds.

CSDN blogger “Four to six into one”

Original link: blog.csdn.net/icefire_tyh…

A:

Again consider the binary classification problem. The NFL theorem first requires that the true objective function f be uniformly distributed; for a binary classification problem over |𝒳| samples, f clearly has 2^|𝒳| possible cases, and on any given sample, half of them agree with any given hypothesis h. For the proof to go through, Σ_f ℓ(h(x), f(x)) must be a constant independent of h, and the implied condition is that ℓ(h(x), f(x)) + ℓ(h̄(x), f(x)) is a constant, where h̄ denotes the opposite prediction (a reasonable sufficient condition). If this does not hold, the NFL theorem does not hold (or at least is not so easy to prove).
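This argument can be illustrated on a toy sample space. The sketch below (an illustrative check, using the symmetric 0/1 loss as the measure ℓ) enumerates all 2^|𝒳| binary target functions on |𝒳| = 3 points and confirms that the total loss summed over all f is the same for every hypothesis h:

```python
from itertools import product

# Toy check of the NFL argument: with a loss l satisfying
# l(0, y) + l(1, y) = const, the sum over all target functions f of
# l(h(x), f(x)) does not depend on the hypothesis h.
X = [0, 1, 2]                                        # three sample points
l = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1}     # symmetric 0/1 loss

def all_functions():
    """Every binary function on X, as a dict from point to label."""
    return [dict(zip(X, bits)) for bits in product([0, 1], repeat=len(X))]

def total_loss(h):
    """Sum over all target functions f and all points x of l(h(x), f(x))."""
    return sum(l[(h[x], f[x])] for f in all_functions() for x in X)

# Evaluate every possible hypothesis; all totals coincide.
losses = {tuple(h[x] for x in X): total_loss(h) for h in all_functions()}
print(set(losses.values()))  # a single value: the sum is constant in h
```

For each point x, half of the 2^3 target functions assign f(x) = 0 and half assign f(x) = 1, so the per-point sum is 4 · (ℓ(h(x), 0) + ℓ(h(x), 1)) = 4 regardless of h(x), giving 12 in total for every h.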


1.5. Describe where machine learning plays a role in Internet search.

CSDN blogger “Four to six into one”

Original link: blog.csdn.net/icefire_tyh…

A:

1. The most common: recommendation and message push, for example “things you might be interested in” suggestions learned from user behavior.

2. Ranking websites by relevance, based on a comprehensive analysis of clicks and page content.

3. Image search: still mostly tag-based today, but pixel- and content-based search is emerging.
