The 360 O&M development team is a technology-oriented team that has accumulated extensive experience building the HULK cloud platform, in areas including containerization, microservices, AIOps, and automated O&M. For more technical articles, visit the team's technical blog: www.opsdev.cn.
Background
In a private cloud environment, the computing and storage resources used by each business line may age or be migrated, and then need to be returned. As the number of businesses and machines grows, so does the number of machines to be returned. However, a business cannot return all of its machines. Returning the right machines saves costs and also improves the stability and efficiency of the business. An algorithm is therefore needed to decide whether a given machine should be returned.
A natural approach is to treat this as a machine classification problem and design a classifier that decides whether each machine should be returned. However, the problem can also be solved with a recommendation system that treats each business as a user and recommends the machines it most needs to return.
Challenges
When designing a recommendation system for returning machines, we treat the business as the user and the machine as the item. However, analysis of the data shows that a machine generally belongs to exactly one business. In recommender-system terms, each item belongs to one user and can be obtained only by that user, never by others. In addition, the set of machines a business owns is fixed, so the system can only recommend machines the business already owns as candidates for return.
These two properties make machine-return recommendation very different from traditional recommendation of music, movies, or goods. Existing algorithms therefore cannot be applied directly: we first have to construct items that are shared across users and then make recommendations over them.
With these problems in mind, this article proposes a recommendation method based on K-means clustering and the ItemCF algorithm to solve the machine-return recommendation problem.
Methods
This section has two parts, K-means clustering and the ItemCF algorithm, each described in detail below.
K-means clustering
Because a machine generally belongs to exactly one business, it is private to that business rather than a shared item. Cluster analysis can therefore be used to divide the existing machines into several classes, and these machine classes are shared by all users (businesses).
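Once every machine carries a class label, per-machine return records can be rewritten in terms of the shared classes. A minimal sketch with hypothetical names, counting how many machines of each class each business has returned (the r_ui values that ItemCF uses later):

```python
from collections import Counter, defaultdict

def returns_by_class(returns, machine_class):
    """Build the shared user-item data: business -> {machine class: r_ui}.

    returns: iterable of (business, machine_id) pairs for machines already returned.
    machine_class: machine_id -> class index assigned by clustering.
    (Both argument layouts are hypothetical.)
    """
    table = defaultdict(Counter)
    for business, machine_id in returns:
        # Replace the business-private machine with its shared class.
        table[business][machine_class[machine_id]] += 1
    return {business: dict(counts) for business, counts in table.items()}
```

For example, two returned machines of class 0 for one business and one of class 1 for another yield `{"biz_a": {0: 2}, "biz_b": {1: 1}}`.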
Recommendation then proceeds in two steps. First, machine classes are recommended: each class receives a recommendation score, and the higher the score, the more likely the machines to be returned fall in that class. Second, recommendations are made among the machines the target business owns: for each of these machines, compute its average similarity to the machines the business has already returned. The higher the average similarity, the more likely the machine is to be returned. Combining the two steps gives a final recommendation score for each machine, and sorting by this score yields the specific machines recommended to the target business.
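The article does not say how the class score and the average similarity are combined, nor which similarity measure is used. The sketch below assumes cosine similarity on machine feature vectors and multiplies it by the class score; `rank_machines` and its arguments are hypothetical names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_machines(owned, returned, machine_class, class_score):
    """Rank a business's machines by (class score) x (average similarity).

    owned: machine_id -> feature vector, for machines the business still owns.
    returned: feature vectors of machines the business has already returned.
    machine_class: machine_id -> K-means class.
    class_score: class -> recommendation score for this business (e.g. from ItemCF).
    """
    scores = {}
    for machine_id, features in owned.items():
        # Step 2: average similarity to the business's previously returned machines.
        avg_sim = (sum(cosine(features, r) for r in returned) / len(returned)
                   if returned else 0.0)
        # Assumed combination rule: product of the class score and the similarity.
        scores[machine_id] = class_score.get(machine_class[machine_id], 0.0) * avg_sim
    # Highest combined score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A product means a machine must score reasonably on both steps to rank highly; a weighted sum would be an equally plausible choice.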
For clustering, this article uses five machine features, such as the CPU idle rate. K-means clustering is available in existing Python packages and only needs to be called.
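A minimal sketch of such a call, assuming scikit-learn's `KMeans` and numpy. The feature names other than CPU idle rate are hypothetical, the data is synthetic, and standardizing the features before clustering is an added assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the article names only the CPU idle rate.
FEATURES = ["cpu_idle", "mem_used", "disk_used", "net_io", "age_years"]

def cluster_machines(X, k=4, seed=0):
    """Assign each machine (one row of X) to one of k shared machine classes."""
    # Standardize so features with large ranges do not dominate the Euclidean
    # distance used by K-means (an assumption, not stated in the article).
    X_scaled = StandardScaler().fit_transform(X)
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(X_scaled)
    return labels, model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 200 synthetic machines with 5 random features in [0, 1).
    X = rng.random((200, len(FEATURES)))
    labels, model = cluster_machines(X, k=4)
    print(np.bincount(labels))  # number of machines in each class
```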
Experiments show that the choice of K has little influence on the final result, but analysis indicates that K should be moderate, neither too small nor too large. If K is too small, machines are not separated into meaningful classes and the recommendation error is relatively large. If K is too large, many classes will contain no machines from a given business. Balancing these considerations, K = 4 was chosen.
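One way to quantify the "too large" failure mode described above is to measure how many (business, machine class) pairs are empty for a candidate K. The helper below is a hypothetical diagnostic, not part of the article's method:

```python
from collections import defaultdict

def empty_class_ratio(labels, owners, k):
    """Fraction of (business, machine class) pairs where the business owns no machine.

    labels[i]: class (0..k-1) of machine i; owners[i]: business that owns machine i.
    A high ratio for a candidate K suggests K is too large for the data.
    """
    classes_per_business = defaultdict(set)
    for label, owner in zip(labels, owners):
        classes_per_business[owner].add(label)
    total_pairs = len(classes_per_business) * k
    occupied = sum(len(classes) for classes in classes_per_business.values())
    return 1 - occupied / total_pairs
```

For example, `empty_class_ratio([0, 0, 1, 2, 3, 3], ["a", "a", "a", "b", "b", "b"], k=4)` returns 0.5: each business occupies only two of the four classes.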
ItemCF algorithm
ItemCF is one of the classic collaborative filtering algorithms; it recommends items to users based on the similarity between items. It rests mainly on two assumptions:
- Users with similar interests may be interested in the same items;
- Users may prefer items similar to those they have already purchased.
In other words, because the algorithm draws on users' historical behavior, two items need not be objectively similar: they can be treated as similar because user behavior links them, and recommendations can be generated from that.
Key steps of the ItemCF algorithm:
Step 1: Calculate the similarity w_ij between items i and j. Here N(i) is the set of users who like item i.
Step 2: Generate a recommendation list for each user from the item similarities and the user's historical behavior. Here p_uj is user u's predicted preference for item j, T(u) is the set of items user u has preferred in the past, S(j, k) is the set of the k items most similar to item j, w_ji is the similarity between items j and i from step 1, and r_ui is user u's interest in item i.
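For reference, the standard ItemCF formulas for these two steps, written with the symbols above, are given below. The square-root (cosine) normalization in the first formula is an assumption; some variants divide by |N(i)| alone.

```latex
w_{ij} = \frac{|N(i) \cap N(j)|}{\sqrt{|N(i)|\,|N(j)|}}

p_{uj} = \sum_{i \in T(u) \cap S(j,k)} w_{ji}\, r_{ui}
```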
For implicit-feedback data sets, r_ui = 1 if user u has interacted with item i at all. In this recommendation, r_ui is instead the number of machines in class i that business u has returned. Unlike K-means, ItemCF is not available in the Python packages used here, so it was implemented by hand.
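A minimal pure-Python sketch of the two steps, assuming the cosine-normalized similarity and the data layout business -> {machine class: r_ui}. The function names are hypothetical. Unlike music or movie recommendation, it does not filter out classes the business has already returned machines from, on the assumption that every class needs a score here:

```python
import math
from collections import defaultdict

def item_similarity(user_items):
    """Step 1: w_ij = |N(i) & N(j)| / sqrt(|N(i)| * |N(j)|).

    user_items: dict user -> dict item -> r_ui (here: returned-machine counts per class).
    """
    co_counts = defaultdict(lambda: defaultdict(int))  # |N(i) & N(j)|
    n_users = defaultdict(int)  # |N(i)|
    for items in user_items.values():
        for i in items:
            n_users[i] += 1
            for j in items:
                if i != j:
                    co_counts[i][j] += 1
    return {
        i: {j: c / math.sqrt(n_users[i] * n_users[j]) for j, c in related.items()}
        for i, related in co_counts.items()
    }

def recommend(user, user_items, sim, k=3):
    """Step 2: p_uj = sum over i in T(u) & S(j, k) of w_ji * r_ui, sorted descending."""
    history = user_items.get(user, {})  # T(u) with its r_ui values
    scores = defaultdict(float)
    for j, neighbours in sim.items():
        # S(j, k): the k items most similar to item j.
        top_k = sorted(neighbours.items(), key=lambda kv: kv[1], reverse=True)[:k]
        for i, w_ji in top_k:
            if i in history:
                scores[j] += w_ji * history[i]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical data: r_ui = machines of class i returned by business u.
    user_items = {"biz_a": {0: 2, 1: 1}, "biz_b": {0: 1, 2: 1}, "biz_c": {1: 3, 2: 1}}
    sim = item_similarity(user_items)
    print(recommend("biz_a", user_items, sim, k=3))  # [(2, 1.5), (1, 1.0), (0, 0.5)]
```

The resulting per-class scores are the class scores that the two-step ranking then combines with machine-level similarity.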
Conclusion
In this article, K-means clustering and the ItemCF algorithm are combined to recommend machines for return. The current recommendation accuracy is about 79%, but the method requires the main business to have at least two sub-businesses. Many businesses do not meet this condition in practice, so no recommendation can be made for them. In addition, the accuracy is not yet high enough, possibly because the algorithm is still rough or because the machine features are not well chosen; both need further optimization.