Someone joked to me that this project is essentially one big database exercise plus one big algorithm exercise, and that it has nothing to do with C++ unless you want to build a GUI. After some discussion, our group finally decided to give up on challenging ourselves and yield to the algorithms and the final exam.

Project Description

In this project, your team needs to develop a review and recommendation system for a category of products, similar to Amazon, Dangdang, and Douban. The products are not limited to books; they can be movies, music, restaurants, food, and so on. For convenience, this document refers to all of them as books.

You need to manage a group of users and many books. Users can [review and rate] books in the system (assume a scale of 1-10). Of course, a user cannot rate every book in the system, only the books they have read. The system lets you register new users and add new books, and also delete or modify the information of existing books and users (anything other than their unique identifier, such as a book's author, year, and so on). For each book, the system tracks its [total number of reviews and average score], etc.

After each user logs in, the system recommends the 10 books that the user is most likely to want to read but has not yet read. Besides recommending books, the system can also recommend potential friends, that is, other users with similar interests.

Reference data sets
Book-Crossing Dataset: http://www2.informatik.uni-freiburg.de/~cziegler/BX/
MovieLens: http://grouplens.org/datasets/movielens/

[Doesn't it look a lot like a big database + algorithms assignment? :)] Here I'll first jot down a few ideas for solving the problem.

In essence, the task is to establish relationships between users and recommended items, between users and users, and between items and items, which requires a little knowledge of graphs from data structures. So the big questions are: how do you tell whether two nodes are strongly connected, and how do you rank the candidates? Note that there are a number of parameters to work with; user information and ratings are the important ones.

[Possible recommendation algorithms involved]

1. Demographic-based recommendation

Put simply, it recommends to a user the items liked by similar people, for example people of the same sex or age group.

Advantages: no cold start problem. [Cold start problem: how to design personalized recommendations that satisfy users when there is no user history data. More fundamentally: how to make recommendations to a new user.]

Disadvantages: the algorithm is obviously very coarse, rarely gives satisfactory results, and is only suitable for simple recommendations.

2. Content-based recommendations

Advantages: It can model user interests well, and adding more attribute dimensions to the items gives better recommendations.

Disadvantages: there is a cold start problem; the attributes of an item are limited; and the similarity measure is not well defined, so the results may be rather one-sided.
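To make the content-based idea concrete, here is a minimal sketch. It assumes each book is described by a small numeric attribute vector and that cosine similarity is the measure; the attribute names, the sample books, and the helper `most_similar` are all made up for illustration, and other similarity measures are just as valid.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical attribute vectors: [fantasy, romance, history]
books = {
    "book_a": [1, 0, 0],
    "book_b": [1, 1, 0],
    "book_c": [0, 0, 1],
}

def most_similar(book_id, books, k=10):
    """Return up to k other books, most similar to book_id first."""
    target = books[book_id]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in books.items()
        if other != book_id
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]
```

Adding more attributes only lengthens the vectors, which is why richer item descriptions tend to help this approach.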

3. Collaborative filtering (CF), the "big BOSS"

Without further ado, here are the reference links first:
http://blog.csdn.net/ygrx/article/details/15501679
http://www.ibm.com/developerworks/cn/web/1103_zhaoct_recommstudy2/index.html
I think these two are written much better than anything I could write.

The BOSS has two variants.

(1) User-based collaborative filtering. The idea: calculate the similarity between all users from their ratings of items; select the N users most similar to the current user; use the ratings of those N neighbors to predict the current user's ratings of items they have not yet viewed; recommend items to the current user based on the predicted ratings.

In fact, this can be understood as judging the length of the path between two nodes in a graph.

1. Similarity: a common way to calculate it is the Pearson correlation.
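As a rough sketch of the Pearson correlation between two users: take the items both have rated, subtract each user's mean, and divide the sum of the products of the deviations by the product of the two deviation norms. The version below computes the means over the co-rated items only, which is one common variant and an assumption on my part; the function name and the dict layout are also just for illustration.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users over the items both rated.

    ratings_a, ratings_b: dicts mapping item id -> rating (1-10).
    Returns 0.0 if there are fewer than two common items or no variance.
    """
    common = [item for item in ratings_a if item in ratings_b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0  # a user who gives every item the same score has no trend
    return num / (den_a * den_b)

# Hypothetical sample data
alice = {"b1": 8, "b2": 6, "b3": 10}
bob = {"b1": 4, "b2": 2, "b3": 6}    # same taste, but a stricter scorer
carol = {"b1": 6, "b2": 8, "b3": 4}  # opposite taste
```

The result lies between -1 and 1. Note that `bob` scores everything lower than `alice` but still correlates perfectly with her, which is the reason for subtracting the means.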

2. A commonly used prediction function:

For each neighbor, calculate how far their rating of item P lies above or below their own average rating, then combine these deviations: weight each neighbor's deviation on P by that neighbor's similarity to user A, and add the result to user A's average rating to get A's predicted rating of item P.
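The prediction step described above might look like the sketch below. It assumes the similarities to the neighbors have already been computed (for example with Pearson) and are passed in as a dict, and it divides by the sum of the absolute similarities, which is a common normalization but not something stated above. All names and sample data are hypothetical.

```python
def mean_rating(user_ratings):
    """Average of all ratings a user has given."""
    return sum(user_ratings.values()) / len(user_ratings)

def predict(target, item, ratings, similarities):
    """Predict the target user's rating of an item.

    ratings: dict user -> {item -> rating}
    similarities: dict neighbor -> similarity to the target user
    """
    base = mean_rating(ratings[target])
    num = 0.0
    den = 0.0
    for neighbor, sim in similarities.items():
        neighbor_ratings = ratings[neighbor]
        if item not in neighbor_ratings:
            continue  # this neighbor has no opinion on the item
        deviation = neighbor_ratings[item] - mean_rating(neighbor_ratings)
        num += sim * deviation
        den += abs(sim)  # assumed normalization
    if den == 0:
        return base  # no usable neighbor: fall back to the user's average
    return base + num / den

# Hypothetical sample data
ratings = {
    "A": {"x": 6, "y": 8},
    "B": {"x": 5, "y": 7, "p": 9},
    "C": {"x": 4, "p": 2},
}
similarities_to_a = {"B": 1.0, "C": 0.5}
```

With this data, `predict("A", "p", ratings, similarities_to_a)` starts from A's average of 7, adds B's deviation of +2 at weight 1.0 and C's deviation of -1 at weight 0.5, and divides by 1.5, giving 8.0.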

(2) Item-based collaborative filtering. The idea: calculate the similarity between all items from users' ratings of them; use a user's ratings of similar items to predict that user's ratings of items they have not yet viewed; recommend items to the current user based on the predicted ratings.

The problem is the same as above, and the solution is similar, though it may need a little adjustment.
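For comparison, a minimal item-based sketch follows. It assumes plain cosine similarity between two items over the users who rated both, and predicts a rating as the similarity-weighted average of the user's own ratings. Both choices are assumptions for illustration (adjusted cosine is another common option), and the function names and data are made up.

```python
import math

def item_cosine(ratings, item_i, item_j):
    """Cosine similarity between two items over users who rated both."""
    common = [u for u, r in ratings.items() if item_i in r and item_j in r]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_i] * ratings[u][item_j] for u in common)
    norm_i = math.sqrt(sum(ratings[u][item_i] ** 2 for u in common))
    norm_j = math.sqrt(sum(ratings[u][item_j] ** 2 for u in common))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

def predict_item_based(ratings, user, item):
    """Weighted average of the user's ratings of items similar to `item`.

    Returns None if none of the user's items is comparable to `item`.
    """
    num = 0.0
    den = 0.0
    for other, rating in ratings[user].items():
        if other == item:
            continue
        sim = item_cosine(ratings, item, other)
        num += sim * rating
        den += sim
    return num / den if den else None

# Hypothetical sample data: user -> {item -> rating}
ratings = {
    "u1": {"a": 8, "b": 8},
    "u2": {"a": 4, "b": 4, "c": 4},
    "u3": {"a": 10, "c": 2},
}
```

One weakness of this simple version: when only a single user has rated both items, the cosine is always 1, so a real system would probably require a minimum number of co-raters.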

Advantages of collaborative filtering: it is obviously finer-grained and more principled than the previous two approaches. Disadvantages: the rating matrix is sparse, there is a cold start problem, and so on.

Small adjustments:
(1) A graph-based approach (spreading activation). Idea: recommend items reached by paths of length > 3.
(2) Matrix SVD: singular value decomposition, which is essentially dimensionality reduction.
(3) Information preprocessing: noise reduction and normalization.
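To illustrate the graph-based idea in (1), here is a small sketch that treats users and items as the two sides of a bipartite graph and does a breadth-first search from one user, recording the shortest path length at which each unread item is reached. It is only a reachability sketch under my own assumptions, not a full spreading-activation implementation; the function name, the `max_length` parameter, and the data are hypothetical.

```python
def items_by_path_length(ratings, user, max_length=5):
    """Map each item the user has not rated to its shortest path length.

    ratings: dict user -> {item -> rating}; an edge exists wherever a
    user has rated an item. Path lengths from a user to an item are odd:
    1 = own items, 3 = items of users sharing an item, 5 = one hop more.
    """
    seen_items = set(ratings[user])
    frontier_items = set(seen_items)
    visited_users = {user}
    found = {}
    length = 1
    while length + 2 <= max_length and frontier_items:
        # Users not yet visited who rated at least one frontier item
        next_users = {
            u for u, r in ratings.items()
            if u not in visited_users and frontier_items & set(r)
        }
        visited_users |= next_users
        length += 2
        new_items = set()
        for u in next_users:
            new_items |= set(ratings[u]) - seen_items
        for item in new_items:
            found[item] = length
        seen_items |= new_items
        frontier_items = new_items
    return found

# Hypothetical sample data
ratings = {
    "user1": {"item2": 7, "item4": 9},
    "user2": {"item2": 8, "item3": 6, "item4": 9},
    "user3": {"item1": 5, "item3": 7},
}
```

Here `item3` is reachable from `user1` with length 3 (user1, item2, user2, item3), while `item1` only shows up at length 5 through `user3`. Allowing the longer paths is what helps when the rating matrix is sparse.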

Noise reduction: User behavior data is generated while users are using the application, so it may contain a lot of noise and accidental operations. We can filter the noise out of the behavior data with classical data mining algorithms so that our analysis is more accurate.

Normalization: When calculating how much a user likes an item, it may be necessary to weight different kinds of behavior data. However, the values of different behaviors can differ greatly in scale. For example, view counts are bound to be much larger than purchase counts. To make the overall preference obtained by the weighted sum more accurate, the data for each behavior must first be brought into the same value range, and that is what normalization does. The simplest normalization is to divide each type of data by the maximum value of that type, which guarantees that the normalized values fall in the range [0,1].
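The divide-by-maximum normalization and the weighted sum can be sketched as follows. The behavior counts and the weights 0.3 and 0.7 are invented purely for illustration, and the sketch assumes the counts are non-negative.

```python
def normalize_by_max(values):
    """Scale non-negative counts into [0, 1] by dividing by the maximum."""
    peak = max(values.values(), default=0)
    if peak == 0:
        return {key: 0.0 for key in values}  # nothing to scale
    return {key: value / peak for key, value in values.items()}

def preference(views, purchases, w_view=0.3, w_purchase=0.7):
    """Weighted sum of normalized behaviors (weights are hypothetical)."""
    norm_views = normalize_by_max(views)
    norm_purchases = normalize_by_max(purchases)
    return {
        item: w_view * norm_views[item]
        + w_purchase * norm_purchases.get(item, 0.0)
        for item in norm_views
    }

# Hypothetical behavior counts for one user
views = {"book_a": 120, "book_b": 30, "book_c": 0}
purchases = {"book_a": 2, "book_b": 4, "book_c": 0}
```

Without the normalization, the raw view counts (up to 120) would swamp the purchase counts (up to 4) no matter which weights were chosen.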

For the specific code, see my GitHub: https://github.com/rucerlx/coding-language-learning/pull/1