This article was originally published by AI Frontier.
A purchase conversion rate of up to 60%: how does Amazon’s recommendation system achieve this?
Author | Yuan Yuan
Editor | Vincent
According to a Fortune report, Amazon’s sales are growing fast thanks to the way it integrates its recommendation system into the whole purchase process, from product discovery to payment. According to estimates by Wall Street analysts, Amazon’s online recommendation system has a purchase conversion rate of 60%. So what is the secret behind the success of Amazon’s recommendation system? Let’s find out.
Statement | This article was compiled from a live talk hosted by AI Frontier on March 1, 2018, and may not be reproduced without permission!
Hello everyone! I am very glad to have the opportunity to talk about personalized intelligent recommendation systems with you today.
Let me introduce myself first. My name is Yuan Yuan. I got my PhD in 2011 and worked at Amazon for two and a half years. I gained some experience in these jobs that I would like to share, so I am taking this opportunity to talk about it with you today.
If you want to know more about the details, you can scan our QR code and join our Greedy Technology online course. In it, I explain each algorithm we talk about today in as much detail and as clearly and methodically as I can. For each algorithm, I write code in Python and share it with you, so you not only learn and understand the algorithm in theory but also see how it works. When I implement an algorithm, I try not to use any third-party libraries or prepackaged implementations. Many ready-made AI libraries are written in C++ and may use the GPU for efficiency; I write everything in pure Python so that you can see in great detail how each step is implemented.
If you are interested and want to know the details, you can scan our QR code and go to the Greedy Technology website for more.
Weixin.qq.com/r/eCqPl5XE6… (QR code)
With me today is our CEO, Wenzhe Li. Wenzhe, could you say hello to everyone?
Wenzhe Li: Hello, my name is Wenzhe Li, and I am the CEO and founder of Greedy Technology. I did my PhD in artificial intelligence in the United States, at USC. I have also served as chief scientist at several companies in China, so I am quite familiar with the whole AI field.
I founded this company because I want to popularize AI knowledge and technology, so that many people in China can learn it and put it to use in practice. Today we are pleased to have invited Jerry to share the topic of recommendation systems, and I hope you will learn something from today’s session. If you have any questions, we can discuss them together or you can ask me. After the class, we can continue the discussion in the forum. Now let’s hand over to Jerry.
Yuan Yuan: Let’s start our sharing now.
My talk today covers the following:
First, I will give you an overview of recommendation systems: what they are, and which concrete implementations we can use for reference;
Second, I will introduce common recommendation algorithms. If you are trying to build a system yourself, you can start with these most common algorithms;
Third, I want to highlight something that people may not pay much attention to: how do you evaluate a recommendation system? Only when you establish a good evaluation standard can you keep improving your algorithm, so that your system runs better and better;
Finally, I will look at some of the recommendation systems that exist today, how they are generally architected, and how to describe them in terms of software systems.
Introduction to recommendation systems
First of all, what is a recommendation system?
We need to define what we’re talking about today.
A recommendation system is an information processing system used to predict whether a user likes something and, if so, how much: whether the user likes it very much, only somewhat, or not at all. Recommendation systems can be used in many fields. Let me give some examples: Toutiao recommends personalized news to users; Youku, Tudou and YouTube recommend videos; Xiami Music recommends music that users like; Taobao and JD.com recommend books, food, clothes and a variety of other goods. In addition, social networks such as Twitter, Facebook and Sina Weibo recommend friends to users.
These recommendations are, of course, all produced by recommendation systems. Finance is very popular now: if you work on financial products such as P2P finance, you will certainly want to recommend stocks, funds, securities and other financial products to your customers. And singles should not forget that the various dating websites also use the same kind of recommendation system to recommend suitable partners to their single users.
Therefore, we can see that the recommendation system has become an indispensable part of many websites and mobile apps. These apps and websites rely on recommendation systems to drive sales, attract users’ attention, increase user activity and attract new users.
Because I used to work at Amazon, let’s look at a specific company that uses recommendation systems.
Here I am showing Amazon’s share price on Google Finance, and you can see that it has been rocketing up lately. From about $300 a share in 2015, it has risen to about $1,500 now.
So what is it that makes it so successful? Let’s take a look at how Fortune magazine assesses it: it says Amazon’s success depends on how it builds its recommendation system into the whole process, from product discovery to product purchase.
Fortune also notes that Amazon’s recommendation system achieves a purchase conversion rate of 60% after making recommendations.
Let’s take a look at what recommendations Amazon uses.
If you log in to the Amazon website, the first thing you will see is recommendations by product category. For example, in the upper left corner it recommends fitness equipment to the currently logged-in user, Taomas; in the lower left corner it recommends coffee and tea; and there are other categories such as books. This is the first kind: recommendations by category.
The second is items that are frequently bought together.
If a user buys the fitness stick shown on the left, the site will recommend the orange fitness ball on the right. Why? Because, according to the purchase history, people often buy these two things together. You may know the classic MBA case: researchers found that a supermarket should put baby diapers and beer next to each other, because many fathers who come to buy diapers see the beer next to them and buy it at the same time. This is the same kind of strategy.
The third kind of Amazon recommendation is based on your recent browsing history.
What have you looked at recently? I recently looked at fitness sticks, so it recommends a bunch of fitness-related items.
What is number four? Instead of recommending something you haven’t seen before, it tells you what you looked at today, what you looked at the other day, and what you looked at on a particular day. It shows you things you have viewed before but did not order, and maybe you will feel the urge to order them.
What is the fifth? It also makes recommendations according to your long-term browsing history rather than your recent browsing history. For example, what I have looked at recently is fitness equipment, but the second product recommended to me is sardines, which it recommended based on my long-term history.
The sixth kind of recommendation is based on other users who have the same shopping habits as you. For example, another user and I bought the same product, so the system recommends to me what that user bought next.
Seventh, the system knows my purchase history and knows that I once bought a Kindle, so it tells me that a new version of the Kindle is now available and asks whether I want to buy it. This recommends a new version of the same product based on the user’s purchase history.
Eighth, it recommends accessories based on my purchase history. The system knows I bought a Kindle, so it asks whether I want accessories for it, such as a protective case.
The ninth has nothing to do with my personal purchase or browsing history: it simply recommends the products that sell best on Amazon.
Besides the recommendations you see when you visit the website, the system also recommends actively. In the United States, most people log in with an email address, so Amazon knows the email addresses of its American users and sends them recommendation emails.
Take this example: I recently looked at a Canon digital camera, and it sent me an email recommending one of Canon’s better cameras.
If I do not respond, it makes another recommendation: Canon may not interest you, but would you be interested in a Kodak digital camera? This is something similar to what I was looking at, only from Kodak.
If I still do not respond, it sends me another email saying: look at this Canon camera; here is a discounted kit that includes the camera case and a memory card, and it is cheaper to buy them together. Would you be interested? The recommendation system recommends accessories according to the items you have viewed and bundles them into a kit for users to buy.
Finally, if the user still does not respond, it recommends the best-selling digital cameras directly. Besides the Canon I had looked at myself, there are Sony cameras, and this is what it recommended to me.
I numbered each recommendation strategy, and there are 13 in total, which means that a single shopping website uses 13 different recommendation strategies for its products alone.
Let’s take a look, what is the algorithm behind all kinds of recommendations?
Let’s classify recommendation systems. In fact, we have all come across recommendation systems; it is a technology with a long history.
Early portal websites, such as Sina and Yahoo, and newspapers such as People’s Daily also had recommendation systems, but the content was selected manually for readers by website or newspaper editors. This is also a kind of recommendation, but it is generated manually.
The second type of recommendation is simple aggregation. For example, if you go to a KTV, the karaoke panel shows a list of the latest songs. If you go to a bookstore, there is a list of best sellers. When you buy a movie ticket, Maoyan tells you what the hottest movie is, and Douban has a movie box-office ranking.
There are also recommendations by time order, which recommend items to users according to how new they are, for example by giving priority to newly listed items.
These systems make recommendations based on the simplest statistics, and although they look simple, they often work.
The last type is our focus: the truly personalized recommendation system, which shows a different face to every user.
As you can see, neither manual recommendation nor simple aggregation is personalized. Everyone who opens People’s Daily sees the same content, and everyone who goes to a KTV sees the same ranking. What we care about most now is the recommendation system that differs for every user. For example, when different people open the home page of Amazon or Netflix, the products or movies recommended to them are different. That is what we want to focus on today.
If we abstract a recommendation system mathematically, what should it look like in theory? What are the elements and what are the problems to be solved?
First, we use U for the set of all users and S for the set of all items. So I have a set of users and a set of items, which may be movies, music, people or other things. My model of the recommendation system is simply a mapping U × S → R, where × denotes the Cartesian product. This means that for each user and each item there is a value R, the recommendation value, which says how much that user likes that item. Many websites let you express this by giving one to five stars, and some, like Facebook, let you express it with a thumbs up or thumbs down. That is what R stands for, and a recommendation system is basically about how much a set of users likes a set of items.
To take an example that everyone can understand: suppose I have four people, Xiao Ming, Xiao Li, Han Meimei and Xiao Jing, and four movies as my items: A Chinese Odyssey, Goodbye Mr. Loser, Once Upon a Time in China, and The Swordsman. What data can I collect to build a recommendation system?
First of all, there is the recommendation matrix, in which each row records one user’s degree of preference for each item. For example, Xiao Li’s preference for Goodbye Mr. Loser is 2, and for Once Upon a Time in China it is 5.
In this recommendation matrix, you will see that some values are known and some are unknown. For example, what is Xiao Ming’s preference for A Chinese Odyssey? We don’t know. That is what we want to find out, and it is a key problem for recommendation systems: how do we derive these unknown values from the recommendation values that we do know? Besides the ratings themselves, what other data can we get? We can, of course, get data about users: every user has gender, age and other attributes that we can obtain. There are also the attributes of items. For example, we can easily obtain the category of a movie, whether it is a wuxia film, an art film or a comedy, and other items have their own attributes as well.
We have these three matrices, and these are the values that we can use. The problem we have to solve is to use these data to derive the values at the unknown positions, for example how much Xiao Ming likes A Chinese Odyssey.
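As a minimal sketch of the data described above, the recommendation matrix can be held as a sparse mapping from (user, item) pairs to known values. Only Xiao Li's two ratings come from the talk; every other number, and the helper name `unknown_pairs`, is a made-up illustration.

```python
# Users and items from the example in the talk.
users = ["Xiao Ming", "Xiao Li", "Han Meimei", "Xiao Jing"]
movies = ["A Chinese Odyssey", "Goodbye Mr. Loser",
          "Once Upon a Time in China", "The Swordsman"]

# Known recommendation values R(user, item). Only Xiao Li's two ratings
# are taken from the talk; the other entries are hypothetical.
ratings = {
    ("Xiao Li", "Goodbye Mr. Loser"): 2,
    ("Xiao Li", "Once Upon a Time in China"): 5,
    ("Xiao Ming", "The Swordsman"): 4,        # hypothetical
    ("Han Meimei", "A Chinese Odyssey"): 5,   # hypothetical
    ("Xiao Jing", "Goodbye Mr. Loser"): 3,    # hypothetical
}


def unknown_pairs(users, items, ratings):
    """Return every (user, item) cell of U x S that has no known value."""
    return [(u, i) for u in users for i in items if (u, i) not in ratings]


# These are the cells a recommendation algorithm has to predict.
to_predict = unknown_pairs(users, movies, ratings)
```

The cell for Xiao Ming and A Chinese Odyssey is among the unknown pairs, which is exactly the kind of value the algorithms later in the talk try to fill in.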
Let me highlight the key issues that recommendation systems need to address:
The first is collecting data. In essence, a recommendation system is an artificial intelligence system. An artificial intelligence system needs training, and training needs training data, so collecting data is very important. I think we should pay attention to one thing: collecting data is not done once and for all. The data is time-sensitive and needs to be updated frequently. For example, a person’s preferences change over time. Books and movies that someone liked in childhood may no longer appeal in adulthood. Maybe you liked something different last year, or maybe your interests have changed. For example, if a user has just had a baby, he may pay more attention to baby items such as diapers. Therefore it is important both to collect data and to update the collected data frequently.
Secondly, the key problem that recommendation system should solve is of course how to predict those unknown data;
The third is how to establish an evaluation system. Only by establishing an evaluation system can we know whether my recommendation system is good or not. If not, how can we adjust it?
Collecting data
One way to collect data is explicit collection, which directly asks users to rate, like or leave comments. One big problem is that many users are unwilling or too lazy to like, comment or rate. Take myself: I watch a lot of videos on YouTube, and I never give them a thumbs up or leave a comment saying “It’s great” or “It sucks.” I never do, and I’m sure a lot of people are the same.
Where data cannot be collected explicitly, is there another way? Yes: implicit collection, which is used more and more. How does implicit collection work? For example, on a video website, if a user watches a movie and is found watching the same movie again after a period of time, he probably likes the movie very much. If he fast-forwards through the movie or quits partway, then he probably doesn’t like it. This is one strategy for implicit collection.
Shopping sites are even simpler. If a user buys something, he likes it, and if he returns it, he doesn’t. Implicit collection is increasingly important now, and the amount of data collected implicitly is much larger than the amount collected explicitly.
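To make the idea of implicit collection concrete, here is a small sketch that turns logged behavior into preference signals. The event names and the weights in `SIGNAL_WEIGHTS` are assumptions chosen for illustration; the talk only says which behaviors suggest liking or disliking, not how strongly.

```python
# Hypothetical weights: positive means "probably likes", negative means
# "probably dislikes". The talk gives the direction, not the numbers.
SIGNAL_WEIGHTS = {
    "rewatch": 1.0,        # watched the same movie again
    "purchase": 1.0,       # bought the item
    "fast_forward": -0.5,  # skipped through the movie
    "abandon": -1.0,       # quit partway through
    "return": -1.0,        # returned the purchased item
}


def implicit_scores(events):
    """Sum the signal weights per (user, item) pair.

    `events` is an iterable of (user, item, event_name) tuples.
    Unknown event names contribute nothing.
    """
    scores = {}
    for user, item, event in events:
        key = (user, item)
        scores[key] = scores.get(key, 0.0) + SIGNAL_WEIGHTS.get(event, 0.0)
    return scores


# A made-up event log.
log = [
    ("u1", "movie_a", "rewatch"),
    ("u1", "movie_b", "fast_forward"),
    ("u1", "movie_b", "abandon"),
    ("u2", "kettle", "purchase"),
    ("u2", "kettle", "return"),
]
scores = implicit_scores(log)
```

In this sketch a purchase followed by a return cancels out to zero; a real system would have to decide how such conflicting signals should be weighed.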
How to predict
Now that we have the data, how do we predict? What are its key challenges?
We have a recommendation matrix of how much users like items, but in reality the data is very sparse, meaning that for most users we do not know whether they like most of the items. We only know how much each user likes a small number of items, or, seen from the item side, only a small number of users have expressed a preference for a given item.
And then there is the cold-start item. What is that? Suppose I have a new item, for example an iPhone11, to be put on my shopping website. No user has rated it, because it has just come out and no one has used it, so of course there is no feedback. How would you recommend it? How do you know whether someone will like an iPhone11 or a Huawei P11?
Then there are new users. If a new user arrives without any interaction history, it is impossible to know what they like or dislike, and that is also a big challenge.
What are the common recommendation algorithms? If you are in charge of building a recommendation system at your company, I suggest you first try the ones in the list I am about to introduce, one by one, and only then move on to more fashionable and advanced methods.
I’m going to go through them briefly.
The first one is content-based, and here’s a diagram to show you what it means.
Let’s say a person likes this sour beer. We ask the system to find all other beers that have a similar flavor, and if it finds one, it can of course recommend it to him.
Content-based recommendation seems to be the simplest and easiest approach to implement, and it works quite well. The problem is that in many cases it is hard to obtain the attributes of an item. If we recommend similar products based on content and product attributes, how do we judge that two products are similar? That is not an easy thing to do.
For example, to recommend two similar news articles to users, how do we determine that the two are similar? We need natural language understanding: the system needs to know which people, events, places and times each article contains. If both articles mention heads of state, they are similar at least to some extent. Do they both cover Beijing? Do they cover the same themes: do they both talk about politics, the military or the economy? In the content-based approach, the key problem is how to obtain the attributes of the item. For text, natural language processing may be used. For an image, you might use deep-learning image recognition to figure out what objects are in the image: a person, a dog or a cat? If both images show a cat, they are similar; and is it an orange cat? This is the content-based approach.
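Once item attributes are available, one common way to judge similarity is the cosine of the angle between attribute vectors. The sketch below applies it to the beer example; the beer names, the three flavor attributes and their values are all invented for illustration, and cosine similarity is one possible choice rather than the method the talk prescribes.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(liked, catalog):
    """Return the catalog item whose attributes are closest to `liked`."""
    candidates = [name for name in catalog if name != liked]
    return max(candidates, key=lambda name: cosine(catalog[liked], catalog[name]))


# Hypothetical attributes: (sourness, bitterness, maltiness), each in [0, 1].
beers = {
    "sour_ale_a": [0.9, 0.1, 0.2],
    "sour_ale_b": [0.8, 0.2, 0.1],
    "stout": [0.1, 0.9, 0.8],
}
recommendation = most_similar("sour_ale_a", beers)
```

The hard part the talk points out is not this calculation but obtaining the attribute vectors in the first place.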
The second method is collaborative filtering.
I’ll give you the simplest example, and you will understand collaborative filtering immediately.
Suppose I have three users and four items: an orange, a strawberry, an apple and a banana. I know that the third user buys apples. Now I ask you: of the other three items he has not bought, oranges, strawberries and bananas, which one is the third user most likely to like?
The way to think about it in collaborative filtering is this: we want to give the third user an item that is similar to the apple he has already bought. What is similar to an apple? One way to think about it is: which item is bought most often by users who also buy an apple?
Let’s start with bananas. Are bananas bought together with apples? Yes: the first user buys an orange, an apple and a banana, so banana gets one point because it was bought together with an apple. Now look at strawberries: nobody buys strawberries together with apples, so strawberry gets 0. As for oranges, the first user buys both an apple and an orange, and the second user also buys both an apple and an orange, so orange gets two points. In this way, we find that the similarity between apple and orange is 2, between apple and strawberry is 0, and between apple and banana is 1. We conclude that orange is the most similar to apple, so we recommend orange to the third user. That is the essence of collaborative filtering.
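The counting in this example can be written down directly. The sketch below assumes the second user's basket is exactly an orange and an apple, which matches the scores in the talk but is not spelled out there; the function name `cooccurrence_with` is just an illustrative choice.

```python
# Baskets from the example. User 2's exact basket is an assumption that
# is consistent with the scores given in the talk.
baskets = {
    "user1": {"orange", "apple", "banana"},
    "user2": {"orange", "apple"},
    "user3": {"apple"},
}
catalog = ["orange", "strawberry", "apple", "banana"]


def cooccurrence_with(target, baskets, catalog):
    """Count, for each other item, how many baskets contain it with `target`."""
    counts = {item: 0 for item in catalog if item != target}
    for items in baskets.values():
        if target in items:
            for item in items - {target}:
                counts[item] += 1
    return counts


similarity_to_apple = cooccurrence_with("apple", baskets, catalog)

# Recommend the most similar item that user 3 has not bought yet.
not_bought = [item for item in similarity_to_apple if item not in baskets["user3"]]
recommendation = max(not_bought, key=similarity_to_apple.get)
```

Raw co-occurrence counts favor popular items, so practical systems often normalize them, but the plain count is enough to reproduce the reasoning above.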
And then we’ll introduce matrix factorization.
For example, there are four users, A, B, C and D, and four products, W, X, Y and Z. We know how much some users like some products; for example, A’s rating of X is 4.5. Given such a scoring matrix, how can we predict, for example, how much A likes W?
Matrix factorization works like this: I want to find two other matrices, called the user matrix and the item matrix. The number of rows of the user matrix equals the number of users in my scoring matrix, and the number of columns of the item matrix equals the total number of items. The number of columns of the user matrix must equal the number of rows of the item matrix. What should this shared dimension be? It can be 2, 3, 10 or 100.
What properties should the user matrix and the item matrix have? When the two matrices are multiplied, their product must be a matrix with the same number of rows and columns as my scoring matrix, and the values in the product should be close to the corresponding known values in my scoring matrix.
For example, I want the second value in the first row to be close to 4.5, the known value of AX; the value for AY to be close to 2.0; and the value for BW to be close to 4.0. That is to say, I want the product of these two matrices to be close to the values I already know in my scoring matrix.
If we can do that, then the product matrix will also have values at the positions that are unknown in the scoring matrix, and I take those values as my predictions. That is the basic idea of matrix factorization.
You can see right away that matrix factorization has a problem: it uses only the users’ ratings of items. We actually know the user’s attributes, such as whether the user is male or female, old or young, and we also know some attributes of the item: if it is a movie, we know who the director is, who the actors are and what its style is. But we cannot feed this information into matrix factorization, so we cannot use it to improve our system.
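A minimal pure-Python sketch of this idea follows. It fits the user and item matrices with stochastic gradient descent and a small regularization term, which is one common way to do it and not necessarily the speaker's. The ratings A-X = 4.5, A-Y = 2.0 and B-W = 4.0 come from the talk; the other ratings and all hyperparameters (`k`, `epochs`, `lr`, `reg`) are assumptions.

```python
import random

users = ["A", "B", "C", "D"]
items = ["W", "X", "Y", "Z"]

# Known ratings. A-X, A-Y and B-W are mentioned in the talk;
# the remaining entries are hypothetical.
ratings = {
    ("A", "X"): 4.5, ("A", "Y"): 2.0,
    ("B", "W"): 4.0, ("B", "X"): 3.5,
    ("C", "W"): 5.0, ("C", "Y"): 2.0,
    ("D", "X"): 3.5, ("D", "Y"): 4.0, ("D", "Z"): 1.0,
}


def factorize(ratings, users, items, k=2, epochs=5000, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional user and item factors from the known ratings."""
    rng = random.Random(seed)
    P = {u: [rng.random() for _ in range(k)] for u in users}  # user matrix rows
    Q = {i: [rng.random() for _ in range(k)] for i in items}  # item matrix columns
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                p, q = P[u][f], Q[i][f]
                # Gradient step on the squared error plus L2 penalty.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user row and the item column."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


P, Q = factorize(ratings, users, items)
a_likes_w = predict(P, Q, "A", "W")  # a cell that was unknown in the input
```

With so few ratings the prediction for A and W is only illustrative; the point is that the product of the two learned matrices has a value at every position, including the unknown ones.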
So somebody came up with another algorithm called factorization machine.
The idea is to turn the user’s preferences into a formula like the one I define below. What does it mean?
For example, I have three users, Tom, Jack and Alice, and we have movies, books and music, along with how much each user likes each item. I one-hot encode the user, so that in each row exactly one user column is 1. The item is also one-hot encoded, so each row also has exactly one item column set to 1. The user’s attributes, such as age and gender, are placed next, followed by the item’s attributes. So in each row, the X values represent the user and the item, together with the age and gender of the user and the attributes of the item. How much does this user like this item? That is the value of Y. In other words, we want a formula that produces the value of Y from these values of X.
One thing that is very different from matrix factorization is that the X values are multiplied together, so they interact. For example, Tom’s X value is multiplied by Tom’s age and gender; Tom’s value is also multiplied by the attributes of the item; gender and age are multiplied by the item; and all these pairwise products become terms in our formula.
The values we know are Y and X, and the values we don’t know are W and V. Once we have learned W and V from the known data, we can calculate the unknown Y values, because X is known. That is how a factorization machine works.
Compared with matrix factorization, this approach takes advantage of both the ratings of items and the attributes of the items and the users.
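The formula on the slide is not reproduced in this transcript. Assuming it is the standard second-order factorization machine, y = w0 + Σ_i w_i x_i + Σ_{i<j} <v_i, v_j> x_i x_j, the prediction step can be sketched as below. The bias `w0`, the feature layout and all numbers are assumptions for illustration; in practice W and V are learned from the known (X, Y) rows.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    x  : feature vector (one-hot user, one-hot item, attributes)
    w0 : global bias (assumed; the talk only names W and V)
    w  : one linear weight per feature
    V  : one latent vector per feature; the weight of the pair (i, j)
         is the dot product of V[i] and V[j]
    """
    linear = sum(w_i * x_i for w_i, x_i in zip(w, x))
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return w0 + linear + pairwise


# Hypothetical row: users (Tom, Jack, Alice), items (movie, book, music),
# then the user's scaled age and gender.
x_tom_movie = [1, 0, 0,  1, 0, 0,  0.3, 1]
w0 = 0.1
w = [0.2, 0.0, 0.0, 0.5, 0.0, 0.0, 0.1, 0.3]           # made-up weights
V = [[0.1, 0.2], [0.0, 0.0], [0.0, 0.0], [0.3, 0.1],
     [0.0, 0.0], [0.0, 0.0], [0.2, 0.2], [0.1, 0.4]]   # made-up latent vectors
y = fm_predict(x_tom_movie, w0, w, V)
```

This double loop is the direct reading of the formula; faster equivalent forms exist but are harder to follow.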
Of course, one thing we must not ignore nowadays is recommendation systems based on deep learning.
The simplest recommendation model I can think of using deep learning looks like this. The inputs to the network are the user ID and the item ID: say there are 100 users, coded from 0 to 99, and 1,000 items, coded from 0 to 999. Each input goes through an embedding layer, and the two embedding output vectors are concatenated. Then comes a fully connected layer with regularization, another fully connected layer with regularization, followed by further fully connected layers, and finally softmax is used for the prediction.
The output is a one-hot vector. Assuming the score ranges from 1 to 5, the output has five elements, each with a value of 0 or 1. If the predicted score is 1, the first element is 1 and the other elements are 0; if the predicted score is 5, the fifth element is 1 and the other elements are 0.
This is one of the simplest deep learning models. The only data it takes is the user ID, the item ID and the user’s rating of the item. Once trained, if you give the neural network a user ID and an item ID for which no rating has been given, it can predict the score.
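The network itself needs a deep learning framework, but its output encoding can be sketched in plain Python. The snippet below shows the one-hot target for a score and how a softmax output is turned back into a predicted score; it assumes scores from 1 to 5 with one output element per score, and the raw network outputs (`logits`) are invented.

```python
import math

NUM_SCORES = 5  # assumed: scores 1..5, one output element per score


def one_hot(score):
    """Training target: 1 at the position of the score, 0 elsewhere."""
    vec = [0] * NUM_SCORES
    vec[score - 1] = 1
    return vec


def softmax(logits):
    """Turn raw network outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_score(logits):
    """Pick the score whose output element has the highest probability."""
    probs = softmax(logits)
    return probs.index(max(probs)) + 1


# Made-up raw outputs of the last layer for one (user, item) pair.
logits = [0.1, 0.2, 3.0, 0.1, 0.0]
score = predicted_score(logits)
```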
Let me give you a practical example, which is a little more complicated, but very similar to what I showed you earlier.
This is Google Play, the equivalent of Apple’s App Store, for downloading apps; it is probably less used in China. It uses a deep learning method like this to make recommendations. The user’s gender, age, number of installed apps and interactions with the system are fed in directly. The user’s device (Samsung, Huawei or whatever), the apps the user has installed and the user’s ratings of apps each go through an additional embedding layer. All these features are then concatenated and passed through a three-layer deep network. In addition, the cross product of the user’s installed apps and the user’s ratings of the installed apps is fed in directly as part of the final input. A neural network with this structure is trained and used for app recommendations in Google Play.
There is another kind of recommendation algorithm. You have all spent a lot of time using search engines, which are essentially recommendation systems.
For example, enter “Huang Xiaoming” into a search engine, whether Baidu or Google, and a lot of pages will be returned. The first result is the article or web page about Huang Xiaoming that the system thinks you will like the most, and the second result is the one that the system thinks you will like second most.
In essence, this kind of recommendation is the process of presenting a list of items to a particular user in order. What matters is not how much the user likes each specific item, but which items the user prefers relative to the others.
If you collect user data and user rating data, you can score each item and then sort the items, the way a traditional search engine does, which is a very traditional but very useful approach.
Another approach, one you can use without any data, is exploration and exploitation. Here is an example: suppose you have five users of the same type, that is, very similar users. You pick user 1 and user 2 at random, use them for an experiment, show them two movies, and see how they react.
Recommend one movie to the first user and another movie to the second user. We find that the first user did not click on his movie and did not watch it, but the second user did watch his, indicating that the second movie better suits the taste of this user group. So I know that the movie I recommended to the second user was good, and I can recommend the movie that the second user clicked on to the other users.
Why is that? Simple: because these five users are similar, and I have used user 1 and user 2 as guinea pigs. The experiment shows that this movie is good, so I should recommend it to the other users.
But there is one more thing to note: we also keep recommending, now and then, the movie that did not get clicked. Why? Suppose we only ever recommend the movies that users have clicked on. What if a user actually likes the other movie but did not click on it, because he clicked on the wrong thing or was simply too busy to watch at that moment? Wouldn’t that throw away the chance for a good movie to be seen by users? So we still want to show it to some users, but only with a low probability, while the movie that users have clearly liked is shown with a high probability.
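One simple way to implement "mostly show the proven movie, occasionally show the other one" is the epsilon-greedy rule. The talk does not name a specific algorithm, so this is a swapped-in illustration; the click rates and the value of `epsilon` are assumptions.

```python
import random


def choose_movie(click_rates, epsilon, rng):
    """Epsilon-greedy choice.

    With probability `epsilon`, explore: pick any movie at random.
    Otherwise, exploit: pick the movie with the best observed click rate.
    """
    if rng.random() < epsilon:
        return rng.choice(sorted(click_rates))
    return max(click_rates, key=click_rates.get)


# Observed so far: user 1 ignored movie_1, user 2 watched movie_2.
click_rates = {"movie_1": 0.0, "movie_2": 1.0}

rng = random.Random(0)
shown = [choose_movie(click_rates, epsilon=0.1, rng=rng) for _ in range(1000)]
times_movie_1 = shown.count("movie_1")
times_movie_2 = shown.count("movie_2")
```

With `epsilon=0.1`, movie_2 is shown most of the time, while movie_1 still gets an occasional chance to prove itself; the click rates would be updated as new feedback arrives.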
Finally, we have to mention ensemble learning.
We have a lot of different recommendation algorithms. If I combine all these recommendation algorithms, will the result be better for the user? In practice, it is.
How do you combine the outputs of these different algorithms? The first way is voting. Let’s say there are three recommendation systems; two of them say a user’s rating of an item is 5 and only one thinks it is 4. Then I trust the two, and the value with the most votes wins. Alternatively, take an average; and if I think one algorithm is better than the others, I give its output more weight when aggregating.
The second approach to ensemble learning is stacking. Suppose there are two recommendation algorithms: I take the output of the first algorithm as the input, or part of the input, of the second algorithm, and train the second algorithm on it.
The third method is boosting. I take the deviation between the output of a recommendation algorithm and the true value I want, and use it as the training data for the next round of training. This is also an ensemble learning method.
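The combination rules above are short enough to sketch directly. The helper names and the example numbers below are illustrative; the residual helper only shows what the next model in a boosting-style setup would be trained on, not a full boosting algorithm.

```python
from collections import Counter


def majority_vote(predictions):
    """Voting: return the predicted value that most algorithms agree on."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_average(predictions, weights):
    """Averaging: algorithms that are trusted more get a larger weight."""
    total = sum(weights)
    return sum(p * w for p, w in zip(predictions, weights)) / total


def residuals(true_values, predictions):
    """Boosting-style targets: what the current model still gets wrong."""
    return [t - p for t, p in zip(true_values, predictions)]


# Three recommenders rate the same (user, item) pair, as in the talk.
votes = [5, 5, 4]
voted = majority_vote(votes)
plain_mean = weighted_average(votes, [1, 1, 1])
# Hypothetical weights: trust the third recommender twice as much.
trusted_mean = weighted_average(votes, [1, 1, 2])
```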
How to evaluate a recommendation system
And then I want to talk to you about how to evaluate recommendation systems.
When it comes to evaluation methods of recommendation systems, offline evaluation immediately comes to mind. According to historical data, a user or don’t like something, I put the history data is divided into two parts, part as the training data, part of test data, I use the value of training data to predict the inside of the test data set value, if the recommended value test data with me in real value differs little, will think I am right, this is a very common one way, It’s called offline evaluation.
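Offline evaluation can be sketched in a few lines: hold out part of the historical ratings, predict them from the rest, and measure the gap. The ratings below are invented, the 80/20 split and the RMSE metric are common choices rather than ones the talk specifies, and the global-mean predictor is only a stand-in for a real algorithm.

```python
import math
import random


def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle the historical ratings and hold out a fraction for testing."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def rmse(predict, test):
    """Root mean squared error between predicted and real test values."""
    squared = [(predict(u, i) - r) ** 2 for u, i, r in test]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical historical data: (user, item, rating).
history = [
    ("u1", "a", 4.0), ("u1", "b", 2.0), ("u2", "a", 5.0), ("u2", "c", 3.0),
    ("u3", "b", 1.0), ("u3", "c", 4.0), ("u4", "a", 3.0), ("u4", "b", 2.0),
    ("u5", "c", 5.0), ("u5", "a", 4.0),
]
train, test = train_test_split(history)

# Stand-in predictor: always predict the mean rating of the training data.
global_mean = sum(r for _, _, r in train) / len(train)
baseline_error = rmse(lambda user, item: global_mean, test)
```

Any of the algorithms discussed earlier can be plugged in as `predict`; a lower error than this baseline suggests the algorithm is learning something useful from the training data.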
The second is questionnaire, which is to place a button on the website or page and directly tell the user: I recommend this movie to you, do you think it is good or not? The user gives the comment directly. Or you can do a survey and ask users if they want to do a survey. When designing the questionnaire, we should pay attention to the following: Two forms of the same question, for example, to ask users whether they like Goodbye Mr. Loser, you can first use one sentence to ask whether Goodbye Mr. Loser is your favorite movie; The second question is: Do you particularly hate Goodbye Mr. Loser, because the combination of the two will make the user think again and prevent him from clicking on it wrong sometimes? And what do you mean by saying what you don’t mean? As a matter of fact, people have emotions and reason, as well as some subconscious. When he tells you in the questionnaire that he likes something, it may not be true, which is also worth paying attention to.
The third way to evaluate recommendation systems is user learning. Ask a bunch of users, do a small test, he tests not only the recommendation system is good, but also the user interface that the recommendation system finally presents is good. This is a test for the entire user experience, only dozens of users in a small range of tests, may find about 90% of the system’s major problems, this is a very good way to evaluate the recommendation system.
Then there is what has become very popular these days: A/B testing, or online testing. When a new model or algorithm comes out, randomly select some users, for example 10%, and show them the results of the new algorithm, while the other 90% continue to see the results of the original algorithm. Then compare whether the 10% on the new algorithm end up with better outcomes, such as buying more products or visiting our website more actively.
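One way the 10% / 90% split could be implemented is sketched below. It is an assumption, not a description of any particular company's system: users are assigned by hashing their ID so that the same user always lands in the same group, and the names assign_group and conversion_rates, as well as the salt string, are hypothetical.

```python
import hashlib

def assign_group(user_id, treatment_percent=10, salt="new-ranker-test"):
    """Deterministically place a user in 'treatment' or 'control'.

    Hashing the user ID together with a per-experiment salt gives a
    stable, roughly uniform bucket in 0..99.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "treatment" if bucket < treatment_percent else "control"

def conversion_rates(events):
    """events: iterable of (user_id, purchased) pairs -> purchase rate per group."""
    shown = {"treatment": 0, "control": 0}
    bought = {"treatment": 0, "control": 0}
    for user_id, purchased in events:
        group = assign_group(user_id)
        shown[group] += 1
        bought[group] += int(purchased)
    return {g: (bought[g] / shown[g] if shown[g] else 0.0) for g in shown}

# Check that roughly 10% of a made-up user population sees the new algorithm.
groups = [assign_group(uid) for uid in range(10000)]
share = groups.count("treatment") / len(groups)
print(f"share of users in treatment: {share:.3f}")
```

Comparing the two rates returned by conversion_rates is only the first step; before declaring the new algorithm better, the difference would normally also be checked with a statistical significance test.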
Recommendation system architecture & summary
Next, I want to summarize.
There is a theorem called the “no free lunch” theorem. It says that no algorithm is better than every other algorithm on every problem. Even deep learning, the most popular approach today, is not guaranteed to beat traditional learning algorithms on every problem. So it is important to combine several different algorithms, which is the ensemble learning approach I mentioned earlier.
Netflix is an American movie streaming website, comparable to iQiyi, and its recommendation system is very famous. It once held a competition with a prize of one million dollars, challenging people to improve its original recommendation algorithm by 10%. In the end, a team from AT&T won the competition. They combined dozens of algorithms into one recommendation system, which improved performance by 10.09%. So that is what you should pay attention to: combining algorithms.
Finally, I want to talk about architecture. I will use Netflix’s architecture as the main example to show how to choose software systems if you want to build a recommendation system.
Divide your system into an offline part, a near-real-time part, and an online part.
The offline part can use Hadoop, Hive, Pig, or Spark for large-scale computations that take a long time. The near-real-time part uses distributed databases and caches, such as Cassandra and MySQL together with a cache. Then there is the online, real-time part, which computes in memory. The requirements for this part are that the amount of data is relatively small, the algorithms are relatively simple, and responses are generally returned within milliseconds. The real-time part generally only does ranking, while the offline part can do more complex things.
That is all I have for today. Because time is limited, I could only cover selected topics. If you are interested in recommendation systems and want a deeper understanding of the systems I talked about, from the algorithms to their implementation, you can scan the Greedy Technology QR code and follow our official account, where we will publish more content.
Weixin.qq.com/r/eCqPl5XE6… (Qr code automatic recognition)
Question and answer session
Q1: In deep learning, how is an embedding added for a user ID?
A: In essence, an embedding is a fixed-length encoding of the input’s features. The user ID is an integer that can range, for example, from 0 to 10000; after passing through the embedding layer, the output is a vector of preset length. The purpose is to capture richer information from the training data, so that this vector holds the information associated with that user ID.
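As a rough illustration of this answer, an embedding layer can be thought of as a lookup table with one trainable row per ID. The sketch below is pure Python and leaves out training entirely; the class name Embedding, the vector length 8, and the random initialisation range are assumptions chosen for the example.

```python
import random

class Embedding:
    """Lookup table mapping an integer ID to a fixed-length vector."""

    def __init__(self, num_ids, dim, seed=0):
        rng = random.Random(seed)
        # One row per ID, initialised with small random values. In a real
        # model these values would be adjusted by gradient descent.
        self.weights = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(num_ids)
        ]

    def __call__(self, idx):
        # The forward pass is just a row lookup.
        return self.weights[idx]

# User IDs 0..10000, as in the answer above; 8 is an arbitrary preset length.
user_embedding = Embedding(num_ids=10001, dim=8)
vector = user_embedding(1234)
print(len(vector))
```

Whatever the ID, the output always has the same preset length, which is what lets later layers treat users uniformly.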
Q2: What is the difference between explore-and-exploit and collaborative filtering?
A: Explore-and-exploit algorithms are better suited to a small number of items and are usually used in the last stage of a recommendation system. After other algorithms have worked out which items are most likely to be accepted by a certain category of users, explore-and-exploit is used to reorder those items. Collaborative filtering is computationally cheaper and better suited to large-scale data.
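To make the reordering step concrete, here is a minimal sketch that uses epsilon-greedy, one of the simplest explore-and-exploit strategies; the answer does not say which strategy is meant, so this choice is an assumption. The class name, the item names, and the simulated click probabilities are all hypothetical.

```python
import random

class EpsilonGreedyReranker:
    """Reorders a short candidate list using observed click rates."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = {item: 0 for item in items}
        self.clicks = {item: 0 for item in items}

    def click_rate(self, item):
        shows = self.shows[item]
        return self.clicks[item] / shows if shows else 0.0

    def rank(self):
        """Exploit: order by observed click rate. Explore: with probability
        epsilon, promote one randomly chosen item to the top."""
        ranked = sorted(self.shows, key=self.click_rate, reverse=True)
        if self.rng.random() < self.epsilon:
            explore = self.rng.choice(ranked)
            ranked.remove(explore)
            ranked.insert(0, explore)
        return ranked

    def record(self, item, clicked):
        """Feed back whether the item shown was clicked."""
        self.shows[item] += 1
        self.clicks[item] += int(clicked)

# Simulation only: these "true" click probabilities are made up.
true_rates = {"A": 0.05, "B": 0.20, "C": 0.10}
reranker = EpsilonGreedyReranker(true_rates, epsilon=0.1)
simulated_user = random.Random(1)
for _ in range(5000):
    top = reranker.rank()[0]
    reranker.record(top, simulated_user.random() < true_rates[top])
print({item: round(reranker.click_rate(item), 3) for item in true_rates})
```

Because every call to rank sorts all candidates and keeps per-item statistics, this kind of method is practical mainly for the short list left over after the earlier stages, which matches the point made in the answer.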
Q3: Can the PPT be shared?
A: Please follow our official account and we will share with you.
Q4: Could you talk more about stacking and boosting in ensemble learning?
A: Stacking: the output of the previous algorithm is used as the input, or part of the input, of the next algorithm. Boosting: the difference between the output of one algorithm and the actual value is used as the input to the same algorithm or to other algorithms.
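The two definitions can be contrasted on a toy regression problem. The sketch below is a simplification under stated assumptions: the data are made up, the base learners (a straight line and a one-split "stump") are chosen only because they fit in a few lines, and the stacking shown here trains both levels on the same data, whereas practical stacking usually feeds the second level with held-out predictions.

```python
def fit_line(xs, ys):
    """Least-squares straight line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return lambda x: mean_y + slope * (x - mean_x)

def fit_stump(xs, ys):
    """One-split regression stump: a constant on each side of a threshold."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def stack(xs, ys):
    """Stacking: the first model's output is the second model's input."""
    level1 = fit_line(xs, ys)
    level2 = fit_stump([level1(x) for x in xs], ys)
    return lambda x: level2(level1(x))

def boost(xs, ys, rounds=3):
    """Boosting: each new model is fitted to the residual, i.e. the actual
    value minus the prediction of the models built so far."""
    mean_y = sum(ys) / len(ys)
    models = [lambda x: mean_y]
    for _ in range(rounds):
        residuals = [y - sum(m(x) for m in models) for x, y in zip(xs, ys)]
        models.append(fit_stump(xs, residuals))
    return lambda x: sum(m(x) for m in models)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6, 7, 8]   # made-up inputs
ys = [1, 1, 2, 2, 5, 5, 6, 6]   # made-up targets
baseline = lambda x: sum(ys) / len(ys)
print("mean only:", mse(baseline, xs, ys))
print("stacked  :", mse(stack(xs, ys), xs, ys))
print("boosted  :", mse(boost(xs, ys), xs, ys))
```

In stack the second model never sees the raw input, only the first model's prediction; in boost every later model sees the raw input but is trained on what the earlier models got wrong.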
Q5: What are the most widely used online ranking algorithms now?
A: I’m not sure I understand your specific question. I assume you are asking about the most popular algorithm in “Learning to Rank”, and my answer is LambdaMART.
Q6: Besides MLlib’s ALS algorithm, is there any other way to build a recommendation system with Spark?
A: Spark officially provides only the ALS algorithm for collaborative filtering. Other algorithms you need to implement yourself.
Q7: Please explain the input layer of deep learning in detail.
A: This question is too broad. It needs to be analyzed on a case-by-case basis.
Q8: What are the respective use cases of Item2Item collaborative filtering and matrix factorization?
A: Item2Item collaborative filtering is a neighborhood method. Compared with matrix factorization, it is easier to implement and fine-tune, and it is more interpretable.
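A small sketch shows why the neighborhood approach is easy to implement and to explain. It assumes cosine similarity over the users who rated each item, which is one common choice rather than the only one; the ratings and the function names are made up for the example.

```python
import math

def item_vectors(ratings):
    """ratings: {user: {item: rating}} -> {item: {user: rating}}"""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_items(ratings, target, top_n=2):
    """Return the top_n (similarity, item) pairs closest to target."""
    vectors = item_vectors(ratings)
    scores = [
        (cosine(vectors[target], vec), item)
        for item, vec in vectors.items()
        if item != target
    ]
    return sorted(scores, reverse=True)[:top_n]

# Made-up ratings, for illustration only.
ratings = {
    "u1": {"A": 5, "B": 5, "C": 1},
    "u2": {"A": 4, "B": 5},
    "u3": {"B": 1, "C": 5},
    "u4": {"A": 1, "C": 4},
}
for score, item in similar_items(ratings, "A"):
    print(item, round(score, 3))
```

The interpretability comes for free: a recommendation can be explained as "people who rated A highly also rated B highly", with the similarity score as supporting evidence.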
Q9: Are recommendation algorithms usually used together with user profiles?
A: If you can get user profiles, of course you should. A high quality user profile will definitely improve the recommendation.
Q10: Teacher, I would like to ask about LFM (the latent factor model). After matrix factorization, when the factors are multiplied and the difference from the target matrix is taken, how are the missing values in the target matrix filled in?
A: There is no need to fill in the values that are missing from the target matrix; when computing the difference, only the values that exist are considered.
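The point of this answer, that the error is computed only over existing values, is visible in the training loop below. It is a minimal sketch assuming stochastic gradient descent with L2 regularisation, which is one common way to fit a latent factor model; the ratings, the hyperparameters, and the function names are illustrative assumptions.

```python
import random

def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item factor vectors."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

def observed_mse(observed, P, Q):
    """Squared error averaged over the observed entries only."""
    return sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed) / len(observed)

def factorize(observed, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Fit user factors P and item factors Q by SGD.

    The loop visits only the (user, item, rating) triples that exist;
    missing cells of the target matrix are never filled in or touched.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in observed:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Made-up sparse ratings: 5 users x 4 items, 14 of 20 cells observed.
observed = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1),
    (1, 0, 4), (1, 3, 1),
    (2, 0, 1), (2, 1, 1), (2, 3, 5),
    (3, 0, 1), (3, 2, 5), (3, 3, 4),
    (4, 1, 1), (4, 2, 5), (4, 3, 4),
]
P0, Q0 = factorize(observed, 5, 4, epochs=0)   # untrained starting point
P, Q = factorize(observed, 5, 4)
print("MSE before training:", round(observed_mse(observed, P0, Q0), 3))
print("MSE after training :", round(observed_mse(observed, P, Q), 3))
print("prediction for a missing cell (user 1, item 1):", round(predict(P, Q, 1, 1), 2))
```

After training, the missing cells are obtained as predictions from the learned factors, which is exactly what the recommender needs them for.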
Q11: Can Embedding simply be replaced with one-hot encoding?
A: Theoretically, you can use a fully connected layer with one-hot encoding instead of Embedding. However, in many cases, using the one-hot code directly as the input increases the computational load. For example, in natural language processing, the input ID values may number in the millions; with direct one-hot encoding, the input vector has a length in the millions, and if you also batch inputs to improve throughput, the memory usage and amount of computation become huge.
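The equivalence mentioned in this answer, and the cost difference, can be checked on a tiny example. The sketch assumes a fully connected layer without bias or activation; the 4 x 2 weight matrix and the function names are made up for illustration.

```python
def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position index."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def dense_layer(vector, weights):
    """Fully connected layer without bias: vector (length n) times
    weights (n rows, d columns). Costs n * d multiplications."""
    dim = len(weights[0])
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector)))
        for j in range(dim)
    ]

def embedding_lookup(index, weights):
    """Embedding: read one row. Cost is independent of the number of IDs."""
    return weights[index]

# 4 possible IDs, vectors of length 2 (made-up weights).
weights = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
via_one_hot = dense_layer(one_hot(2, 4), weights)
via_lookup = embedding_lookup(2, weights)
print(via_one_hot, via_lookup)
```

Both paths return the same row, but the one-hot path multiplies by every row of the weight matrix. With millions of IDs and a batch of inputs, that is where the memory and computation cost described above comes from.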
Q12: Are there any applications of recommendation systems in industrial fields (electric power, chemicals, manufacturing)?
A: I have not found any applications of recommendation systems in power, chemicals, or manufacturing. I think recommendation systems should be promising in Industry 4.0, advanced manufacturing, and personalized customization.
Q13: This course covered many algorithms, from collaborative filtering to deep learning. When an engineer faces a recommendation problem, is there a guideline saying that one kind of problem calls for collaborative filtering while another calls for deep learning? Is there a guideline like this?
A: My suggestion is to start with content-based recommendation and collaborative filtering. Why? For one thing, if you start with a fancier algorithm such as deep learning, you have no way of knowing whether it is better than the traditional algorithms. If you use a traditional algorithm first and then a more advanced one, you at least have a baseline that tells you whether the advanced algorithm is really an improvement. Otherwise you are in a fog: you use the more advanced algorithm without knowing whether it is really any good.
The other thing is that more advanced algorithms are more difficult to implement. For example, I mentioned that Netflix spent one million dollars to get the winning algorithm from AT&T Bell Labs, but they did not actually use it in their system. Why? Because it was too complex. In academia an algorithm can be very complicated, but in industry you have to consider memory, the amount of computation, whether programmers can understand the algorithm, and whether the code is easy to maintain. After weighing all this, Netflix in the end did not use the algorithm. So if you are just starting out, I suggest beginning with the earlier algorithms, content-based recommendation and collaborative filtering, and moving on to more advanced algorithms afterwards.
Q14: Another question: since deep learning is already very mature in image recognition, I would like to know how deep learning is applied to recommendation systems in industry.
A: Now that deep learning is so popular, is there a trend toward building recommendation systems on deep learning? Yes, there is. Why? Take a picture-sharing website, or a video-sharing site like YouTube. When people upload things, nobody tells you what the content is. When a clip is uploaded, the system does not know who made it, what its style is, or which actors appear in it. It knows nothing, so how can it work out relevance?
The same goes for pictures. How do you know what to recommend? Deep learning models such as convolutional neural networks (CNNs) can recognize and extract the content of images and videos; they can tell you that this image shows Obama and that one shows Trump or someone else. The same applies to music, where the model can tell you the style of a piece.
So there is indeed a trend toward deep learning. LSTMs and RNNs can capture temporal information, that is, correlation over time. For example, suppose a person likes to watch The Legend of Zhen Huan and also likes House of Cards, but prefers House of Cards. With a traditional recommendation system, if I have just watched episode 21 of The Legend of Zhen Huan, what does it recommend next? House of Cards. Why? Because the user prefers House of Cards. But given that I am watching episode 21 of The Legend of Zhen Huan, should you recommend House of Cards? You should recommend episode 22 of The Legend of Zhen Huan, because I have just watched episode 21. For this kind of prediction over time, over a time series, LSTMs and other recurrent neural networks are clearly better.
Speaker introduction
Yuan Yuan (English name: Jerry) is a senior engineer at Microsoft headquarters in the United States. He leads the research and development of several core recommendation systems and is an expert in artificial intelligence, distributed systems, and cloud computing. He has 14 years of experience in artificial intelligence, recommendation systems, natural language processing, and digital image and video processing projects. He studied face recognition under Academician Wang Shoujue of the Chinese Academy of Sciences and co-published papers. During his PhD in the United States, his main research was a NASA-supported space weather prediction project based on artificial intelligence.
For more content, follow AI Front (ID: AI-front) and reply “AI”, “TF”, or “big Data” to get the AI Front series of PDF mini-books and skill maps.