Zhang Liang, co-founder of Zhihu, asked Yu Jun a question on Zhihu: “Based on your experience, what are the three to five product improvements that Zhihu most urgently needs to make?” The first point in Yu Jun’s answer was: “Explore personalized content and push it to users. I know Zhihu has a lot of content that would interest me, but very little of what Zhihu pushes to me is something I want to click on. It always leaves me feeling as if I had entered a mountain of treasure and come back empty-handed. NetEase Cloud Music, Taobao, and Toutiao are all good examples to learn from.” So how does Quora make recommendations? The following content is from RecsysChina.

Xavier Amatriain, Quora’s VP of Engineering and a leading machine learning expert, gave a presentation at the Question Answering Workshop at WWW2016 titled Machine Learning for Q&A Sites: The Quora Example [1].

Quora’s Mission: To share and grow the world’s knowledge. Quora mainly considers three factors: Relevance, Quality, and Demand.

Quora’s core data model and its relationships:

Feed Ranking

One of Quora’s core problems is personalized feed ranking. Quora organizes “knowledge” around questions, answers, and topics, grades content quality using user actions such as upvotes and downvotes, and then lets knowledge flow through the community via the follow relationships between people and questions. Personal feeds are the main carrier of this flow. Xavier says feed ranking at Quora is harder than ranking at Netflix, which makes sense: he would not have changed jobs without a bigger challenge. The primary goal of Quora’s feed ranking is to ensure that the content pushed into a user’s feed is highly relevant to that user’s interests. Another consideration is the follow and interaction relationships between users, which Xavier calls social relevance. Timeliness matters too: for example, questions and answers about breaking events should reach users’ feeds promptly.

Objective: present the most interesting stories for a user at a given time

  • Interesting = topical relevance + social relevance + timeliness
  • Stories = questions + answers

Xavier confirmed that relevance-based ordering significantly increased user engagement compared with chronological ordering. The main challenges are:

  • potentially many candidate stories
  • real-time ranking
  • optimize for relevance
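
As a minimal sketch of the objective above, the following assumes each candidate story already carries three precomputed scores and that "interesting" is their plain unweighted sum. The `Story` class, the score ranges, and the function names are hypothetical and are not taken from Quora's system.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Story:
    story_id: str
    topical_relevance: float  # hypothetical score in [0, 1]
    social_relevance: float   # hypothetical score in [0, 1]
    timeliness: float         # hypothetical score in [0, 1]


def interest(story: Story) -> float:
    # Unweighted sum, mirroring
    # "interesting = topical relevance + social relevance + timeliness".
    return story.topical_relevance + story.social_relevance + story.timeliness


def rank_feed(candidates, k):
    # heapq.nlargest avoids fully sorting a large candidate pool
    # when only the top k stories are shown.
    return heapq.nlargest(k, candidates, key=interest)


candidates = [
    Story("q1", 0.9, 0.1, 0.2),
    Story("a7", 0.4, 0.8, 0.9),
    Story("q3", 0.2, 0.2, 0.1),
]
top = rank_feed(candidates, k=2)
print([s.story_id for s in top])  # prints ['a7', 'q1']
```

Selecting only the top k is one way to cope with many candidate stories under a real-time budget. A learned ranking model would replace the plain sum in practice.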

The basic data Quora uses for feed ranking is shown below. Quora calls it “impression logs”.

Based on these basic actions, Quora defines the following relevance function.

In a nutshell, an “action weighting function” is used to predict a user’s interest in a story. There are two ways to compute it. The first feeds all the actions into a single regression model and predicts the final value directly. The second predicts the probability of each action separately (upvote, read, share, and so on) and then combines them in a weighted sum. The first is simple but less interpretable. The second makes better use of each action signal, but it needs one classifier per action, which is computationally expensive. The three main models Quora uses are as follows:
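
The second approach can be sketched as an expected value: each action's predicted probability is multiplied by a value assigned to that action, and the products are summed. The action names, the values in `ACTION_VALUES`, and the probabilities below are illustrative assumptions, not figures from the talk.

```python
# Hypothetical per-action values; the numbers are illustrative only.
ACTION_VALUES = {"upvote": 5.0, "read": 1.0, "share": 10.0}


def expected_value(action_probs, action_values=ACTION_VALUES):
    """Return the sum over actions a of value(a) * P(a | user, story)."""
    return sum(action_values[a] * p for a, p in action_probs.items())


# Probabilities as they might come from one classifier per action.
probs = {"upvote": 0.10, "read": 0.60, "share": 0.02}
score = expected_value(probs)
print(round(score, 2))  # prints 1.3
```

The first approach would instead train one regression model to output `score` directly, which needs only one model but hides which action drives the prediction.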

Xavier also emphasizes the importance of feature engineering. Work in this area goes a long way toward a good ranking result, especially if features can be updated online in real time so that the system responds to user behavior more promptly. Quora’s main features include:

  • user (e.g. age, country, recent activity)
  • story (e.g. popularity, trendiness, quality)
  • interactions between the two (e.g. topic or author affinity)

In the grand scheme of things, Quora’s feed ranking is nothing special; it is pretty much the industry standard. What sets Quora apart is that its data model is more complex and its relationships are more diverse than on other sites. For example, users can follow other users, questions, and topics.

  1. Following users brings in a wider and more diverse range of information. Pleasant surprises are likely to come from the interesting users someone follows, but this is also the most likely source of irrelevant noise. The most important work in this area is evaluating user expertise.
  2. Questions and answers are Quora’s core content elements and the force that drives the flow of knowledge through the system. The main work here is guiding more highly expert users to contribute high-quality answers and stimulating the creation of more good questions (possibly even automatically generated ones). Computing an answer ranking also involves work against spam.
  3. A topic aggregates the content on one subject. Topics play an extremely important role in Quora’s information architecture and form the skeleton of its knowledge structure. Quora calls this the Topic Network, and building it is itself a very big challenge. Other problems to solve include finding (potentially) high-quality questions under a topic, reducing low-effort questions, and filtering or merging duplicate questions.

Quora has explored each of these core problems further.

Answer Ranking

Goal: given a question and N answers, come up with the ideal ranking of those N answers. Quora mainly considers the following three dimensions when computing the ranking, and each dimension contains a number of features.

  1. The quality of the content itself. Quora has clear guidelines [2] on what constitutes a “good answer”: one that is fact-based, reusable, explanatory, well formatted, and so on.
  2. Interaction, including upvotes and downvotes, comments, shares, favorites, clicks, and so on.
  3. Characteristics of the answerer, such as the answerer’s expertise in the question’s area.

This work also splits into non-personalized and personalized ranking. For some kinds of questions the ranking is non-personalized, and the best answer is the same for all users. For others it is personalized, and the best answer differs from person to person. In short, answer ranking is very important to Quora, and the work on it is very detailed. The Quora blog has an article devoted to it, which interested readers can consult [3].
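
To make the three ranking dimensions concrete, here is a minimal non-personalized sketch that combines them linearly. The `Answer` fields, the score ranges, and the weights are hypothetical. The source does not say how Quora combines the dimensions, so the weighted sum is only an assumption.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    answer_id: str
    content_quality: float   # hypothetical score in [0, 1]
    interaction: float       # hypothetical score in [0, 1]
    author_expertise: float  # hypothetical score in [0, 1]


# Hypothetical weights; a real system would learn these from data.
WEIGHTS = (0.5, 0.3, 0.2)


def answer_score(answer: Answer, weights=WEIGHTS) -> float:
    w_quality, w_interaction, w_author = weights
    return (w_quality * answer.content_quality
            + w_interaction * answer.interaction
            + w_author * answer.author_expertise)


def rank_answers(answers):
    # Highest combined score first.
    return sorted(answers, key=answer_score, reverse=True)


answers = [
    Answer("a1", 0.9, 0.2, 0.5),
    Answer("a2", 0.6, 0.9, 0.9),
    Answer("a3", 0.3, 0.4, 0.1),
]
print([a.answer_id for a in rank_answers(answers)])  # prints ['a2', 'a1', 'a3']
```

A personalized variant could pass per-viewer weights to `answer_score`, so that the same answers rank differently for different readers.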

The Answer Ranking system end-to-end

Ask2Answers

Ask2Answers (A2A) is one of the most important features of Quora’s product. Quora could have recommended questions directly to suitable answerers, which is what it did initially, but that did not work as well as having people send the request. An A2A request adds a sense of ceremony, making invitees feel needed and psychologically rewarded. It is also a social action, and part of the essence of social interaction is making it easy for users to show off. It looks like a simple feature, but Quora has worked hard to model A2A as a machine learning problem: given a question and a viewer, rank all other users by how “well-suited” they are. Well-suited = likelihood of the viewer sending a request + likelihood of the candidate adding a good answer. In other words, the model considers both the likelihood that the browsing user will send an invitation and the likelihood that the invitee will respond with a good answer. The Quora blog also has an article detailing their approach [4].
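
The A2A ranking problem can be sketched as follows, assuming the two likelihoods are already available as probabilities for each candidate. Multiplying them is an assumption made here so that a candidate must do well on both counts. The source only says that both likelihoods are considered, and the names and numbers are hypothetical.

```python
def well_suited(p_request: float, p_good_answer: float) -> float:
    # Assumption: combine the two likelihoods as a product.
    return p_request * p_good_answer


def rank_candidates(candidates):
    """Rank users given a dict of user -> (p_request, p_good_answer)."""
    return sorted(
        candidates,
        key=lambda user: well_suited(*candidates[user]),
        reverse=True,
    )


# Hypothetical model outputs for one (question, viewer) pair.
candidates = {
    "alice": (0.30, 0.50),
    "bob": (0.80, 0.10),
    "carol": (0.20, 0.90),
}
print(rank_candidates(candidates))  # prints ['carol', 'alice', 'bob']
```

Here "bob" is the most likely to be asked but rarely writes a good answer, so he ranks last.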

Topic Network

Quora has gone to great lengths to guide users in tagging content properly, and the benefits of that sustained effort are beginning to show. They found that [5]:

  1. Topics are rapidly diversifying as their user base expands.
  2. Many fields have self-organized into fairly well-structured knowledge hierarchies.


Quora believes it is possible to organize domain knowledge by relying on communities.

User Trust/Expertise Inference

This is another very important problem for Quora. Quora needs to identify the experts in a given field and use product design to guide them to contribute more high-quality answers in that field. Quora takes into account how many questions a user has answered in a particular area and how many upvotes, clicks, thanks, shares, favorites, and views those answers have received. Another important aspect is the propagation of expertise. For example, if Xavier upvotes an answer in the field of recommender systems, the author of that answer is likely to have a high degree of expertise in recommender systems.

Other related problems include recommending topics, recommending users, related questions, duplicate questions, anti-spam, and so on. Quora uses machine learning extensively to solve these problems. Quora’s greatest treasure is the vast amount of valuable content it has accumulated over the years across many fields, so it is no surprise that Quora mines it. An article called Mapping the Discussion on Quora Over Time through Question Text [6] is a good case study in data mining.
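
The idea of expertise propagation described in this section, where an upvote from a recognized expert says more about the author than an upvote from an unknown user, can be sketched as a single weighted pass over upvotes. The function, the data layout, and the scores are hypothetical, and a real system would likely iterate or learn the weights.

```python
from collections import defaultdict


def propagate_expertise(upvotes, voter_expertise, base_weight=1.0):
    """Accumulate per-topic endorsement for each answer author.

    upvotes: iterable of (voter, author, topic) tuples.
    voter_expertise: dict mapping (user, topic) -> expertise score.
    Returns a dict mapping (author, topic) -> endorsement score.
    """
    scores = defaultdict(float)
    for voter, author, topic in upvotes:
        # Every upvote counts base_weight; an expert's upvote adds that
        # voter's own expertise in the same topic on top.
        bonus = voter_expertise.get((voter, topic), 0.0)
        scores[(author, topic)] += base_weight + bonus
    return dict(scores)


# Hypothetical data: "xavier" is a known expert in "recsys".
voter_expertise = {("xavier", "recsys"): 9.0}
upvotes = [
    ("xavier", "dana", "recsys"),
    ("newcomer", "eli", "recsys"),
]
scores = propagate_expertise(upvotes, voter_expertise)
print(scores[("dana", "recsys")], scores[("eli", "recsys")])  # prints 10.0 1.0
```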

Facebook and several other topics over time

References:

[1] www.slideshare.net/xamat/machi…

[2] www.quora.com/What-does-a…

[3] engineering.quora.com/A-Machine-L…

[4] engineering.quora.com/Ask-To-Answ…

[5] data.quora.com/The-Quora-T…

[6] data.quora.com/Mapping-the…
