Quora has been using machine learning for some time. We keep up with the latest methods and continue to make significant improvements to existing ones. It is important to note that all of these improvements were first optimized and tested offline using a variety of offline testing methods, and were ultimately validated through online A/B testing.

In this post, I’ll talk about some of the most important machine learning applications and techniques that Quora is using in 2015.

(Xavier Amatriain is Vice President of Engineering at Quora.)

Ranking

Ranking is arguably one of the most important applications of machine learning on the web. Companies large and small have built their business models around ranking, for example the ordering of results returned for a search query. Quora uses different ranking algorithms in different contexts and for different purposes.

An interesting example is answer ranking. Given several answers to a question, we want to sort them so that the “best” answer comes first and the worst comes last (see the screenshot below).

Determining the correct order of answers to a question involves a variety of features. To do so, we first need to define what Quora considers a “good answer.” A good starting point is the Quora thread What Answers Quora Thinks Are Useful, which includes criteria such as “authentic,” “reusable,” “explained,” and “well-formed.” Our ranking algorithm is a learning-to-rank approach that uses multiple features in an attempt to encode the multiple dimensions associated with these abstract concepts. For example, we use features that describe the quality of the writing, as well as features that describe the interactions the answer has received (such as the number of upvotes, clicks, and expands). We also use features associated with the answer’s author, such as their expertise in the question’s topic.
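To make the idea concrete, here is a minimal sketch of feature-based answer ranking. It is not Quora's actual model: the feature names, the hand-set weights, and the linear scoring function are all assumptions chosen for illustration. A real learning-to-rank system would learn the scoring function from data.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    answer_id: str
    writing_quality: float   # hypothetical text-quality score in [0, 1]
    upvotes: int             # interaction signal
    author_expertise: float  # hypothetical author expertise in [0, 1]


# Hypothetical weights; a learned model would replace these.
WEIGHTS = {"writing_quality": 2.0, "upvotes": 0.01, "author_expertise": 1.5}


def score(answer: Answer) -> float:
    """Combine the features into a single relevance score."""
    return (
        WEIGHTS["writing_quality"] * answer.writing_quality
        + WEIGHTS["upvotes"] * answer.upvotes
        + WEIGHTS["author_expertise"] * answer.author_expertise
    )


def rank_answers(answers):
    """Sort answers so the highest-scoring one comes first."""
    return sorted(answers, key=score, reverse=True)


answers = [
    Answer("a1", writing_quality=0.4, upvotes=120, author_expertise=0.2),
    Answer("a2", writing_quality=0.9, upvotes=30, author_expertise=0.8),
    Answer("a3", writing_quality=0.2, upvotes=5, author_expertise=0.1),
]
ranked_ids = [a.answer_id for a in rank_answers(answers)]
print(ranked_ids)
```

Note that in this toy data the most upvoted answer does not come first, because writing quality and author expertise also contribute to the score.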

There are many other ranking applications on Quora, some of which may go unnoticed. For example, the names of the users who upvoted an answer are also sorted, so that the users we think are most knowledgeable about the question or answer come first. Similarly, when we display possible answerers for a particular question, the recommended users are also ranked.

Let’s take a closer look at two special cases of learning to rank: search and personalized ranking.

Search algorithms

For an application like Quora, search can be seen as just another ranking application. In fact, search can be broken down into two steps: text matching and ranking. The first step returns the documents (questions) that match the query string entered in the search box. In the second step, these documents are treated as candidates and ranked to optimize something like click probability.

Step 2 can use many features, and it is really another example of learning to rank. These include the simple text features already used in the initial text matching phase, as well as features related to user behavior or to object attributes such as popularity.
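The two steps can be sketched as follows. This is only an illustration under stated assumptions: the in-memory `QUESTIONS` index, the term-overlap matcher, and the 0.7/0.3 blend of text overlap and popularity are all hypothetical, and a production system would use a real index and a learned ranker.

```python
import re

# Hypothetical in-memory index of questions with a popularity attribute.
QUESTIONS = {
    "q1": {"text": "How do I learn machine learning?", "popularity": 0.9},
    "q2": {"text": "What is machine translation?", "popularity": 0.4},
    "q3": {"text": "How do I bake bread?", "popularity": 0.7},
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match(query):
    """Step 1: return ids of questions sharing at least one term with the query."""
    terms = tokenize(query)
    return [qid for qid, q in QUESTIONS.items() if terms & tokenize(q["text"])]


def rank(query, candidate_ids):
    """Step 2: order candidates by a blend of term overlap and popularity."""
    terms = tokenize(query)

    def score(qid):
        question = QUESTIONS[qid]
        overlap = len(terms & tokenize(question["text"])) / len(terms)
        return 0.7 * overlap + 0.3 * question["popularity"]  # assumed blend

    return sorted(candidate_ids, key=score, reverse=True)


results = rank("machine learning", match("machine learning"))
print(results)
```

Keeping matching and ranking separate means the expensive scoring only runs on the handful of questions that survive step 1.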

Personalized ranking

In some scenarios, like the ones described above, a single globally optimal ranking for all users may suffice. In other words, we can assume that the order of the most “helpful” answers to a given question is independent of the user reading them. However, this assumption does not hold in many important situations. One such case is the Quora Feed, which is essentially the home page that anyone logged into the product sees. On this home page, we try to pick and rank the most “interesting” stories for a particular user at a particular time (see the example below). This is a typical personalized learning-to-rank problem, similar to how Netflix’s home page ranks movies and TV shows.

Quora’s use case is more challenging than Netflix’s ranking of movies and TV shows. In fact, our use case can be viewed as a combination of what Netflix, Facebook, and Google News optimize for in personalized ranking. First, we want to make sure that the top stories are topically relevant to the user. Second, Quora has explicit relationships between users, and activity in your “social network” should also affect the ranking. Third, Quora stories may sometimes be related to ongoing trends, so timeliness is another factor that should influence whether the model ranks a story higher or lower.

As such, Quora’s personalized ranking involves a variety of features. Here are a few:

  • Quality of question/answer
  • Topics of interest to the user
  • Other users that the user follows
  • Popular events
  • And more

Keep in mind that on Quora we are interested not only in getting people to read interesting content, but also in showing questions to people who can write interesting content. Therefore, we must include features related to how interesting an answer is as well as features that are specific to the question. To derive these features, we use information about the behavior of users, authors, and objects (such as answers and questions). These behaviors are aggregated over different time windows and fed to the ranking algorithm. There are many different features we could add to our personalized feed model, and we are always trying to add more.
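As an illustration of aggregating behavior over different time windows, the sketch below counts events (for example, views of a story) in several trailing windows. The window names and lengths are assumptions made for the example, not values from Quora's system.

```python
HOUR = 3600
DAY = 24 * HOUR

# Hypothetical trailing windows, in seconds.
WINDOWS = {"last_hour": HOUR, "last_day": DAY, "last_week": 7 * DAY}


def windowed_counts(event_times, now, windows=WINDOWS):
    """Count how many events fall inside each trailing time window.

    event_times: iterable of event timestamps in seconds.
    now: the reference timestamp in seconds.
    """
    return {
        name: sum(1 for t in event_times if 0 <= now - t <= length)
        for name, length in windows.items()
    }


now = 1_000_000
events = [now - 30 * 60, now - 5 * HOUR, now - 3 * DAY]
features = windowed_counts(events, now)
print(features)
```

Each count becomes one feature, so the ranker can distinguish a story that was popular last week from one that is getting attention right now.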

Another important consideration for our Feed ranking application is that we need to be able to respond to user actions, impressions, and even trending events. We have millions of questions and answers and the number keeps growing, so we cannot rank all of them for every user in real time. To optimize the experience, we implemented a multi-stage ranking solution in which candidates are selected and ranked in advance, before the final ranking is performed.
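A multi-stage ranker of this kind might be sketched as below. The two stages, the cheap quality-based candidate score, and the recency decay used in the final stage are all assumptions for illustration; the source does not describe how Quora's stages actually work.

```python
import heapq


def select_candidates(stories, followed_topics, limit):
    """Stage 1 (can run ahead of time): keep the best on-topic stories by a cheap score."""
    on_topic = [s for s in stories if s["topic"] in followed_topics]
    return heapq.nlargest(limit, on_topic, key=lambda s: s["quality"])


def final_rank(candidates, now):
    """Stage 2: re-rank the small candidate set using a fresher signal (story age)."""

    def score(story):
        age_hours = (now - story["created_at"]) / 3600
        return story["quality"] / (1.0 + age_hours)  # hypothetical recency decay

    return sorted(candidates, key=score, reverse=True)


stories = [
    {"id": "s1", "topic": "ml", "quality": 0.9, "created_at": 0},
    {"id": "s2", "topic": "ml", "quality": 0.6, "created_at": 7200},
    {"id": "s3", "topic": "cooking", "quality": 0.95, "created_at": 7200},
    {"id": "s4", "topic": "ml", "quality": 0.1, "created_at": 7200},
]
candidates = select_candidates(stories, followed_topics={"ml"}, limit=2)
feed = [s["id"] for s in final_rank(candidates, now=7200)]
print(feed)
```

The design choice is that the first stage only needs signals that change slowly, so it can be precomputed, while the second stage touches just a few candidates and can afford time-sensitive signals.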

Recommendations

The personalized ranking described above is already a form of recommendation. Similar methods are used in other cases. For example, the popular Quora email digest includes a list of stories picked and recommended for you. This is a different learning-to-rank model, optimized for a different objective function. In addition to ranking algorithms, we have other personalized recommendation algorithms in different parts of the product. For example, in several places you can see recommendations for people or topics (see below).

Related questions

Another source of recommendations is showing the user other questions that are in some way related to the current one.

Related questions are determined by another machine learning model that takes into account a variety of features, such as text similarity, co-visit data, or shared attributes such as topics. Features related to popularity or question quality are also considered. It is important to point out that a good “related question” recommendation is not just about how similar an item is to the source question, but also about how “interesting” the target question is. In fact, the trickiest part of any related-item machine learning model is the tradeoff between similarity and other relevance factors.
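The tradeoff between similarity and "interestingness" can be illustrated with a small sketch. Jaccard token overlap stands in for text similarity, and the `interest` score and the blending weight `alpha` are hypothetical; the source does not say how these factors are actually combined.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_score(source, target, alpha):
    """Blend similarity to the source with how interesting the target is.

    alpha = 1.0 ranks purely by similarity; lower values favor interest.
    """
    similarity = jaccard(source["tokens"], target["tokens"])
    return alpha * similarity + (1 - alpha) * target["interest"]


def rank_related(source, targets, alpha=0.6):
    ranked = sorted(targets, key=lambda t: related_score(source, t, alpha), reverse=True)
    return [t["id"] for t in ranked]


source = {"tokens": ["learn", "python", "fast"]}
targets = [
    {"id": "t1", "tokens": ["learn", "python", "quickly"], "interest": 0.1},
    {"id": "t2", "tokens": ["learn", "python", "projects", "beginners"], "interest": 0.9},
]
print(rank_related(source, targets, alpha=1.0))  # similarity only
print(rank_related(source, targets, alpha=0.6))  # similarity blended with interest
```

With similarity alone the near-paraphrase `t1` wins, but once interest is taken into account the slightly less similar and much more engaging `t2` moves to the top.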

The related questions model is particularly effective at engaging logged-out users who arrive at a question page from an external search. This is one reason why this recommendation model has not been personalized so far.

Duplicate questions

Duplicate questions are the extreme case of the related questions described above. For Quora, this is a challenge because we want to ensure that the effort users put into answering a particular question is shared and directed to the right place. It is also important to point users who want to ask a question on the site to answers that already exist. As a result, we put a lot of effort into detecting duplicate questions, especially at question creation time.

Our current solution is based on a binary classifier trained on duplicate/non-duplicate labels. We use a variety of signals, ranging from text vector space models to usage-based features.
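A minimal sketch of such a binary classifier is shown below. It makes several simplifying assumptions: it uses a single text feature (cosine similarity between bag-of-words vectors), a hand-rolled logistic regression trained by gradient descent, and a tiny made-up training set. The real system uses many more signals, and the source does not say which classifier it uses.

```python
import math
from collections import Counter


def cosine(q1, q2):
    """Cosine similarity between bag-of-words vectors of two questions."""
    v1, v2 = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    norm = math.sqrt(sum(c * c for c in v1.values())) * math.sqrt(
        sum(c * c for c in v2.values())
    )
    return dot / norm if norm else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=500, lr=0.5):
    """Fit a one-feature logistic regression with stochastic gradient descent."""
    w, b = 0.0, 0.0
    xs = [cosine(q1, q2) for q1, q2 in pairs]
    for _ in range(epochs):
        for x, y in zip(xs, labels):
            error = sigmoid(w * x + b) - y
            w -= lr * error * x
            b -= lr * error
    return w, b


def is_duplicate(q1, q2, model, threshold=0.5):
    w, b = model
    return sigmoid(w * cosine(q1, q2) + b) >= threshold


# Made-up labeled pairs: 1 = duplicate, 0 = not duplicate.
pairs = [
    ("how do i learn python", "how can i learn python"),
    ("what is machine learning", "what is machine learning exactly"),
    ("how do i learn python", "what is the capital of france"),
    ("what is machine learning", "how do i bake bread"),
]
labels = [1, 1, 0, 0]
model = train(pairs, labels)
print(is_duplicate("how do i learn java", "how can i learn java", model))
print(is_duplicate("how do i learn java", "best pizza in rome", model))
```

At question creation time, a classifier like this could be run against the candidate questions returned by a text match, so that likely duplicates are shown to the user before they post.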

User credibility/expertise inference

In an application like Quora, it is very important to know how credible your users are. In fact, we are interested not only in how credible a user is in general, but in how credible they are on a given topic. A user may be knowledgeable about some topics but not others. Quora uses machine learning to infer a user’s expertise. Not only do we know what answers a user has written on a given topic, we also know how many upvotes, clicks, and comments those answers received. We also know how many “endorsements” the user has received in that area. Endorsements are an explicit recognition of someone’s expertise from other users.

Another important thing to keep in mind is that credibility/expertise propagates through the network, and the algorithm needs to take this into account. For example, if a machine learning expert upvotes my answer on machine learning, that should carry more weight than an upvote from a random user who is not an expert in the field. The same applies to endorsements and other user-to-user features.
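One simple way to picture this weighting is a single propagation step in which each upvote counts for more when the voter has expertise in the topic. The expertise table, the `base` and `boost` parameters, and the additive formula are all hypothetical; a fuller treatment would iterate this kind of update across the whole network.

```python
# Hypothetical per-topic expertise scores in [0, 1], keyed by (user, topic).
EXPERTISE = {
    ("alice", "machine-learning"): 0.9,
    ("bob", "machine-learning"): 0.0,
}


def upvote_weight(voter, topic, base=1.0, boost=4.0):
    """Weight of one upvote: a base value plus a bonus for topical expertise."""
    return base + boost * EXPERTISE.get((voter, topic), 0.0)


def answer_credit(voters, topic):
    """Total expertise-weighted credit an answer receives from its upvoters."""
    return sum(upvote_weight(voter, topic) for voter in voters)


expert_credit = answer_credit(["alice"], "machine-learning")
random_credit = answer_credit(["bob"], "machine-learning")
off_topic_credit = answer_credit(["alice"], "cooking")
print(expert_credit, random_credit, off_topic_credit)
```

Because expertise is keyed by topic, the same expert's upvote carries only the base weight on an answer outside their field.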

Spam detection and moderation

Sites like Quora that pride themselves on keeping content quality high have to be very wary of attempts to game the system with spam, malicious, or very low-quality content. Purely manual review does not scale. The solution, as you might guess, is to use machine learning models to detect these problems.

Quora has several models for detecting content quality issues. In most cases, the output of these classifiers is not used directly to make decisions; instead, it feeds questions and answers into moderation queues for human review.
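The pattern of routing classifier output to human reviewers rather than acting on it directly could look like the sketch below. The `ModerationQueue` class, the 0.5 threshold, and the choice to review the most suspicious items first are assumptions for illustration only.

```python
import heapq


class ModerationQueue:
    """Holds flagged items so human reviewers see the most suspicious first."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold  # hypothetical flagging threshold
        self._heap = []

    def submit(self, item_id, spam_probability):
        """Queue the item for review if the classifier score is high enough.

        Returns True when the item was queued. Nothing is removed automatically.
        """
        if spam_probability >= self.threshold:
            # Negate the score so the highest probability is popped first.
            heapq.heappush(self._heap, (-spam_probability, item_id))
            return True
        return False

    def next_for_review(self):
        """Return the next item for a human reviewer, or None if the queue is empty."""
        if not self._heap:
            return None
        _, item_id = heapq.heappop(self._heap)
        return item_id


queue = ModerationQueue()
flags = [queue.submit("a1", 0.2), queue.submit("a2", 0.7), queue.submit("a3", 0.95)]
review_order = [queue.next_for_review() for _ in range(3)]
print(flags, review_order)
```

Keeping a person in the loop means a classifier mistake costs reviewer time rather than wrongly removing content.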

Content creation predictions

For Quora, it is important to remember that we optimize many parts of our system not just to attract readers, but also to produce the best-quality, most in-demand content. So we have a machine learning model that predicts how likely a given user is to write an answer to a given question. This allows our system to prioritize questions in a variety of ways. One of these is the system’s automated A2A (Ask to Answer), which sends questions to potential answerers. The other ranking systems mentioned above also use the probabilities predicted by this model.
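As a rough sketch, a model like this can be thought of as mapping (user, question) features to a probability, which is then used to choose whom to ask. The feature names, the fixed weights, and the logistic form below are assumptions; in practice the weights would be learned from past answering behavior.

```python
import math

# Hypothetical weights for a logistic model of P(user answers question).
WEIGHTS = {"topic_match": 2.5, "past_answers_in_topic": 0.3}
BIAS = -3.0


def answer_probability(features):
    """Predicted probability that the user writes an answer to the question."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def suggest_answerers(candidates, top_k=2):
    """Pick the users most likely to answer, e.g. for an automated ask-to-answer."""
    ranked = sorted(
        candidates, key=lambda c: answer_probability(c["features"]), reverse=True
    )
    return [c["user"] for c in ranked[:top_k]]


candidates = [
    {"user": "u1", "features": {"topic_match": 1.0, "past_answers_in_topic": 5}},
    {"user": "u2", "features": {"topic_match": 0.0, "past_answers_in_topic": 0}},
    {"user": "u3", "features": {"topic_match": 1.0, "past_answers_in_topic": 0}},
]
suggested = suggest_answerers(candidates)
print(suggested)
```

The same predicted probability can also be passed as a feature to other rankers, which matches the reuse described above.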

Models

Quora has tried many different models for the different cases described above. Sometimes we use open source implementations, but more often we end up with more efficient and flexible internal versions. I won’t go into the details of the models, but I will list the models our system uses:

  • Logistic regression
  • Elastic net
  • Gradient boosted decision trees
  • Random forests
  • Neural networks
  • LambdaMART
  • Matrix factorization
  • Vector models and other natural language processing techniques
  • And more

Conclusion

To sum up, Quora uses machine learning in a variety of ways. We have seen significant gains from these methods, we believe there are more to come, and we will continue to invest in new technologies. In addition, we already have exciting new applications of machine learning in mind for the near future. These include ad ranking, machine translation, and other areas of natural language processing, all directly tied to new features that we plan to add soon.
