For the World. This is not a complete, word-for-word translation but rather a summary with notes. If you find any infringement, please contact me.
On January 6, 2016, Netflix launched simultaneously in 130 new countries, bringing its service to more than 190 countries worldwide. Having to scale quickly while making sure every algorithm keeps working seamlessly posed new challenges for Netflix's recommendation and search teams. In this article, they highlight four of the most interesting challenges they encountered in making their algorithms work at global scale, and how solving those challenges improved their recommendations.
Challenge 1: Uneven Video Availability
Copyright licenses differ across countries and regions, and this can distort the signals that recommendation algorithms learn from. For example, movie A might be available on Netflix only in the United States, while movie B is available only in France. Typical recommendation models, however, rely heavily on learning patterns from play data, especially co-occurrence or sequences of plays between videos. In particular, many algorithms assume that when a video is not played, that is a (weak) signal that people may not like it. In this case, though, the system will never observe a member playing both A and B. A naive model could then conclude that the two movies simply don't appeal to the same kind of people, when in fact their audiences were restricted to different countries. Had A and B been available in the same place, similarities between the two videos, and between the members who watch them, might have been observed. As this example shows, uneven video availability can degrade the quality of recommendations.
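To make the idea concrete, here is a minimal Python sketch, not Netflix's implementation. It assumes two hypothetical inputs: `plays` maps each member to the videos they played, and `access` maps each member to the videos available to them. Similarity between two videos is a Jaccard score computed only over members who could access both, so a pair that was never co-available gets `None` (no evidence) instead of a misleading zero.

```python
from collections import defaultdict
from itertools import combinations


def availability_aware_similarity(plays, access):
    """Jaccard similarity for each video pair, computed only over members who
    could access both videos.

    plays:  dict member -> set of videos the member played (hypothetical input)
    access: dict member -> set of videos available to the member
    Returns dict (video_a, video_b) -> float, or None when no member ever had
    both videos available (no evidence either way).
    """
    eligible = defaultdict(int)  # members who could watch both videos
    either = defaultdict(int)    # ...and played at least one of them
    both = defaultdict(int)      # ...and played both
    for member, available in access.items():
        watched = plays.get(member, set()) & available
        for pair in combinations(sorted(available), 2):
            eligible[pair] += 1
            hits = sum(video in watched for video in pair)
            if hits >= 1:
                either[pair] += 1
            if hits == 2:
                both[pair] += 1

    videos = sorted(set().union(*access.values()))
    sims = {}
    for pair in combinations(videos, 2):
        if eligible[pair] == 0:
            sims[pair] = None  # never co-available: not a negative signal
        elif either[pair] == 0:
            sims[pair] = 0.0
        else:
            sims[pair] = both[pair] / either[pair]
    return sims
```

With US members able to access only A and C, and French members only B and C, the pair (A, B) comes out as `None` rather than 0, so a downstream model can treat it as unknown.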
Similar problems affect search. Ignoring differences in availability degrades the quality of search rankings. For example, for a given query, a naive ranking algorithm might rank video A above video B, even though only a few members can play A while everyone can play B.
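A minimal sketch of one way to account for availability in search ranking, under hypothetical assumptions: impressions are logged as `(title, country, played)` tuples, `availability` maps each title to the countries where it is licensed, and relevance is reduced to a plain play rate. Play rates are estimated only from impressions where the title was actually playable, and unplayable titles are dropped before ranking.

```python
from collections import defaultdict


def playable_play_rate(impressions, availability):
    """Play rate per title, counting only impressions where the member's
    country could actually stream the title.

    impressions:  iterable of (title, country, played) tuples (hypothetical log)
    availability: dict title -> set of countries where the title is licensed
    """
    shown = defaultdict(int)
    played = defaultdict(int)
    for title, country, was_played in impressions:
        if country in availability.get(title, set()):
            shown[title] += 1
            played[title] += int(was_played)
    return {title: played[title] / shown[title] for title in shown}


def rank_for_member(candidates, country, rates, availability):
    """Drop titles the member cannot play, then sort by estimated play rate."""
    playable = [t for t in candidates if country in availability.get(t, set())]
    return sorted(playable, key=lambda t: rates.get(t, 0.0), reverse=True)
```

A real ranker would combine this with query relevance; the point here is only that availability enters both the estimate and the final ordering.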
Content licenses also have start and end dates, so similar issues arise not only between countries but also over time within a single country. A niche video that was available only briefly may actually be more appealing than a well-known video that has been available for a long time, yet the latter may show more engagement simply because it has had more time to accumulate plays.
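One simple correction, sketched here under stated assumptions (each title has a single license window with inclusive start and end dates, and `plays` holds total play counts), is to divide plays by the number of days the title was actually available. This is an illustrative normalization, not the method the source describes; a fuller version would also divide by the number of members who had access.

```python
from datetime import date


def plays_per_available_day(plays, windows, as_of):
    """Normalize total plays by the number of days each title was licensed.

    plays:   dict title -> total plays observed
    windows: dict title -> (start_date, end_date) of the license window
    as_of:   date the counts were taken; windows still open are cut off here
    """
    rates = {}
    for title, total in plays.items():
        start, end = windows[title]
        days = (min(end, as_of) - start).days + 1  # inclusive day count
        rates[title] = total / days if days > 0 else 0.0
    return rates
```

With 500 plays over a 10-day window against 3,000 plays over 375 days, the short-lived title scores 50 plays per day versus 8, reversing the raw ordering.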
These issues can affect more sophisticated search and recommendation models too, since they already introduce bias into something as simple as popularity. By addressing uneven availability across both geography and time, the algorithms can make better recommendations. To do this, Netflix incorporates into each algorithm information about which catalog each member could access, by location and over time, so that the algorithm can account for the resulting missing data and bias.
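For implicit-feedback models, one common way to apply this idea (an illustrative choice, not confirmed as Netflix's) is to sample "not played" negatives only from titles that were actually available to the member in their country on that date. Titles outside the catalog are then treated as missing data rather than as dislikes. The `licenses` structure below is a hypothetical representation of license windows.

```python
import random
from datetime import date


def available_titles(licenses, country, day):
    """Titles licensed in `country` on `day`.

    licenses: dict title -> list of (country, start_date, end_date) windows
    """
    return {
        title
        for title, windows in licenses.items()
        if any(c == country and start <= day <= end for c, start, end in windows)
    }


def sample_negatives(played, licenses, country, day, k, rng):
    """Draw up to k implicit negatives for one member from titles that were
    available to them (same country and date) but not played.
    Unavailable titles are never sampled, so they are not counted as dislikes."""
    candidates = sorted(available_titles(licenses, country, day) - played)
    return rng.sample(candidates, min(k, len(candidates)))
```

Sorting the candidates before sampling keeps results reproducible for a seeded `random.Random`.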
Challenge 2: Cultural Awareness
Cultural differences matter for recommendations, but not always. Bollywood films, for example, are likely to be more popular in India, while Argentine members might prefer Argentine films. Yet if two members are both science fiction fans and have similar profiles apart from where they live, they should receive similar recommendations. An easy way to capture local preferences is to build a separate model for each country. However, some countries have so few members that only a small amount of data is available there. Training recommendation algorithms on such sparse data leads to noisy results, because it is hard for a model to identify clear personalized patterns.
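A standard way to soften this sparsity problem, swapped in here purely for illustration (the source does not say Netflix used it), is to shrink a small country's estimate toward the global one. The sketch below is a Bayesian average: it behaves as if `prior_strength` extra impressions had been observed at the global rate, and the default of 100 is an arbitrary assumption.

```python
def shrunk_play_rate(local_plays, local_impressions, global_rate, prior_strength=100.0):
    """Bayesian-average play rate for one title in one country.

    Acts as if `prior_strength` extra impressions had been seen at the global
    rate: small countries stay close to the global estimate, while large
    countries are driven mostly by their own data.
    """
    return (local_plays + prior_strength * global_rate) / (local_impressions + prior_strength)
```

With 3 plays out of 4 impressions (a raw rate of 0.75) and a global rate of 0.1, the estimate is 13 / 104 = 0.125; with 7,500 out of 10,000 it stays near 0.74.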
Before Netflix's global expansion, their approach was to group countries into reasonably sized regions with relatively consistent catalogs and languages, and then build a separate model for each region. This captured taste differences between regions, because each regional model could be tuned separately, for example with different hyperparameters. A recommendation model can identify and exploit taste patterns within a region as long as enough members share those tastes and have a reasonable amount of history. However, this approach had several problems. First, within a region, data from a large country dominates the model and weakens its ability to learn local tastes in countries with fewer members. Second, keeping the groupings coherent is difficult, since content catalogs change over time and membership keeps growing. Finally, because they were constantly A/B testing model variants across many algorithms, the number of combinations became overwhelming as the number of regions grew.
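For illustration, a greedy grouping of countries by catalog overlap might look like the following. It is a hypothetical sketch, not the procedure Netflix used: it ignores language, measures overlap with Jaccard similarity, and uses an arbitrary threshold of 0.6. Countries are visited from largest catalog to smallest, and each joins the first region whose seed catalog overlaps enough.

```python
def jaccard(a, b):
    """Overlap between two catalogs (sets of titles)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def group_countries(catalogs, threshold=0.6):
    """Greedily assign each country to the first region whose seed catalog
    overlaps enough; otherwise start a new region seeded by this country.

    catalogs: dict country -> set of titles
    """
    regions = []  # list of (seed_catalog, [countries])
    for country in sorted(catalogs, key=lambda c: -len(catalogs[c])):
        for seed, members in regions:
            if jaccard(seed, catalogs[country]) >= threshold:
                members.append(country)
                break
        else:
            regions.append((catalogs[country], [country]))
    return [members for _, members in regions]
```

Because the groups depend on current catalogs, rerunning this after licenses change can reshuffle regions, which mirrors the difficulty of keeping groupings coherent over time.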
To address these challenges, they looked for ways to merge the regional models into a single global model that would provide a better recommendation experience in countries with fewer members. Even with pooled data, of course, the recommendation system still needs to recognize taste differences between regions. The data observed so far show that both local and personal tastes influence what members choose. In general, though, for a member who likes science fiction, another sci-fi fan on the other side of the world is a more useful signal than a next-door neighbor who likes food documentaries. Finding global communities of interest improves recommendations further, because they are based on more data. A global algorithm also helps identify new or different taste patterns as they emerge over time.
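The sci-fi example can be sketched as nearest-neighbor search over taste vectors that deliberately ignores location. The genre-affinity dictionaries and member names below are invented for illustration; a production model would learn such representations from play data, and would still need other features to capture genuinely local tastes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse genre-affinity vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest_neighbors(target, members, k=2):
    """Rank other members by taste similarity alone; the "country" field is
    deliberately not used, so neighbors can come from anywhere."""
    others = [m for m in members if m != target]
    return sorted(
        others,
        key=lambda m: cosine(members[target]["taste"], members[m]["taste"]),
        reverse=True,
    )[:k]


members = {
    "ana": {"country": "AR", "taste": {"scifi": 0.9, "docs": 0.1}},
    "kenji": {"country": "JP", "taste": {"scifi": 0.8, "drama": 0.2}},
    "neighbor": {"country": "AR", "taste": {"food_docs": 1.0}},
}
```

Here the Argentine sci-fi fan's closest neighbor is the Japanese sci-fi fan, not the Argentine neighbor who watches food documentaries.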
To optimize the model, they can use a number of signals about both content and members. In this global context, two potentially important signals are language and location. They wanted the model to understand not only where a member is logged in from, but also aspects of each video, such as where it was produced, what language it is in, and where it is popular.
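A hypothetical sketch of how such location and language signals might be turned into features for a (member, video) pair. The `Member` and `Video` fields and the feature names are assumptions for illustration, not Netflix's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Member:
    country: str      # where the member is logged in from
    languages: set    # languages the member is believed to understand


@dataclass
class Video:
    origin_country: str
    audio_languages: set
    subtitle_languages: set
    plays_by_country: dict = field(default_factory=dict)


def context_features(member, video):
    """Hypothetical location and language features for one (member, video) pair."""
    total = sum(video.plays_by_country.values())
    local = video.plays_by_country.get(member.country, 0)
    return {
        "is_local_content": float(video.origin_country == member.country),
        "audio_language_match": float(bool(member.languages & video.audio_languages)),
        "subtitle_language_match": float(bool(member.languages & video.subtitle_languages)),
        "share_of_plays_in_member_country": local / total if total else 0.0,
    }
```

Features like these could feed a single global model, letting it learn how much local origin and language matter instead of hard-coding regional splits.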
Challenge 3: Language
Netflix has grown to support 21 languages, and its content library contains more local content than ever before. This growth creates a number of challenges, especially for instant search. The key goal of that algorithm is to help each member find something to play while minimizing the number of interactions needed. This differs from the standard ranking metrics used to evaluate information retrieval systems, which do not account for the number of interactions. When looking at interactions, it becomes clear that different languages involve very different interaction patterns. For example, Korean is usually typed using the Hangul alphabet, in which each syllable is composed from individual characters (jamo). To search for “올드보이” (Oldboy), a member must in the worst case type nine characters: “ㅇㅗㄹㄷㅡㅂㅗㅇㅣ”. With a basic index on video titles, the member still needs to type at least three characters, “ㅇㅗㄹ”, which compose into the first syllable of the title, “올”. With an index built specifically for Korean, the member needs to type only one character: “ㅇ”. Optimizing for the best results with the fewest interactions, and automatically adapting to newly introduced languages with significantly different writing systems, is an area they are working to improve.
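The keystroke arithmetic in the Oldboy example can be reproduced with a small Python sketch. It relies on the Unicode layout of precomposed Hangul syllables (U+AC00 to U+D7A3), where each code point equals 0xAC00 + (initial × 21 + medial) × 28 + final. The two matchers are hypothetical simplifications, not Netflix's index: `syllable_index_match` only matches whole composed syllables, while `jamo_index_match` matches any jamo-level prefix. It also assumes one keystroke per jamo, which holds for these letters but not for compound vowels or finals such as ㅘ or ㄳ, which take two keys on a standard keyboard.

```python
# Hangul Compatibility Jamo, in the order used by Unicode syllable composition.
CHOSEONG = ["ㄱ", "ㄲ", "ㄴ", "ㄷ", "ㄸ", "ㄹ", "ㅁ", "ㅂ", "ㅃ", "ㅅ",
            "ㅆ", "ㅇ", "ㅈ", "ㅉ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
JUNGSEONG = ["ㅏ", "ㅐ", "ㅑ", "ㅒ", "ㅓ", "ㅔ", "ㅕ", "ㅖ", "ㅗ", "ㅘ", "ㅙ",
             "ㅚ", "ㅛ", "ㅜ", "ㅝ", "ㅞ", "ㅟ", "ㅠ", "ㅡ", "ㅢ", "ㅣ"]
JONGSEONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ",
             "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ",
             "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
SYLLABLE_FIRST, SYLLABLE_LAST = 0xAC00, 0xD7A3


def decompose(text):
    """Split precomposed Hangul syllables into jamo; other characters pass through."""
    jamo = []
    for ch in text:
        code = ord(ch)
        if SYLLABLE_FIRST <= code <= SYLLABLE_LAST:
            initial, rest = divmod(code - SYLLABLE_FIRST, 21 * 28)
            medial, final = divmod(rest, 28)
            jamo.append(CHOSEONG[initial])
            jamo.append(JUNGSEONG[medial])
            if final:  # index 0 means "no final consonant"
                jamo.append(JONGSEONG[final])
        else:
            jamo.append(ch)
    return jamo


def syllable_index_match(title, typed):
    """Hypothetical basic index: matches only once the typed jamo form a
    whole-syllable prefix of the title."""
    return any(decompose(title[:k]) == typed for k in range(1, len(title) + 1))


def jamo_index_match(title, typed):
    """Hypothetical Korean-aware index: matches any jamo-level prefix."""
    return decompose(title)[:len(typed)] == typed


def keystrokes_to_match(title, matcher):
    """Type the title one jamo at a time (one keystroke per jamo assumed)
    and return how many keystrokes it takes before the index matches."""
    typed = []
    for count, jamo in enumerate(decompose(title), start=1):
        typed.append(jamo)
        if matcher(title, typed):
            return count
    return None
```

For “올드보이”, the syllable-level matcher first succeeds after three keystrokes and the jamo-level matcher after one, matching the counts given above.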
Another language-related challenge involves recommendations. As noted above, while taste communities span the globe, people are ultimately most likely to enjoy content in a language they understand. For example, there might be a great French science fiction movie, but if it has no English subtitles or audio, it should not be recommended to a member who likes science fiction but speaks only English. If the member speaks both English and French, however, it may well be a good recommendation. People also like to watch content originally produced in their native language or in another language they commonly use. Although Netflix keeps adding subtitles and dubbing in new languages, not all content is available in every language yet. In addition, members in different cultures differ in whether they prefer subtitles or dubbing. Taken together, this means recommendations can be improved by being aware of language preferences. However, it is hard to know explicitly which languages a member understands, so this must be inferred from auxiliary data and viewing patterns.
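A minimal sketch of inferring languages from viewing patterns and using them as a recommendation filter. The input format (hours watched per chosen audio or subtitle language) and the 15% threshold are assumptions for illustration; the source does not describe how Netflix actually infers language understanding.

```python
from collections import Counter


def infer_languages(sessions, min_share=0.15):
    """Guess which languages a member understands from viewing history.

    sessions: list of (hours, language) pairs, where language is the audio or
              subtitle language the member chose for that session.
    A language counts as understood if it covers at least min_share of hours.
    """
    hours = Counter()
    for h, language in sessions:
        hours[language] += h
    total = sum(hours.values())
    if total == 0:
        return set()
    return {lang for lang, h in hours.items() if h / total >= min_share}


def language_ok(audio, subtitles, understood):
    """Keep a candidate only if its audio or subtitles match an understood language."""
    return bool(understood & (set(audio) | set(subtitles)))
```

For a member with 10 hours in English, 3 in French, and 1 in German, the sketch infers English and French, so a French-only sci-fi film passes the filter for them but not for an English-only viewer.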
Challenge 4: Tracking Quality
Netflix's goal is to build recommendation algorithms that work equally well for all members, regardless of where they live or what language they speak. The challenge they now face is how to detect when an algorithm is performing suboptimally for some subset of members.
One approach is to slice performance manually along a set of dimensions (country, language, catalog, …) and examine how the algorithm performs in each slice. However, some of these slices contain very sparse and noisy data. They can also look at metrics observed globally, but this greatly limits their ability to detect problems. Another approach is to learn how best to group observations so that outliers and anomalies can be detected automatically. As they work to improve the recommendation algorithms, they are also developing new metrics, instrumentation, and monitoring to improve their ability to detect new problems.
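A minimal sketch of the manual slicing approach, with a simple significance check so that sparse slices are not over-interpreted. Everything here is an assumption for illustration: the record format, the binary `success` metric, the minimum slice size of 50, and the one-sample z-test against the global rate. It does not implement the learned grouping of observations.

```python
import math
from collections import defaultdict


def flag_weak_slices(records, dims, min_count=50, z_threshold=3.0):
    """Flag slices whose success rate is significantly below the global rate.

    records: list of dicts holding the slice dimensions plus a boolean
             "success" (e.g. whether the session led to a play).
    dims:    dimension names to slice by, e.g. ("country", "language").
    Slices with fewer than min_count records are skipped as too noisy.
    """
    global_rate = sum(r["success"] for r in records) / len(records)
    counts = defaultdict(lambda: [0, 0])  # slice key -> [successes, total]
    for r in records:
        key = tuple(r[d] for d in dims)
        counts[key][0] += r["success"]
        counts[key][1] += 1

    flagged = []
    for key, (successes, n) in counts.items():
        if n < min_count:
            continue
        rate = successes / n
        se = math.sqrt(global_rate * (1 - global_rate) / n)  # std. error under the global rate
        if se > 0 and (rate - global_rate) / se < -z_threshold:
            flagged.append((key, rate, n))
    return flagged
```

In the test data, a 200-record slice at 30% success against a global rate near 55% is flagged, while a 10-record slice at 0% is skipped as too sparse to judge.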