• Spotify’s Discover Weekly: How Machine Learning Finds Your New Music
  • Sophia Ciocca
  • The Nuggets Translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: lsvih
  • Proofreader: PPP-man

Spotify’s Discover Weekly: machine-learning-based music recommendations

Every Monday, more than 100 million Spotify users receive a fresh playlist: a custom list of 30 songs they have never heard before but will probably love. This magical feature is called Discover Weekly.

I’m a big fan of Spotify, and especially of Discover Weekly, because it makes me feel seen. It knows my musical taste better than anyone else does, and every week its recommendations are just right for me. Without it, I might never have found some of my favorite songs.

If you’re having trouble finding the music you want to listen to, let me introduce my best virtual partner:

A Spotify Discover Weekly Playlist — Specifically, mine.

As it turns out, I’m not the only one obsessed with Discover Weekly. Users are so enthusiastic about it that Spotify has rethought its priorities and invested more resources in algorithm-based playlists.

Discover Weekly debuted in 2015, and ever since then I’ve wanted to know how it works (I’m a fan of the company, so I sometimes like to pretend I work there and dig into their products). After three weeks of frantic searching, I was finally able to peek behind the curtain.

So how does Spotify pull off the amazing job of picking 30 songs for every user every week? Let’s first look at how other music services handle recommendations, and why Spotify does it better.


As early as 2000, Songza kicked off online music curation using human editors. “Manual curation” means a team of “music experts” or other curators hand-picks songs they think sound good and assembles them into playlists. (Beats Music later followed the same approach.) Manual curation worked well enough, but because it was done by hand and was so simple, it couldn’t account for each listener’s personal taste.

Like Songza, Pandora was one of the original players in music curation. It used a slightly more advanced approach: manually annotating song attributes. A group of people listened to each song, chose descriptive words for it, and tagged it accordingly. Pandora’s code could then simply filter by those tags to build playlists of similar-sounding music.

Meanwhile, The Echo Nest, a music intelligence company born out of MIT’s Media Lab, pioneered a more advanced approach to personalized music recommendations. The Echo Nest used algorithms to analyze the audio and textual content of music, allowing it to identify songs, personalize recommendations, create playlists, and perform audio analysis.

In addition, Last.fm, which still exists today, took a different approach called collaborative filtering, which identifies music its users are likely to enjoy. More on that below.


So that’s how other music curation services handle recommendations. How does Spotify build its engine so that its recommendations hit users’ tastes so much more accurately?

Spotify’s 3 recommendation models

Spotify didn’t actually invent a revolutionary new recommendation model. Instead, they built their own uniquely powerful discovery engine by blending the best strategies that other services were already using.

Spotify’s weekly recommendations are based on three main types of recommendations:

  1. Collaborative filtering models (the approach Last.fm originally used), which analyze both your behavior and other users’ behavior.
  2. Natural language processing (NLP) models, which analyze text.
  3. Raw audio models, which analyze the raw audio tracks themselves.
Image credit: Chris Johnson, Spotify

Let’s take a closer look at each of these recommendation models.


Recommendation model #1: Collaborative filtering

First, a little background: when people hear the term “collaborative filtering,” most think of Netflix, one of the first companies to use it to power a recommendation model. Netflix used users’ ratings of movies to decide which movies to recommend to other users with similar preferences.

After Netflix used this method successfully, it spread quickly, and collaborative filtering is now usually the starting point for anyone trying to build a recommendation model.

Unlike Netflix, Spotify doesn’t ask users to give songs star ratings. The data it uses is implicit feedback: specifically, the stream counts of the songs users listen to, along with other signals such as whether a user saved a track to a playlist or visited the artist’s page after hearing a song.
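To make “implicit feedback” concrete, here is a minimal sketch of how such signals might be assembled into a user-by-song matrix. The events and weighting below are illustrative assumptions, not Spotify’s actual scheme:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical implicit-feedback events: (user_id, song_id, stream_count).
# The numbers are invented for illustration.
events = [
    (0, 0, 5),   # user 0 streamed song 0 five times
    (0, 2, 1),
    (1, 2, 12),
    (2, 1, 3),
]

users, songs, counts = zip(*events)
# Rows = users, columns = songs; values = how often each song was streamed.
plays = csr_matrix((counts, (users, songs)), shape=(3, 3))
print(plays.toarray())
```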

So what is collaborative filtering, and how does it work? Here’s a quick sketch, summed up in a short conversation:

Image by Erik Bernhardsson

What’s going on here? Each of the two people in the picture has some favorite songs: the person on the left likes songs P, Q, R, and S, while the person on the right likes songs Q, R, S, and T.

Collaborative filtering uses this data to say:

“Hmm, you both like Q, R, and S, so you’re probably similar users. That means each of you will probably enjoy the songs the other likes that you haven’t heard yet.”

In other words, the person on the right should be recommended song P, and the person on the left should be recommended song T. Simple, right?
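Here is a tiny, self-contained sketch of that reasoning using the same toy data (songs P, Q, R, S, T). Real systems use learned vectors rather than raw sets, but the idea is the same:

```python
# Toy set-based version of collaborative filtering: recommend what a similar
# user has heard that you have not. The song names mirror the picture above.
left = {"P", "Q", "R", "S"}
right = {"Q", "R", "S", "T"}

overlap = left & right              # {"Q", "R", "S"} -> similar taste
recommend_to_right = left - right   # {"P"}
recommend_to_left = right - left    # {"T"}

print(overlap, recommend_to_right, recommend_to_left)
```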

But how does Spotify actually apply this idea in practice to compute recommendations across millions of users and their favorite songs?

… With matrix math, implemented in Python libraries.

In reality, the matrix involved is enormous: each row represents one of Spotify’s 140 million users (if you use Spotify, you are one of the rows), and each column represents one of the 30 million songs in Spotify’s catalog.

A Python library then runs a long, slow matrix factorization over it, using a formula along these lines:
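The original article shows the formula only as an image. A standard implicit-feedback matrix factorization objective of the kind being described (a common formulation from the recommender-systems literature, not necessarily the exact formula Spotify uses) looks like this, where p_ui is 1 if user u streamed song i and 0 otherwise, c_ui is a confidence weight derived from the stream count, and x_u, y_i are the user and song vectors being learned:

```latex
\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui}\left(p_{ui} - x_u^{\top} y_i\right)^2
  \;+\; \lambda\left(\sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2\right)
```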

When it finishes, we end up with two kinds of vectors, which we’ll call X and Y here. X is a user vector, representing a single user’s taste; Y is a song vector, representing a single song’s profile.

The user/song matrix is factored into two kinds of vectors: user vectors and song vectors.

Now we have 140 million user vectors and 30 million song vectors. Each vector is essentially just a bunch of numbers that are meaningless on their own, but comparing them turns out to be extremely useful.

To find out which users’ tastes are most similar to mine, collaborative filtering compares my vector with every other user’s vector and returns the closest matches. The same goes for the Y vectors: a song’s vector can be compared with every other song’s vector to find the songs most similar to the one you’re listening to.
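As a rough sketch of that comparison step, cosine similarity is the usual way to rank the “closest” users or songs. The vectors and their dimensionality below are made up purely for illustration:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: close to 1.0 means the vectors point the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical low-dimensional user vectors produced by the factorization.
my_vector = np.array([0.9, 0.1, 0.4])
other_users = {"anna": np.array([0.8, 0.2, 0.5]),
               "ben":  np.array([-0.3, 0.9, 0.1])}

# Rank other users by similarity to me; the top matches are my "taste neighbors".
ranked = sorted(other_users.items(),
                key=lambda kv: cosine(my_vector, kv[1]),
                reverse=True)
print(ranked[0][0])  # -> "anna"
```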

Collaborative filtering works pretty well, but Spotify didn’t stop there; they knew they could do even better by adding other engines. Let’s look at NLP.


Recommendation model #2: Natural language processing (NLP)

The second type of recommendation model Spotify uses is the natural language processing (NLP) model. As the name suggests, these models work on words in the ordinary sense: track metadata, news articles, blogs, and other text around the Internet.

NLP, the ability of computers to understand human language, is a huge field in its own right, and you can tap into it through sentiment analysis APIs.

The exact mechanisms behind NLP are beyond the scope of this article, but here’s what happens at a high level: Spotify constantly crawls the web for music-related blog posts and other text, and works out what people are saying about particular artists and songs: which adjectives and language people typically use when discussing them, and which other artists and songs are mentioned alongside them.

While I don’t know exactly how Spotify processes this data, I do know how The Echo Nest handled it: they bucketed the results into what they called “cultural vectors” or “top terms.” Each artist and song had thousands of top terms that changed daily, and each term carried a weight indicating its importance (roughly, the probability that someone would use that term to describe the music).

Brian Whitman’s table of “cultural vectors” and “top terms” used by The Echo Nest

Then, much like in collaborative filtering, the NLP model uses these terms and weights to build a vector representation of each song, which it can use to decide whether two pieces of music are similar. Cool, right?
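To give a feel for how weighted terms can become a vector per song or artist, here is a small sketch using TF-IDF over made-up snippets of blog text. The snippets and the scikit-learn pipeline are illustrative assumptions, not The Echo Nest’s actual system:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up snippets of web text "about" three artists.
docs = {
    "artist_a": "dreamy synth pop, lush reverb, melancholic vocals",
    "artist_b": "lush dream pop with hazy synth textures and soft vocals",
    "artist_c": "aggressive thrash metal, fast riffs, shouted vocals",
}

# TF-IDF assigns each term a weight per document, loosely analogous to the
# weighted top terms described above.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs.values())

# Artists described with similar language end up with similar vectors.
print(cosine_similarity(matrix[0], matrix[1]))  # a vs. b: relatively high
print(cosine_similarity(matrix[0], matrix[2]))  # a vs. c: relatively low
```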


Recommendation model #3: Raw audio models

Before diving into this section, you might ask:

We already feed plenty of data into the first two models, so why analyze the audio itself?

First of all, this third model further improves the accuracy of an already amazing recommendation service. But it also serves a second purpose: unlike the first two models, raw audio models can handle brand-new songs.

For example, say your singer-songwriter friend uploads a new song to Spotify and it has only 50 listens so far; that’s far too few for collaborative filtering. He hasn’t caught on yet and isn’t mentioned anywhere on the Internet, so the NLP model can’t pick him up either. Luckily, raw audio models don’t care whether a song is new or old, so with their help your friend’s song can land in Discover Weekly playlists right alongside the popular tracks!

So the next question is: how do we analyze raw audio data, which seems so abstract?

… With convolutional neural networks (CNNs)!

Convolutional neural networks are the same technology used in facial recognition. In Spotify’s case, engineers apply them to audio data instead of pixels. Here is an example of one such network architecture:

Image credit: Sander Dieleman

This particular network has four convolutional layers, shown as the thick bars on the left of the diagram, and three dense (fully connected) layers, shown as the narrower bars on the right. The input is a time-frequency representation of audio frames, which together form the spectrogram shown in the figure.

After the audio frames pass through the convolutional layers, a “global temporal pooling” layer sits next to the last convolutional layer. It pools across the entire time axis, effectively computing statistics of the learned features over the duration of the song.
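Here is a minimal sketch of such an architecture, assuming Keras and a mel-spectrogram input. The layer sizes are arbitrary and only meant to mirror the convolution, global temporal pooling, and dense structure described above; the real network in Sander Dieleman’s post differs in its details:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Input: a spectrogram with a variable number of audio frames (time steps)
# and 128 mel-frequency bins per frame.
inputs = tf.keras.Input(shape=(None, 128))  # (time, frequency)

x = inputs
for filters in (256, 256, 512, 512):        # four convolutional layers
    x = layers.Conv1D(filters, kernel_size=4, activation="relu", padding="same")(x)
    x = layers.MaxPooling1D(pool_size=2)(x)

# Global temporal pooling: collapse the entire time axis by averaging the
# learned features, so songs of any length map to a fixed-size summary.
x = layers.GlobalAveragePooling1D()(x)

for units in (1024, 1024):                  # dense (fully connected) layers
    x = layers.Dense(units, activation="relu")(x)

# Output: a fixed-size description of the song, e.g. predicted latent factors.
outputs = layers.Dense(40)(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```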

After that, the network outputs its understanding of the song, including characteristics such as estimated time signature, key, mode, tempo, and loudness. Below is a plot of this data for a 30-second clip of Daft Punk’s “Around the World.”

Image copyright: Tristan Jehan & David DesRoches (The Echo Nest)

Ultimately, this reading of a song’s key characteristics lets Spotify understand the fundamental similarities between songs, and therefore which users might enjoy a new song based on their listening history.


That covers the basics of the three major types of recommendation models. It is the recommendation pipeline built from these models that ultimately powers the remarkably strong Discover Weekly playlist!
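As a purely illustrative sketch (the real blending logic is not public), one way such a pipeline could combine evidence from the three models is to weight and blend per-song scores before picking the top 30:

```python
# Purely illustrative: blend per-song scores from the three models.
# The weights and scores below are invented, not Spotify's actual values.
def blended_score(song, weights=(0.5, 0.3, 0.2)):
    w_cf, w_nlp, w_audio = weights
    return (w_cf * song["cf_score"]
            + w_nlp * song["nlp_score"]
            + w_audio * song["audio_score"])

candidates = [
    {"title": "song_a", "cf_score": 0.91, "nlp_score": 0.40, "audio_score": 0.55},
    {"title": "song_b", "cf_score": 0.10, "nlp_score": 0.20, "audio_score": 0.95},
]

# Keep the 30 highest-scoring candidates as the week's playlist.
playlist = sorted(candidates, key=blended_score, reverse=True)[:30]
print([s["title"] for s in playlist])
```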

Of course, these recommendation models are also plugged into Spotify’s larger ecosystem, which includes enormous amounts of data storage and uses many Hadoop clusters to scale the recommendations, so that these engines can run smoothly over gigantic matrices, an endless stream of music-related Internet articles, and a huge number of audio files.

I hope this article satisfied your curiosity as much as it did mine. For now, I’ll keep digging through my personalized Discover Weekly to find my new favorite music, knowing and appreciating the machine learning working behind it. 🎶


Resources:

  • From Idea to Execution: Spotify’s Discover Weekly (Chris Johnson, ex-Spotify)
  • Collaborative Filtering at Spotify (Erik Bernhardsson, ex-Spotify)
  • Recommending music on Spotify with deep learning (Sander Dieleman)
  • How Music Recommendation Works — and Doesn’t Work (Brian Whitman, Co-founder of The Echo Nest)
  • Ever Wonder How Spotify Discover Weekly Works? Data Science (Galvanize)
  • The magic that makes Spotify’s Discover Weekly playlists so damn good (Quartz)
  • The Echo Nest’s Analyzer Documentation

The Nuggets Translation Project is a community that translates high-quality technical articles from around the Internet into Chinese and shares them on Juejin (Nuggets). The content covers Android, iOS, React, front end, back end, product, design, and other fields. For more high-quality translations, please keep following the Project, its official Weibo account, and its Zhihu column.