

Every major music platform has its own charts, most of which are computed from plays, shares, downloads, and so on. So are there rules behind hit songs? Is it the melody, or how moving the song is? Let’s take a look at how Dorien uses data science to predict hits from a different angle: audio features.




Can algorithms predict hit songs? Let’s explore how to build a hit song classifier from audio features, as described in my article Dance Hit Song Prediction (link at the end).


During my PhD research, I came across a paper by Pachet & Roy (2008) entitled “Hit Song Science Is Not Yet a Science”. I found it fascinating, and it led me to explore whether hits can be predicted. Research on this topic is quite limited; for a more complete literature review, see Dance Hit Song Prediction (2014). We felt that model performance could be improved by focusing on a specific genre: dance music. This made intuitive sense to me, because hit songs in different genres have different characteristics.


The data set


To predict hit songs, we first need a hit/non-hit dataset. Lists of hit songs are easy to find, but lists of non-hits are not. We therefore decided to classify high- versus low-ranked songs within the hit charts instead. We ran some experiments to see which split works best, as shown in Table 1, resulting in three datasets (D1, D2 and D3):
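This kind of split can be sketched in a few lines of pandas. The thresholds and column names below are illustrative, not the exact ones from the paper:

```python
import pandas as pd

# Hypothetical chart data: one row per song with its peak chart position.
charts = pd.DataFrame({
    "title": ["Song A", "Song B", "Song C", "Song D"],
    "peak_position": [3, 12, 25, 38],
})

def label_dataset(df, hit_top, non_hit_from):
    """Label songs peaking in the top `hit_top` as hits (1) and songs
    peaking at position `non_hit_from` or lower as non-hits (0);
    drop the ambiguous songs in between."""
    df = df.copy()
    df["hit"] = pd.NA
    df.loc[df["peak_position"] <= hit_top, "hit"] = 1
    df.loc[df["peak_position"] >= non_hit_from, "hit"] = 0
    return df.dropna(subset=["hit"])

# e.g. a D1-style split: top 10 = hit, positions 30 and below = non-hit
d1 = label_dataset(charts, hit_top=10, non_hit_from=30)
print(d1[["title", "hit"]])
```

Songs falling between the two thresholds are simply excluded, which keeps the two classes well separated.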


Table 1 – Hit datasets from Herremans et al. (2014)


The class distribution of each dataset is slightly imbalanced:

Figure 1 – Class distribution of Herremans et al. (2014)


The hit lists come from two sources: Billboard (BB) and the Official Charts Company (OCC). The table below shows the number of hits collected. Note that songs stay on the charts for weeks, so the number of unique songs is much smaller:


Table 2 – Hit song counts per chart, from Herremans et al. (2014)


Now that we have a list of songs, we also need their audio features. We used The Echo Nest Analyzer (Jehan and DesRoches, 2012) to extract them. This nifty API let us retrieve a large set of audio features from just artist names and song titles (The Echo Nest was since acquired by Spotify and is now integrated into the Spotify API). Here is what we extracted:


Standard audio features

These include duration, tempo, time signature, mode (major (1) or minor (0)), key, loudness, danceability (computed by the Echo Nest from beat strength, rhythmic stability, overall tempo, etc.), and energy (computed by the Echo Nest from loudness and segment durations).


New temporal features

Since songs change over time, we added a number of temporally aggregated features based on Schindler & Rauber (2012): the mean, variance, minimum, maximum, range, and 80th percentile over the ~1 s segments of each song, computed for the following features:

  • Timbre: a 13-dimensional PCA-based vector capturing the tonal color of each segment of a song.
  • Beat difference: the time between consecutive beats.
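As a rough sketch of this temporal aggregation (with made-up segment data, since the exact Echo Nest output is not reproduced here), the six statistics could be computed with NumPy like this:

```python
import numpy as np

def temporal_aggregates(x):
    """Aggregate a per-segment feature series of shape (n_segments, n_dims)
    into six statistics per dimension: mean, variance, min, max, range,
    and 80th percentile."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([
        x.mean(axis=0),
        x.var(axis=0),
        x.min(axis=0),
        x.max(axis=0),
        x.max(axis=0) - x.min(axis=0),   # range
        np.percentile(x, 80, axis=0),
    ])

# Hypothetical inputs: 13-D timbre vectors per ~1 s segment, and beat times.
rng = np.random.default_rng(0)
timbre = rng.normal(size=(200, 13))
beat_times = np.cumsum(np.full(100, 0.5))   # beats every 0.5 s
beat_diffs = np.diff(beat_times)[:, None]   # time between consecutive beats

features = np.concatenate([temporal_aggregates(timbre),
                           temporal_aggregates(beat_diffs)])
print(features.shape)   # 13*6 + 1*6 = 84 aggregated features per song
```

Each song is thus collapsed into one fixed-length vector, regardless of its duration, which is what standard classifiers expect.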


Good! Now we have a nice collection of audio features, together with the songs’ peak chart positions. As any good data science project should, let’s start with some data visualization. The first thing we notice is that hits change over time. What was a hit ten years ago is not necessarily a hit today. This becomes apparent when we visualize our features over time:

Figure 2 – Interactive bubble diagram by Herremans et al. (2014).


Interestingly, dance hits have become shorter, louder, and, according to the Echo Nest’s danceability feature, less danceable!

Figure 3 – Evolution of hit song features over time, from Herremans et al. (2014)


For full feature visualization, check out my short article on visualizing hit songs:

http://dorienherremans.com/sites/default/files/dh_visualiation_preprint_0.pdf

http://musiceye.dorienherremans.com/clustering.html


Models


We explored two types of models: comprehensible models and black-box models. As expected, the latter perform better, but the former give us insight into why a song becomes a hit.


Decision tree (C4.5)

To make the decision tree fit on the page, I set the pruning very high. This keeps the tree small and easy to understand, but the AUC on D1 drops to a low 0.54. We see that only temporal features remain, which suggests they are important. In particular, the sharpness reflected by Timbre 3 (the third dimension of the PCA timbre vector) seems to matter for predicting hits.

Figure 4 – Decision tree of Herremans et al. (2014)
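A comparable heavily pruned tree can be sketched with scikit-learn, which implements CART rather than C4.5; the data here is synthetic, with one feature made deliberately predictive:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 10))                               # stand-in audio features
y = (X[:, 3] + 0.5 * rng.normal(size=500) > 0).astype(int)   # synthetic "hit" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_depth=2 plays the role of aggressive pruning: a tiny, readable tree.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)

auc = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
print(export_text(tree))        # small, human-readable rules
print(f"AUC: {auc:.2f}")
```

Just as in the paper, the price of an interpretable tree is accuracy: the shallower the tree, the lower the AUC tends to be.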


Rule-based models

Using RIPPER, we obtain a rule set that looks very similar to the decision tree. Timbre 3 appears again. This time, the AUC on D1 is 0.54.

Table 3 – Rule set from Herremans et al. (2014)


Naive Bayes, Logistic Regression, Support Vector Machine (SVM)


For a brief description of these techniques, see Dance Hit Song Prediction


The final result


Before getting to the results, I should emphasize that plain classification “accuracy” is meaningless here, because the classes are imbalanced (see Figure 1). If accuracy is used at all, it should be per class. This is a common mistake, and an important one to remember. We therefore evaluate the models properly with ROC curves, AUC, and the confusion matrix.
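A quick toy example shows why accuracy misleads on imbalanced classes, and why AUC and the confusion matrix do not:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix

# Imbalanced toy labels: 90% hits, 10% non-hits.
y_true = np.array([1] * 90 + [0] * 10)

# A useless classifier that always predicts "hit"...
y_pred = np.ones(100, dtype=int)

print(accuracy_score(y_true, y_pred))       # 0.9 -- looks great
print(confusion_matrix(y_true, y_pred))     # ...but every non-hit is missed

# AUC works on scores and is not fooled: a constant score gives 0.5,
# i.e. no better than random guessing.
print(roc_auc_score(y_true, np.ones(100)))
```

The confusion matrix makes the failure visible immediately: the entire non-hit row ends up in the wrong column.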


10-fold cross-validation

We obtained the best results on dataset 1 (D1) and dataset 2 (D2) without feature selection (for feature selection we used CfsSubsetEval with genetic search). All features were standardized before training. This result makes sense, because D3 has the smallest “gap” between the hit and non-hit classes. Overall, logistic regression performed best.
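The paper used Weka for this; a scikit-learn analogue of the standardize-then-fit-logistic-regression setup, evaluated with 10-fold cross-validated AUC on synthetic data, might look like this:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, StratifiedKFold

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20))                                 # stand-in features
y = (X[:, 0] - X[:, 1] + rng.normal(size=300) > 0).astype(int) # synthetic label

# Standardizing inside the pipeline means each CV fold is scaled using
# only its own training split, avoiding information leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over 10 folds: {aucs.mean():.2f}")
```

Stratified folds keep the hit/non-hit ratio roughly constant across folds, which matters given the class imbalance noted above.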


Table 4 – AUC results of Herremans et al. (2014), with/without feature selection (FS)


Looking at the ROC curve below, we see that the model outperforms the random predictor (diagonal).

Figure 5 – ROC from Logistic regression of Herremans et al. (2014)


Classification performance can be examined in detail with the confusion matrix, which shows that it is not easy to correctly identify non-hits! Hits, however, are correctly recognized 68% of the time.

Table 5 – Confusion matrix from logistic regression by Herremans et al. (2014)


Out-of-time test set

We also tried using chronologically “newer” songs as a test set, instead of 10-fold cross-validation. This further improved performance:
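An out-of-time split is simply a chronological cut rather than a random shuffle. A minimal sketch, with a hypothetical `year` column and cutoff:

```python
import pandas as pd

# Hypothetical dataset with a release year per song.
songs = pd.DataFrame({
    "year":    [2009, 2010, 2011, 2012, 2013, 2013],
    "feature": [0.1, 0.4, 0.3, 0.8, 0.7, 0.9],
    "hit":     [0, 0, 1, 1, 1, 1],
})

# Train on older songs, test on the newest ones -- no shuffling, so the
# model is evaluated on genuinely "future" data it has never seen.
cutoff = 2012
train = songs[songs["year"] < cutoff]
test = songs[songs["year"] >= cutoff]
print(len(train), len(test))
```

Unlike random cross-validation folds, this split never lets information from later years leak into training, which mirrors how such a model would actually be deployed.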

Table 6 – AUC for the out-of-time split versus 10-fold CV, from Herremans et al. (2014)


Interestingly, the model was better at predicting newer songs. What causes this? Perhaps it learned something about how trends change over time? Future research should look at the evolution of musical preferences over time.


Conclusion


Using only audio features, Dance Hit Song Prediction can predict whether a song will reach the top 10 of the dance charts with an AUC of around 80%. Can we do better? Possibly! The feature set in this study is limited, so extending it with both low-level and high-level musical features could yield greater accuracy. In addition, in a follow-up study I examined the significant effect of social network data on hit prediction (Herremans & Bergmans, 2017).


Links

  • Data science for hit song prediction

https://towardsdatascience.com/data-science-for-hit-song-prediction-32370f0759c1

  • Dance Hit Song Prediction

https://www.tandfonline.com/doi/abs/10.1080/09298215.2014.881888?casa_token=w0GGhjQd194AAAAA%3AV_YGtWJeIR869x4fYUfyyfFrPiCWhb56ddybPADsO9s9D-k8WaTZI4ADKxgILlufl3UbICsVZEWB5Q&journalCode=nnmr20


References

  • Herremans, D., Martens, D., & Sörensen, K. (2014). Dance hit song prediction. Journal of New Music Research, 43(3), 291-302. [https://arxiv.org/pdf/1905.08076.pdf]
  • Herremans, D., & Bergmans, T. (2017). Hit song prediction based on early adopter data and audio features. 18th International Society for Music Information Retrieval Conference (ISMIR), Late-Breaking Demo. Suzhou, China. [Preprint Link]
  • Herremans, D., & Lauwers, W. (2017). Visualizing the evolution of alternative hit charts. 18th International Society for Music Information Retrieval Conference (ISMIR), Late-Breaking Demo. Suzhou, China. [Preprint Link]
  • Jehan, T., & DesRoches, D. (2012). Echo Nest Analyzer Documentation. developer.echonest.com/docs/v4/_static/AnalyzeDocumentation.pdf
  • Pachet, F., & Roy, P. (2008, September). Hit Song Science Is Not Yet a Science. In ISMIR (pp. 355-360).
  • Schindler, A., & Rauber, A. (2012, October). Capturing the temporal domain in Echonest features for improved classification effectiveness. In International Workshop on Adaptive Multimedia Retrieval (pp. 214-227). Springer, Cham.


Dr. Dorien Herremans (DorienHerremans.com) is an assistant professor at the Singapore University of Technology and Design, where she directs the AI Music and Audio Lab.

This article is from towardsdatascience

This article is reprinted from TalkingData School

Cover art credit: Pexels by Rene Asmussen


Read the original English article here