
On a quest for a perfect Eurovision song with data science

The Eurovision Song Contest has been running yearly since 1956, and it keeps getting more popular. Back in 1956, only 7 countries participated. As of 2019, there are 42 registered participant countries, including Australia, which sneaked in four years ago and does not seem to be going anywhere. It is not only the list of participants that keeps changing; the rules of Eurovision change as well: which language you can sing in, who the judges are, who is allowed to perform and how they need to do it. But did the idea of what makes a good Eurovision song also change over the years? And can we learn from the data what makes a perfect Eurovision song?

Everything about Eurovision is very well documented. All the recordings, lyrics, and final placements are available in internet archives. From 1956 up to last year, almost 1300 songs from different countries have made it to the finals. And here we’re going to put them under scrutiny!

The Music

Eurovision used to be all about the music, that is, before it also became about ice-skating (Russia, 2008), nude gymnastics (Romania, 2013) and burning pianos (Austria, 2015). Maybe it was headed that way right from the start, but in Eurovision’s infancy it just wasn’t possible to broadcast all of this swag over the radio.

Most Eurovision songs are pop music, with a few notable exceptions, but in fact the music can be any genre, provided it has vocals. There are a few restrictions: the songs have to be under 3 minutes and, currently, cannot be accompanied by any live instruments. Eurovision went from allowing only a live orchestra (a ban on recorded backing tracks was in place until 1973) to a ban on live music, in place since 1999. Of course, the ban does not apply to vocals, which have to be performed live.

The minor fall, the major lift?

Music in major keys is associated with happy emotions, while minor keys are used to express negative emotions – such as sadness and anger. Of course, it’s not quite as simple as just choosing a key. However, it’s a very important step in writing a song. So, what do Eurovision songwriters like best?

Usually, a song doesn’t just stay in one key. A good way of approximating how “major” or “minor” the song sounds is extracting the chords from a song and summing up their duration. I am using Chordino to extract the chords. Here, we count as major everything that is built upon a major triad – such as major chords and their inversions, dominant seventh, major seventh chords. The same goes for minor – diminished, minor sevenths, etc. are counted as minor.
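
To make the counting concrete, here is a minimal sketch of how the “minorness” of one song could be computed. It assumes the chord extractor returns segments as (label, start, end) tuples with labels such as "C", "Am", "G7", "Bdim" or "N" for “no chord”; that label format, the function names and the treatment of anything not recognised as minor (counted as major) are my assumptions for illustration, not the exact pipeline.

```python
def is_minor(label: str) -> bool:
    """Return True for minor-family chords (minor, minor 7th, diminished)."""
    chord = label.split("/")[0]        # drop the bass note of an inversion: "Am/C" -> "Am"
    quality = chord[1:]                # drop the root letter
    if quality[:1] in ("#", "b"):      # drop one accidental: "bm7" -> "m7"
        quality = quality[1:]
    # "maj7" starts with "m" too, so it has to be excluded explicitly.
    return quality.startswith(("m", "dim")) and not quality.startswith("maj")


def minor_fraction(segments) -> float:
    """Share of chord time spent on minor-family chords.

    segments: iterable of (label, start_seconds, end_seconds).
    Segments labelled "N" (no chord detected) are ignored.
    """
    minor = major = 0.0
    for label, start, end in segments:
        if label == "N":
            continue
        if is_minor(label):
            minor += end - start
        else:
            major += end - start
    total = minor + major
    return minor / total if total else 0.0
```

A song that spends 4 of its 10 chord-seconds on minor-family chords gets a minor fraction of 0.4, which is the per-song number that is then averaged per year.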

The figure shows the average “minorness” of the songs in each year of the competition. There seems to be a trend. Overall, Eurovision favors major keys (the minor chord percentage is mostly under 50%). The minor sound was especially unpopular in the 1980s. Today, however, gloom and sadness are back and kicking better than ever.

Which keys have the upper hand, though? For each song, we calculate the percentile in which it finished in the finals (the closer to 1, the closer to winning the competition). It’s better to express a song’s place as a percentile, because the number of songs differs from year to year. Within each year, we calculate the average percentile of major songs and of minor songs.
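
As a rough sketch of this step: the exact percentile formula is not spelled out above, so the one below (winner maps to 1, last place maps to 0) is an assumption, and the function names and the input tuple layout are made up for illustration.

```python
from collections import defaultdict


def placement_percentile(place: int, n_songs: int) -> float:
    """Map a final place (1 = winner) to [0, 1], where 1 means winning."""
    if n_songs < 2:
        return 1.0
    return (n_songs - place) / (n_songs - 1)


def mean_percentile_by_mode(results):
    """Average percentile of major and minor songs within each year.

    results: iterable of (year, place, n_songs_in_final, is_minor).
    Returns a dict keyed by (year, "major" | "minor").
    """
    buckets = defaultdict(list)
    for year, place, n_songs, minor in results:
        mode = "minor" if minor else "major"
        buckets[(year, mode)].append(placement_percentile(place, n_songs))
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}
```

With this scaling, third place out of 5 and thirteenth-and-a-half out of 26 would land on the same value, which is what makes years with different numbers of finalists comparable.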

As expected, there is no clear winner, but we can still see some speculative trends. In the past couple of decades, minor songs have done better. In the 1980s, when there were fewer minor songs overall, they were also the least preferred. So, the composers do have a nose for a popular sound!

Another interesting statistic is the number of distinct chords per song – if there are many, that might point to something harmonically interesting, such as key modulations, or just to faster chord changes than usual. The next graph shows that the songs are gradually getting simpler. In the early years some of the songs were not very strict about the 3-minute policy, which might inflate the unique chord count a bit, but the trend is there even if we exclude the earlier years.


The songs got faster and more rhythmically stable over the years. From a median tempo of 92.5 bpm in 1957, we have moved to a median of 106 bpm in 2018. The perception of tempo, or speed, in music is defined not just by beats per minute, but also by the onset rate (i.e., how many notes start per second). It’s a bit difficult to measure the onset rate in music from the 1950s and 1960s, because modern algorithms do not cope with those recordings very well.

Eurovision started with slow ballads in an extremely rubato tempo, like “Tant De Peine” by Danièle Dupré, and slow waltzes like “Straatdeuntje” by Bobbejaan Schoepen. Sometimes you would have difficulty even finding a beat in those early songs. Nowadays, you can dance to practically any song. The next graph shows how the rhythmic stability of the songs changed over time (this property is extracted using a specifically trained neural network). To sum up, by the time disco music got popular, Eurovision had caught up and the songs became very rhythmically stable, but we might be moving away from that kind of music, as the more recent trend shows.

The lyrics

For most of Eurovision history, a song had to be written in one of the national languages of the participating country. This is not the case anymore, though many countries still prefer to sing in their own language (e.g., France, Italy). Here we are going to use English translations of the songs to look into their content. For starters, look at the word cloud of all the lyrics Eurovision has ever produced. The main keywords immediately jump out: heart, love and life.

This is not surprising. By my rough estimate, about three quarters of Eurovision songs are about love. About 60% of the songs explicitly mention love in their lyrics (some of them are not about romantic love, or not about love at all), but the number of love songs grows if we also include the songs that mention kissing, breaking hearts and holding hands. So, the majority of songs are about love, but do people prefer to sing about heartbreak or about the happy variety of love?

Here we use the Google Cloud Natural Language API for sentiment analysis. Even though all the lyrics are translated into English, it’s not just English, it’s Sing-Songlish. This is no-rules territory. Half of a song’s lyrics can just be “diri diri di” or “na na na”, or some similar nonsense, and no one is going to mind at all. Except the NLP API, which minded a lot and, in fact, sometimes just declared that the lyrics did not contain English text. Sometimes that was actually the case – Belgium in 2003 went as far as writing a song in an imaginary language!

The next figure shows the sentiment analysis results. Overall, Eurovision songs are of a happy kind. There’s not much of a trend over the years here, except maybe a very small shift towards sadder songs. Also, in recent years there have been more sad than happy songs among the winners, some of them tragic in a very un-Eurovision way.

How much can you say in 3 minutes?

Song lyrics are often redundant, repeating the chorus or some other lines over and over again. The prime Eurovision example of redundancy is Turkey’s 1989 song, which contains only 7 unique lines out of 42. Actually, Turkey, together with Australia and the UK, produces the most repetitive lyrics of all countries. In contrast, the least compressible songs come from Montenegro, Poland and Italy. Kudos to them!

To estimate how redundant the lyrics are, we will use the Lempel-Ziv compression algorithm. This algorithm works by encoding long repeated sequences compactly. The next figure shows the ratio of compressed to original lyric size. The smaller the ratio, the better the compression. There’s a clear trend towards more compression, which means the songs are becoming more redundant and repetitive.
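
Here is a small sketch of such a ratio. It uses Python’s built-in zlib module as a stand-in: zlib implements DEFLATE, which combines an LZ77-style Lempel-Ziv stage with Huffman coding, so it is not necessarily the same Lempel-Ziv variant as the one behind the figure, and the function name is made up for illustration.

```python
import zlib


def compression_ratio(lyrics: str) -> float:
    """Compressed size divided by original size (smaller = more repetitive)."""
    raw = lyrics.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)


repetitive = "na na na na " * 50
varied = "Quiet harbours, seven lanterns, a borrowed violin and nobody awake to judge it."
print(compression_ratio(repetitive) < compression_ratio(varied))
```

Very short lyrics can even end up with a ratio above 1 because of the fixed overhead of the compressed format, so the ratio is mostly meaningful when comparing texts of similar length.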

This doesn’t say anything about the length of the lyrics. As we remember, the songs also got faster. Let’s also count how many unique words there are in a song, and how many words there are in total. Now we see that in the 1960s they were only able to fit 200 words into a 3-minute song. Nowadays, we push for 300 at the very least! This probably means less time for instrumental bridges. Also, more words, but less meaning.
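
The counting itself is simple. Below is a minimal sketch, assuming English text and a deliberately naive tokenizer (lowercase runs of letters and apostrophes); the real word counts depend on how words are split, and the function name is hypothetical.

```python
import re


def word_counts(lyrics: str):
    """Return (total_words, unique_words) for one song's lyrics."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    return len(words), len(set(words))


total, unique = word_counts("La la la, love is all. Love is la la!")
print(total, unique)  # 10 4
```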

Is it fine to be openly patriotic?

Eurovision has traditionally been about uniting different nations. So, it would be natural for a country to take a moment to say something about itself. How many countries use the opportunity? Turns out, not so many. We use Google’s Natural Language API to find named entities within the lyrics, and then find geographical locations among the entities.

Geographic locations or national languages are mentioned in 7% of the songs, and only half of those mention the origin country itself, its language or some location in the country (usually, the capital). The rest of the geographical songs are of the “travel and vacation” kind.

Here’s a geoviz for you to watch and see who is mentioning who. Turn on the music!

The most patriotic country is Portugal, which mentions itself in 14% of its songs. They also love to mention the warmer places they used to sail to on business trips in the 17th century (Brazil, Mozambique, Angola, Timor). Mentioning warm sunny places is a trend overall. The Nordic countries mention Mediterranean countries, while Mediterranean countries go for something even warmer and mention Africa and South America. The general rule is that anything worth mentioning is on average about 10 degrees of latitude to the south, which is about the distance you need to travel from Dublin to the French Riviera.

The least patriotic country is the UK – out of the 61 songs they’ve sent to Eurovision, only one mentions London, and that’s it. And even this mention happened almost by accident, among many other European capitals. But actually, being patriotic and proud cannot hurt a good song. The songs that mention the country or the national language even do slightly better than average on the final scoreboard.

English lyrics

Since the rule on singing in one’s national language was removed in 1999, most Eurovision participants have switched to English or, sometimes, to a mixture of languages that includes English. In some cases this switch occurs only for the finals: the song goes through the national selection in the national language and is translated into English only after getting through.

The general belief is that singing in a national language is a risk, because the song won’t be able to reach a wider audience as effectively and other countries won’t vote for it. Is that true, though? Sadly, yes, this is somewhat true. Songs in a national language place slightly worse on the leaderboard (here we are looking at the songs since 1999). The sample is small, though, and the difference is neither big nor statistically significant.

Is it an honour to open the competition?

The songs in the finals are presented in a random order, which is chosen by a draw. So, opening the competition is an honour that goes to a different country every year. Though I’m not so sure whether it is actually an honour or a misfortune. The next figure shows that songs performed towards the end of the competition tend to place better. We use the data on places and song order since 1981 (before that there were fewer than 20 countries participating and nothing to fill the later bins).
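
A sketch of how such a binned comparison could be computed is below. The bin width of 5 running-order slots and the input layout are assumptions for illustration; the percentile is the “closer to 1 is better” score used earlier for the major/minor comparison.

```python
from collections import defaultdict


def mean_percentile_by_order_bin(entries, bin_width: int = 5):
    """Average placement percentile per running-order bin.

    entries: iterable of (running_order, percentile), running_order starting at 1.
    Returns {first_slot_of_bin: mean_percentile}, e.g. key 6 covers slots 6-10.
    """
    bins = defaultdict(list)
    for order, percentile in entries:
        bin_start = (order - 1) // bin_width * bin_width + 1
        bins[bin_start].append(percentile)
    return {start: sum(vals) / len(vals) for start, vals in sorted(bins.items())}
```

Binning by absolute slot rather than by relative position is what makes the later bins empty in years with fewer than 20 finalists.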

It’s difficult to say what causes this skew in votes. With televoting, it might be that in the first half of the competition people are waiting to decide who to vote for, and by the time they’ve decided, they don’t remember the first songs that well anymore.

Politics in Eurovision

Sadly, we cannot say that Eurovision is just about art. Of course, there is a hell of a lot of politics to it as well. This is more obvious this year than ever. Countries traditionally form “support groups”, and vote for each other’s songs. The groups are based on geographical proximity, historical ties and even the size of the diaspora.

In a recent study on voting behavior, many groups like this were confirmed. Greece and Cyprus, Finland and Sweden, and Malta and the UK traditionally vote for each other’s songs. Based on the support groups described in the article, we count the size of each country’s “support circle”. For instance, Azerbaijan receives support from Turkey, Russia and Ukraine, so Azerbaijan gets 3 points.
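
In code, the score is just a count of distinct supporters. The sketch below assumes the support groups have been written down as (voter, recipient) pairs; that representation and the function name are mine, and the example pairs are only the ones mentioned above.

```python
from collections import defaultdict


def support_circle_sizes(edges):
    """Number of distinct countries that traditionally support each country.

    edges: iterable of (voter, recipient) pairs; duplicates and self-votes are ignored.
    """
    circles = defaultdict(set)
    for voter, recipient in edges:
        if voter != recipient:
            circles[recipient].add(voter)
    return {country: len(voters) for country, voters in circles.items()}


edges = [
    ("Turkey", "Azerbaijan"), ("Russia", "Azerbaijan"), ("Ukraine", "Azerbaijan"),
    ("Greece", "Cyprus"), ("Cyprus", "Greece"),
]
print(support_circle_sizes(edges)["Azerbaijan"])  # 3
```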

The map shows which countries are the most connected. Sweden happens to be the luckiest one, receiving votes from all its northern neighbours. Sweden also happened to win 3 times in the period the data is based on (from 1992).

Who is going to win this year?

And now, let’s try to predict this year’s winner. We will use all the available data for that:

  1. Music (mode, melodiousness, rhythmic stability and complexity, tempo, chord compressibility and other musical features extracted from the audio).
  2. Lyrics (compressibility, sentiment, language)
  3. Political support groups of a country
  4. Song order

Out of these features, political support groups and song order are the most powerful predictors. However, even combining all this data, we can only explain about 10% of the variance in the votes.
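
For reference, “share of variance” here can be read as the coefficient of determination, R². The exact evaluation setup is not described above, so the sketch below only shows the standard definition on made-up numbers; a value of about 0.1 corresponds to the roughly 10% quoted.

```python
def r_squared(actual, predicted) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values have zero variance")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the average percentile scores 0, and a perfect model scores 1, so 0.1 means the features help, but only a little.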

We train an XGBoost model on all of our data (the target is the percentile that the song ends up in, so it’s a regression model). As the music at Eurovision has changed a lot, we only take the songs since 1999, which leaves ~500 songs. After training, the model outputs the feature importances: in addition to the things we already know, the model ranked as important the strength of emotional expression in the lyrics, rhythm complexity and stability, and the number of unique chords per song.

To predict this year’s winner, we have to remove the song order in the finals (it is not known yet, and neither are the songs that will qualify for the finals). Based on the model trained on the rest of the data, this year’s winner is going to be … either France, Slovenia or the Netherlands, in that order of likelihood. Slovenia’s song is entirely in the national language, and France’s song is at least half French. All three songs are ballads in a minor mode. The Dutch song is also in a quite rubato tempo; in fact, it is one of the most free-tempo songs of the competition. Unfortunately for me, none of these songs is my favourite!

The worst song this year, according to the model, was submitted by Belgium. It has a strong rhythm, it’s quite repetitive both in its lyrics and in its music, and seems to be about a very unhealthy relationship.

Eurovision 2019 finals are happening on Saturday, May 18th in Tel Aviv. So, good luck to France, Slovenia and the Netherlands!
