40 Years of Thrash: Comparing the lyrical content of Sodom, Megadeth and Sepultura via Wordclouds in Python

A short guide to collecting and cleaning lyrics data from Genius.com and to creating Wordclouds for exploratory data analysis and comparison.

Image by Carabo Spain from Pixabay

INTRODUCTION

Have you ever wondered how the lyrical content of your favourite artists differs? Do all of them focus on romance? Maybe one of them has a nihilistic point of view and encourages you to live life to the fullest without worrying too much about the moral ambiguity of the world around you? Perhaps that artist you never truly cared for brings up political notions one too many times?

CHOICE OF ARTISTS

Sodom, Megadeth and Sepultura are thrash metal bands, all of which originated in the early 1980s in Germany, the USA and Brazil, respectively — right when the subgenre of thrash metal was establishing itself in the mainstream. Since they all comment on social and political issues like war, human rights and civil equality and are commercially very successful, I thought comparing their lyrics should be quite interesting, despite the differences in their places of origin.

PROCEDURE

The process can be broadly divided into the following steps:

  1. Data Collection
  2. Data Cleaning and Preparation
  3. Visualization

Now let us discuss each step in detail.

1. Data Collection

Genius.com offers a fairly simple API, accessible from Python, which lets users extract information on artists, their songs (including lyrics) and albums. To use it, you have to create an account on Genius.com and fill in the required details, after which it should provide you with an Access Token. For more details, you can read the Genius API documentation. The final step should look like this:

Access Token generation screen
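With a token in hand, the collection step could look roughly like the sketch below. It assumes the third-party lyricsgenius package as the Python client (the text above does not name one), and the function name collect_lyrics, the max_songs default and the row layout are illustrative choices rather than the original code.

```python
def collect_lyrics(genius, artists, max_songs=50):
    """Return one row (a dict) per song for each artist name in `artists`.

    `genius` is assumed to be a lyricsgenius.Genius client, or any object
    with a compatible search_artist() method.
    """
    rows = []
    for name in artists:
        # search_artist returns an Artist object, or None if nothing matched.
        artist = genius.search_artist(name, max_songs=max_songs)
        if artist is None:
            continue
        for song in artist.songs:
            rows.append(
                {"artist": name, "title": song.title, "lyrics": song.lyrics}
            )
    return rows


# Usage (needs network access and a real token; the token is a placeholder):
#   import lyricsgenius
#   genius = lyricsgenius.Genius("YOUR_ACCESS_TOKEN")
#   rows = collect_lyrics(genius, ["Sodom", "Megadeth", "Sepultura"])
```

Passing the client in as a parameter keeps the token out of the function body and makes it easy to try the loop with a stand-in object before hitting the real API.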

2. Data Cleaning and Preparation

We start by removing newline characters (‘\n’) from the lyrics column using a regular expression.
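The newline removal described above could be sketched as follows. This is a minimal version that works on plain strings; the names clean_lyrics and build_lyrics_dict, and the assumption that each song is a dict with "artist" and "lyrics" keys, are illustrative and not taken from the original code.

```python
import re
from collections import defaultdict


def clean_lyrics(text):
    """Replace newlines (and any other whitespace runs) with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def build_lyrics_dict(rows, artists):
    """Join the cleaned lyrics of every song into one string per artist."""
    parts = defaultdict(list)
    for row in rows:
        if row["artist"] in artists:
            parts[row["artist"]].append(clean_lyrics(row["lyrics"]))
    # Artists without any songs end up with an empty string.
    return {artist: " ".join(parts[artist]) for artist in artists}
```

If the data lives in a pandas DataFrame with a lyrics column, the same cleaning function could presumably be applied with df["lyrics"].apply(clean_lyrics).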

3. Visualization

Now that we have our data ready as strings inside a dictionary, we can visualize it as Wordclouds for our bands of choice. We write a function that draws each Wordcloud inside a mask made from the corresponding band's logo (a mask can be made from any image). The WordCloud package is used for this purpose and makes the task rather simple. Again, this function accepts the dataset and a list of artists as parameters, iteratively builds a Wordcloud from each string in the dictionary lyrics_dict and saves each visualization to disk. Stopwords are removed by passing the package's built-in STOPWORDS set to the stopwords parameter of WordCloud.
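A function along these lines might look like the sketch below. It assumes the wordcloud, numpy and Pillow packages are installed; the mask_paths dictionary (artist name to logo image file), the out_dir parameter and the helper wordcloud_path are hypothetical additions for illustration, not the original signature.

```python
from pathlib import Path


def wordcloud_path(artist, out_dir="wordclouds"):
    """Build the output file path for one artist, e.g. wordclouds/sodom.png."""
    return Path(out_dir) / f"{artist.lower().replace(' ', '_')}.png"


def make_wordclouds(lyrics_dict, artists, mask_paths, out_dir="wordclouds"):
    """Save one masked word cloud per artist and return the written paths."""
    # Third-party imports are kept local so the path helper works without them.
    import numpy as np
    from PIL import Image
    from wordcloud import STOPWORDS, WordCloud

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    written = []
    for artist in artists:
        # The mask is the band logo as an array; pure-white areas stay empty.
        mask = np.array(Image.open(mask_paths[artist]))
        cloud = WordCloud(
            mask=mask,
            stopwords=STOPWORDS,  # built-in English stopword set
            background_color="white",
        ).generate(lyrics_dict[artist])
        target = wordcloud_path(artist, out_dir)
        cloud.to_file(str(target))
        written.append(target)
    return written
```

Called as make_wordclouds(lyrics_dict, ["Sodom", "Megadeth", "Sepultura"], mask_paths), this would write one PNG per band into the output folder, provided every logo file in mask_paths exists and has a white background around the logo shape.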

I. Sodom

II. Megadeth

III. Sepultura

I make a mean curry, try (really hard) to solve problems and am a heavy metal enthusiast. Learning to be a data analyst @University of Bonn, Germany.
