I’m interested in ‘shoegaze’, a subgenre of indie rock. This lesser-known genre, also called ‘the scene that celebrates itself’, grew out of the noise pop scene in the late 80s and early 90s and has recently regained popularity. A Pitchfork article titled ‘The Shoegaze Revival Hit Its Stride in 2023’ (Sherburne 2023) also captures the revival of shoegaze among Gen Z listeners.
Shoegaze is characterized by heavy use of overdriven guitars and various effect pedals combined with ethereal vocals; the lyrics are often regarded as blank, poetic, and sometimes difficult to distinguish from the instruments. So here comes this project for the indieheads: I want to analyze those usually overlooked lyrics, especially from the perspective of sentiment.
The bands I want to study are the so-called classic ‘big three’ of shoegaze: my bloody valentine, Ride, and slowdive. They were all on the London-based independent record label Creation, reached their peak in the early 90s, disbanded in the late 90s as the subgenre faded, and reunited in the 21st century, which makes them a perfect fit for studying how lyrics change over the years.
My Questions
What words frequently appear in their lyrics?
Which bands have the longest and shortest lyrics?
Who is the saddest shoegaze band?
Are bands trending towards happiness or sadness over time?
Project
Data Acquisition & Wrangling
Data Acquisition
Lyrics from all studio albums of the ‘big three’ shoegaze bands, my bloody valentine (mbv), Ride, and slowdive, were retrieved from Genius.com using lyricsgenius (Miller 2024), a Python package built on the Genius API. Please see the author’s instructions for details.
The downloaded .json files were written into a .csv file for further processing. Please go to the source repository for more details.
21 ContributorsSoft as Snow (But Warm Inside) Lyrics Soft as snow but warm inside Penetrate, you cannot hide Feeling lost forever, really need you Feeling dark and feeling true This is all I ever knew Soft as skin in leather and I whisper, “You” Harder you come down on me (Ooh, ooh) Sink away, you look happy (Ooh, ooh) Secrets keep forever, they’re undressing me (Ooh, ooh) Come inside, it’s warm in here (Ooh, ooh) Better now to have no fear (Ooh, ooh) Carried on a wave, where it can lead you? (Ooh, ooh) Touch your head, then your hair Softer, softer everywhere Fingertips are burning, can I touch you there? Soft as velvet, eyes can’t see Bring me close to ecstasy High away to heaven, and I’m coming too Float now, coming down on me (Ooh, ooh) Handed you what I cannot see (Ooh, ooh) Feel the big happy, you’re exploding me (Ooh, ooh) Soft as snow and warm inside (Ooh ooh) Penetrate then re-divide (Ooh, ooh) Slip away forever, really need you (Ooh, ooh) You might also like Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh1Embed
Isn’t Anything
Data Dictionary
| Field Name | Data Type | Description |
|---|---|---|
| Track.Number | Integer | The track number of the song in the album |
| Song.Title | String | The title of the song |
| Artist | String | The artist performing the song |
| Release.Date | String | The date the song was released |
| Lyrics | String | The lyrics of the song |
| Album.Name | String | The name of the album the song belongs to |
Data Wrangling
The lyrics retrieved directly from Genius.com usually have some problems:
‘16 ContributorsBallad of Sister Sue Lyrics’ at the beginning and ‘6Embed’ at the end are not part of the lyrics.
‘√¢¬Ä¬ô’ is a mis-decoded apostrophe, caused by a Unicode encoding/decoding problem.
‘See Slowdive LiveGet tickets as low as $55’ is an ad and is also not part of the lyrics.
So the data needs some cleaning.
# packages used below (the setup chunk is not shown in this post)
library(tidyverse)  # dplyr, stringr, forcats, ggplot2
library(stringi)    # stri_enc_toutf8()
library(lubridate)  # dmy(), year()

# clean text
lyrics_clean <- lyrics %>%
  mutate(Lyrics = stri_enc_toutf8(Lyrics)) %>%
  mutate(Lyrics = str_replace_all(Lyrics, 'â\u0080\u0099', "'")) %>%
  mutate(
    Lyrics = Lyrics %>%
      str_remove(".*Lyrics") %>%
      str_remove("See.*tickets as low as \\$\\d+") %>%
      str_remove('You might also like') %>%
      str_remove('\\d*\\s*Embed$') %>%
      str_trim()
  ) %>%
  rename(Album = Album.Name) %>%
  filter(!str_detect(Lyrics, "^\\s*$")) # filter out instrumental pieces

# add some derived columns for further processing
lyrics_clean <- lyrics_clean %>%
  mutate(
    lyric.length = nchar(Lyrics),
    Release.Date = dmy(Release.Date),
    Release.Year = year(Release.Date)
  )

print(lyrics_clean[1, 'Lyrics'])
[1] "Soft as snow but warm inside Penetrate, you cannot hide Feeling lost forever, really need you Feeling dark and feeling true This is all I ever knew Soft as skin in leather and I whisper, \"You\" Harder you come down on me (Ooh, ooh) Sink away, you look happy (Ooh, ooh) Secrets keep forever, they're undressing me (Ooh, ooh) Come inside, it's warm in here (Ooh, ooh) Better now to have no fear (Ooh, ooh) Carried on a wave, where it can lead you? (Ooh, ooh) Touch your head, then your hair Softer, softer everywhere Fingertips are burning, can I touch you there? Soft as velvet, eyes can't see Bring me close to ecstasy High away to heaven, and I'm coming too Float now, coming down on me (Ooh, ooh) Handed you what I cannot see (Ooh, ooh) Feel the big happy, you're exploding me (Ooh, ooh) Soft as snow and warm inside (Ooh ooh) Penetrate then re-divide (Ooh, ooh) Slip away forever, really need you (Ooh, ooh) Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh Ooh, ooh"
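To see what each cleaning step does in isolation, here is a minimal base-R sketch applied to a single made-up string. The string is assembled from the artifacts listed above, with "la la" standing in for real lyrics; the final whitespace collapse is an extra step added for readability and is not part of the pipeline above.

```r
# Made-up raw string in the shape Genius returns; "la la" is placeholder lyric text.
raw <- paste0(
  "16 ContributorsBallad of Sister Sue Lyrics la la la ",
  "See Slowdive LiveGet tickets as low as $55 ",
  "You might also like la la 6Embed"
)

# 1. drop everything up to and including the header ("... Lyrics")
step1 <- sub(".*Lyrics", "", raw, perl = TRUE)
# 2. drop the ticket ad
step2 <- sub("See.*tickets as low as \\$\\d+", "", step1, perl = TRUE)
# 3. drop the recommendation widget text
step3 <- sub("You might also like", "", step2, fixed = TRUE)
# 4. drop the trailing "<number>Embed" footer
step4 <- sub("\\d*\\s*Embed$", "", step3, perl = TRUE)
# 5. (extra, not in the original pipeline) collapse repeated whitespace and trim
cleaned <- trimws(gsub("\\s+", " ", step4, perl = TRUE))

print(cleaned)
```

One thing to keep in mind: `.*Lyrics` is greedy, so it removes everything up to the last occurrence of "Lyrics" in the string. A song whose lyrics contain that exact word would lose its opening lines.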
# this is for ride-only analysis
lyrics_ride <- lyrics_clean %>%
  filter(Artist == 'Ride')

# this keeps the dataset working for the original analysis
lyrics_clean <- lyrics_clean %>%
  filter(!str_detect(Album, "\\[EP\\]")) %>%
  mutate(Album = factor(Album, levels = c(
    "Isn’t Anything", "loveless", "m b v",
    "Nowhere", "Going Blank Again", "Carnival of Light", "Tarantula",
    "Weather Diaries", "This Is Not a Safe Place", "Interplay",
    "Just for a Day", "Souvlaki", "Pygmalion", "Slowdive", "everything is alive"
  )))
Data Analysis
What words frequently appear in their lyrics?
library('tidytext')

# tokenize by word
token <- lyrics_clean %>%
  unnest_tokens(output = Word, input = Lyrics, token = 'words')

# stopwords dropped
token_clean <- token %>%
  anti_join(stop_words, by = c("Word" = "word"))

token_count_cloud <- token_clean %>%
  group_by(Artist) %>%
  count(Word, name = 'Word_count', sort = TRUE)

token_count_head <- token_count_cloud %>%
  group_by(Artist) %>%
  slice(1:30) %>%
  mutate(Word = fct_reorder(Word, Word_count, .desc = FALSE))

# make a word cloud
library(ggplot2)
theme_set(theme_bw())
library(ggwordcloud)

palette <- c(
  "my bloody valentine" = "#d83c7a",
  "Ride" = "#4b8ab8",
  "Slowdive" = "#ae8f32",
  "when you sleep" = "#f576a8",
  "Vapour Trail" = "#56a8e3",
  "When the Sun Hits" = "#d1b971"
)

ggplot(token_count_head, aes(label = Word, size = Word_count, color = Artist)) +
  geom_text_wordcloud(
    word.ratio = 0.2, # adjust for overall word size
    max_size = 30
  ) +
  facet_wrap(~ Artist) +
  scale_color_manual(values = palette) +
  labs(
    title = 'Most Frequently Used Words in Lyrics',
    caption = 'Word size based on frequency. Only the top 30 results are shown here.'
  )
The result is kind of amusing. Ride and mbv rely heavily on harmonies and humming in their vocals, and that is reflected honestly here, while slowdive has the most ‘meaningful’ lyrics among the three. ‘Love’ is the ultimate meaning of rock and roll, and it indeed appears frequently in the lyrics of all three bands.
Which bands have the longest and shortest lyrics?
# plotting
ggplot(lyrics_clean, aes(x = Album, y = lyric.length, fill = Artist)) +
  geom_boxplot(color = 'black') +
  scale_fill_manual(values = palette) +
  labs(
    title = 'Which bands have the Longest/Shortest Lyrics?',
    caption = 'Boxplot showing the range and median of lyric lengths for each Album.',
    color = 'Before Reunion?',
    y = 'Lyric Length (character)'
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Among the three bands, mbv has the shortest lyrics overall, while Ride has the longest. An interesting trend is that while Ride and slowdive had similar lyric lengths back in the 90s, Ride has tended to write longer lyrics since reuniting, and the length increases with each album.
Who is the saddest shoegaze band?
I used the AFINN lexicon, accessed through the tidytext package, for sentiment analysis, based on this paper (Koto and Adriani 2015).
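As a rough illustration of how lexicon-based scoring works, here is a dependency-free base-R sketch. The lexicon below is a toy stand-in with illustrative scores, not the real AFINN table, and `toy_lexicon` and `tokens` are hypothetical names. `merge()` with its default settings behaves like the `inner_join()` used in this post: words that are missing from the lexicon are simply dropped.

```r
# Toy lexicon: illustrative scores only, NOT the real AFINN values.
toy_lexicon <- data.frame(
  word  = c("love", "happy", "lost", "sad"),
  value = c(3, 3, -3, -2)
)

# Tokens from an imaginary lyric; "snow" and "velvet" are not in the lexicon.
tokens <- data.frame(Word = c("love", "snow", "lost", "velvet", "sad"))

# Inner join: only tokens that have a lexicon entry survive.
scored <- merge(tokens, toy_lexicon, by.x = "Word", by.y = "word")
print(scored)

# Average sentiment over the matched words.
mean_score <- mean(scored$value)
print(mean_score)
```

Because unmatched words are dropped, the score reflects only the part of the vocabulary the lexicon knows about, which matters for lyrics that are mostly humming or unusual words.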
library(knitr)  # kable()

# data processing: sentiment 'afinn'
token_count <- token_clean %>%
  group_by(Artist, Album, Release.Year) %>%
  count(Word, name = 'Word_count', sort = TRUE) %>%
  ungroup()

token_afinn <- token_count %>%
  inner_join(get_sentiments('afinn'), by = c("Word" = "word"))

afinn_score_by_artist <- token_afinn %>%
  group_by(Artist) %>%
  summarize(avg = round(mean(value), 2)) %>%
  ungroup()

kable(afinn_score_by_artist)
| Artist | avg |
|---|---|
| Ride | -0.06 |
| Slowdive | 0.00 |
| my bloody valentine | -0.11 |
Among the three bands, mbv has the lowest sentiment score, which means it is the saddest shoegaze band (if we only look at the lyrics). All three bands got sentiment scores less than or equal to zero, with slowdive holding the highest score (0.00).
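One detail worth spelling out: the artist averages above use `mean(value)` over the rows of `token_afinn`, where each row is one distinct word per album, so a word sung many times counts the same as a word sung once. The Ride-only section later in this post uses a count-weighted average instead. The sketch below, with made-up counts and scores in a hypothetical `counted` table, shows how far the two can diverge.

```r
# Made-up counted table: one row per distinct word, with its count and a toy score.
counted <- data.frame(
  Word       = c("love", "lost", "sad"),
  Word_count = c(10, 1, 1),
  value      = c(3, -3, -2)
)

# Unweighted: every distinct word contributes equally.
unweighted <- mean(counted$value)

# Weighted: every occurrence contributes equally.
weighted <- sum(counted$value * counted$Word_count) / sum(counted$Word_count)

print(c(unweighted = unweighted, weighted = weighted))
```

Neither choice is wrong; the unweighted version describes the vocabulary, while the weighted version describes what is actually sung.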
What about their greatest hits?
Using data from last.fm, we find out that the greatest hits of the three bands are:
when you sleep - my bloody valentine (32171 weekly listeners),
Vapour Trail - Ride (1943 weekly listeners), and
When the Sun Hits - Slowdive (53055 weekly listeners).
I’m interested in whether those songs have a happier or sadder vibe compared to the bands’ other songs:
# by song
token_count_bysong <- token_clean %>%
  group_by(Artist, Song.Title) %>%
  count(Word, name = 'Word_count', sort = TRUE) %>%
  ungroup()

# give afinn score
token_afinn_bysong <- token_count_bysong %>%
  inner_join(get_sentiments('afinn'), by = c("Word" = "word"))

# clean data set, drop songs with only 1/2 rows
token_afinn_bysong <- token_afinn_bysong %>%
  group_by(Song.Title) %>%
  filter(n() > 2) %>%
  ungroup()

# calculate average scores
afinn_score_all <- token_afinn_bysong %>%
  group_by(Song.Title, Artist) %>%
  summarize(avg_song = mean(value), .groups = 'drop')

# look at the greatest hits
afinn_score_gh <- afinn_score_all %>%
  filter(Song.Title %in% c('Vapour Trail', 'When the Sun Hits', 'when you sleep'))

# combine datasets for plotting
afinn_score_gh <- inner_join(afinn_score_gh, afinn_score_by_artist, by = "Artist")

# plot: circles are artist averages, triangles are the greatest hits
# (columns are referenced by name inside aes() rather than with `$`)
ggplot(afinn_score_gh, aes(x = Artist)) +
  geom_point(shape = 16, size = 3, aes(y = avg, color = Artist)) +
  geom_point(shape = 17, size = 3, aes(y = avg_song, color = Song.Title)) +
  scale_color_manual(values = palette) +
  geom_hline(yintercept = 0, linetype = 'dashed', color = 'black') +
  labs(
    title = 'Who is the Saddest Shoegaze Band?',
    caption = 'Average sentiment score calculated based on afinn scores.\n<0: sad/negative, >0: happy/positive.',
    y = 'Sentiment Score',
    x = 'Artist',
    color = 'Artist average/\ngreatest hits'
  )
mbv has the lowest overall sentiment score. All three greatest hits score slightly higher than their artists’ averages. People seem to prefer happy songs!
With the table above, we find that Ride’s saddest songs are ‘I Came to See the Wreck’ and ‘Only Now’, and their happiest song is ‘The Dawn Patrol’; Slowdive’s saddest song is ‘The Sadman’, and their happiest song is ‘Everyone Knows’; mbv’s saddest song is ‘if i am’, and their happiest song is ‘only shallow’. ‘The Sadman’ is really sad.
Are bands trending towards happiness or sadness over time?
afinn_score <- token_afinn %>%
  group_by(Artist, Album, Release.Year) %>%
  summarize(avg = mean(value))

library(ggrepel)

ggplot(afinn_score, aes(x = Release.Year, y = avg, color = Artist, label = Album)) +
  geom_point() +
  geom_line() +
  geom_text_repel(size = 3) +
  geom_hline(yintercept = 0, linetype = 'dashed', color = 'black') +
  scale_color_manual(values = palette) +
  scale_y_continuous(limits = c(-1, 1)) +
  labs(
    title = 'Sentiment Trend of Lyrics Over Time',
    caption = 'Average sentiment score calculated based on afinn scores.\n<0: sad/negative, >0: happy/positive.',
    color = 'Artist',
    y = 'Sentiment Score',
    x = 'Time'
  )
When examining the trend over time, it is noteworthy that at the beginning of their careers, the bands all had very sad lyrics; in the middle of their careers, their lyrics became more positive. It is also interesting to observe that the latest albums of mbv, Ride, and Slowdive are all among the saddest of their entire careers.
Bonus: Here Comes the RIDE Fan!
RIDE blew my mind when I attended their Nowhere concert with the Charlatans back in Jan 2024 at the amazing Union Transfer in Philadelphia. Since then I’ve become a huge fan… So I couldn’t help but do more data analysis on them (!).
Disclaimer: I’m by no means overlooking the members who do not write the lyrics. I love you Steve. It’s just because I can only do text analysis at this time.
My Questions
Who writes the lyrics in each album?
What are the most frequently used words for each lyricist before/after reunion?
Top 5 Words for Each Album
Who is the Saddest Lyricist?
Are they trending towards happiness or sadness over time?
Data Wrangling
Information about who wrote which lyrics was taken from interviews and record sleeves.
# assign lyricist
andy_bell_songs <- c(
  "Drive Blind", "Close My Eyes", "Like A Daydream", "Silver",
  "Dreams Burn Down", "Here And Now", "Seagull", "Kaleidoscope",
  "In a Different Place", "Polar Bear", "Paralysed", "Vapour Trail",
  "Sennen", "Beneath", "Today", "Not Fazed", "Chrome Waves",
  "Time of Her Time", "Cool Your Boots", "Making Judy Smile",
  "Going Blank Again", "Howard Hughes", "Birdman", "Crown of Creation",
  "Endless Road", "Magical Spring", "I Don’t Know Where It Comes From",
  "Sunshine/Nowhere To Run", "Dead Man", "Walk on Water", "Mary Anne",
  "Castle On The Hill", "Gonna Be Alright", "The Dawn Patrol",
  "Ride The Wind", "Burnin’", "Starlight Motel", "Charm Assault",
  "Home Is A Feeling", "Weather Diaries", "Lateral Alice", "Cali",
  "Impermanence", "Cold Water People", "Catch You Dreaming", "Future Love",
  "Repetition", "Kill Switch", "Clouds of Saint Marie", "Fifteen Minutes",
  "Jump Jet", "Dial Up", "End Game", "In This Room", "Peace Sign",
  "Last Frontier", "Light in a Quiet Room", "Stay Free",
  "Last Night I Went Somewhere to Dream", "Midnight Rider",
  "Portland Rocks", "Yesterday Is Just a Song"
)
mark_gardener_songs <- c(
  "Chelsea Girl", "All I Can See", "Furthest Sense", "Perfect Time",
  "Taste", "Decay", "Unfamiliar", "Leave Them All Behind", "Twisterella",
  "Mouse Trap", "Time Machine", "OX4", "Stampede", "Moonlight Medicine",
  "1000 Miles", "From Time To Time", "Only Now", "Deep Inside My Pocket",
  "Lannoy Point", "White Sands", "Pulsar", "Keep It Surreal",
  "Shadows Behind the Sun", "Monaco", "I Came to See the Wreck",
  "Sunrise Chaser", "Essaouira"
)
loz_colbert_songs <- c("Nowhere", "Natural Grace", "Rocket Silver Symphony", "R.I.D.E.")
collab_songs <- c("All I Want", "Eternal Recurrence")
cover_songs <- c("How Does It Feel to Feel?")

lyrics_ride <- lyrics_ride %>%
  mutate(
    lyricist = case_when(
      Song.Title %in% andy_bell_songs ~ "Andy.Bell",
      Song.Title %in% mark_gardener_songs ~ "Mark.Gardener",
      Song.Title %in% loz_colbert_songs ~ "Loz.Colbert",
      Song.Title %in% collab_songs ~ "collaboration",
      Song.Title %in% cover_songs ~ "cover",
      TRUE ~ NA_character_
    ),
    Album = fct_reorder(Album, Release.Date),
    is90 = Release.Year < 2000
  )
Data Analysis
# tokenize by word
ride_token <- lyrics_ride %>%
  unnest_tokens(output = Word, input = Lyrics, token = 'words')

# unique by song: keep each word at most once per song
ride_token_unique <- ride_token %>%
  group_by(Song.Title) %>%
  distinct(Song.Title, Word, .keep_all = TRUE) %>%
  ungroup()

# stopwords dropped
ride_token_clean <- ride_token_unique %>%
  anti_join(stop_words, by = c("Word" = "word"))
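The "unique by song" step changes what the later counts mean: after it, a word's count is the number of songs it appears in, not the number of times it is sung. Here is a small base-R sketch of that difference, using a made-up two-song table (the `songs` data frame and its contents are hypothetical).

```r
# Made-up tokens: "time" is sung three times in song A and once in song B.
songs <- data.frame(
  Song.Title = c("A", "A", "A", "B"),
  Word       = c("time", "time", "time", "time")
)

# Raw occurrences: counts every repetition.
raw_count <- table(songs$Word)[["time"]]

# One row per (song, word) pair, then count: number of songs using the word.
unique_rows <- unique(songs[, c("Song.Title", "Word")])
song_count <- table(unique_rows$Word)[["time"]]

print(c(raw = raw_count, per_song = song_count))
```

This keeps a single chorus-heavy song from dominating the rankings, at the cost of ignoring how strongly a word is emphasized within a song.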
Who writes the lyrics in each album?
# who writes what
# note: ride_palette is a color vector whose definition is not shown in this post
ggplot(lyrics_ride, aes(x = Album, fill = lyricist)) +
  geom_bar() +
  scale_fill_manual(values = ride_palette) +
  scale_y_continuous(breaks = 1:12, minor_breaks = 1:12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = 'Who writes the lyrics in each album?',
    fill = 'Lyricist',
    y = 'Count',
    x = 'Album'
  )
Andy Bell wrote a lot of the lyrics, especially for Nowhere, Tarantula, and This Is Not a Safe Place.
What are the most frequently used words for each lyricist before/after reunion?
# count - lyricist, is90
ride_token_count <- ride_token_clean %>%
  group_by(lyricist, is90) %>%
  count(Word, name = 'Word_count', sort = TRUE) %>%
  ungroup()

# find top 15
ride_token_count_head_bylyricist <- ride_token_count %>%
  group_by(lyricist, is90) %>%
  filter(lyricist == 'Andy.Bell' | lyricist == 'Mark.Gardener') %>%
  slice(1:15) %>%
  mutate(Word = fct_reorder(Word, Word_count, .desc = FALSE)) %>%
  ungroup()

# most frequent words
ggplot(ride_token_count_head_bylyricist, aes(x = Word_count, y = Word, fill = is90)) +
  geom_col() +
  facet_wrap(~ lyricist) +
  scale_fill_manual(
    values = ride_palette,
    name = 'Status',
    labels = c('After Reunion', 'Before Reunion')
  ) +
  scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25), minor_breaks = 1:25) +
  labs(
    title = "What are the most frequently used words for each lyricist\nbefore/after reunion?",
    fill = 'Before Reunion',
    caption = 'Only the top 15 most frequently used words are shown.',
    y = 'Word',
    x = 'Count'
  )
It’s surprising that ‘time’ was used a lot. I’m curious whether all bands like this word or it’s just RIDE. Maybe that will be my next project.
Top 5 Words for Each Album
# The Word for each album?
ride_token_count_album <- ride_token_clean %>%
  group_by(Album) %>%
  count(Word, name = 'Word_count', sort = TRUE)

# find top 5
ride_token_count_album_head <- ride_token_count_album %>%
  group_by(Album) %>%
  slice(1:5) %>%
  filter(Word_count != 1) %>%
  mutate(Word = fct_reorder(Word, Word_count, .desc = FALSE)) %>%
  ungroup()

# plotting
ggplot(ride_token_count_album_head, aes(x = Album, y = Word, label = Word)) +
  geom_text(size = 3, aes(color = Word_count)) +
  scale_color_continuous(high = '#a688b9', low = '#ccd8e0') +
  coord_fixed(ratio = 0.3) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = 'Top 5 Words for Each Album',
    color = 'Count',
    y = 'Word',
    x = 'Album'
  )
Some themes are always there - time, day, life, feel…
Who is the Saddest Lyricist?
# who is more depressed
ride_token_count_aa <- ride_token_clean %>%
  group_by(lyricist, Album, Release.Date) %>%
  count(Word, name = 'Word_count', sort = TRUE) %>%
  ungroup()

# apply sentiment value
ride_token_afinn_aa <- ride_token_count_aa %>%
  inner_join(get_sentiments('afinn'), by = c("Word" = "word")) %>%
  filter(lyricist == 'Mark.Gardener' | lyricist == 'Andy.Bell' | lyricist == 'Loz.Colbert')

# calculate overall score
ride_afinn_score_by_lyricist <- ride_token_afinn_aa %>%
  group_by(lyricist) %>%
  summarize(weighted_avg = round(sum(value * Word_count) / sum(Word_count), 2)) %>%
  ungroup()

# plotting
ggplot(ride_afinn_score_by_lyricist,
       aes(x = lyricist, y = weighted_avg, fill = lyricist, label = weighted_avg)) +
  geom_col() +
  geom_text() +
  scale_y_continuous(breaks = c(-1, 0, 1), limits = c(-1.5, 1.5)) +
  scale_fill_manual(values = ride_palette) +
  geom_hline(yintercept = 0, linetype = 'dashed', color = 'black') +
  labs(
    title = 'Who is the Saddest Lyricist?',
    y = 'Sentiment Score',
    caption = 'Average sentiment score calculated based on afinn scores.\n<0: sad/negative, >0: happy/positive.',
    x = 'Lyricist',
    fill = 'Lyricist'
  )
Mark Gardener seems really sad.
Are they trending towards happiness or sadness over time?
# Is anyone getting more depressed over time?
# assign afinn score for each person for each album
ride_afinn_score_by_lyricist_album <- ride_token_afinn_aa %>%
  group_by(lyricist, Album, Release.Date) %>%
  summarize(weighted_avg = round(sum(value * Word_count) / sum(Word_count), 2))

# geom_line() needs a numeric x here, so convert the Album factor to its integer codes
ride_afinn_score_by_lyricist_album$Album_num <-
  as.numeric(ride_afinn_score_by_lyricist_album$Album)

# plotting
ggplot(ride_afinn_score_by_lyricist_album,
       aes(x = Album_num, y = weighted_avg, color = lyricist, shape = lyricist, label = Album)) +
  geom_point(size = 3) +
  geom_line() +
  scale_color_manual(values = ride_palette) +
  scale_x_continuous(
    breaks = ride_afinn_score_by_lyricist_album$Album_num,
    labels = ride_afinn_score_by_lyricist_album$Album
  ) +
  scale_y_continuous(limits = c(-2, 2)) +
  geom_hline(yintercept = 0, linetype = 'dashed', color = 'black') +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = 'Sentiment Trend of Lyrics Over Time',
    caption = 'Average sentiment score calculated based on afinn scores.\n<0: sad/negative, >0: happy/positive.',
    color = 'lyricist',
    y = 'Sentiment Score',
    x = 'Album'
  )
Koto, Fajri, and Mirna Adriani. 2015. “A Comparative Study on Twitter Sentiment Analysis: Which Features Are Good?” In Natural Language Processing and Information Systems, edited by Chris Biemann, Siegfried Handschuh, André Freitas, Farid Meziane, and Elisabeth Métais, 453–57. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-19581-0_46.