And so did Neil Young.
OK, not the singers themselves, but their song lyrics did. For my final Metis project, I wanted to learn about neural networks (NNs). As a neuroscientist, it was too good an opportunity to pass up. My goal was to generate new lyrics based on an artist's style. So at the highest level: train a NN on a corpus of Artist X's songs, plug in a seed phrase, receive new text. Here's what I used to accomplish that.
Kaggle is a great source of clean datasets, and I found one that contained over 500,000 songs. Using pandas, I did some data exploration and considered which artists to work with. I wanted to make sure I had at least a few dozen to a few hundred songs per artist.
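That exploration boils down to a few lines of pandas. Here's a sketch with a toy stand-in for the dataset (the column names are my assumptions; the real thing would be loaded with `pd.read_csv(...)`):

```python
import pandas as pd

# Toy stand-in for the Kaggle lyrics dataset; column names are assumptions.
df = pd.DataFrame({
    "artist": ["Ben Folds"] * 2 + ["Neil Young"] * 3,
    "text": ["lyric a", "lyric b", "lyric c", "lyric d", "lyric e"],
})

# Count songs per artist, then keep only artists with enough material
# to train on (a few dozen to a few hundred in the real dataset).
counts = df["artist"].value_counts()
viable = counts[counts >= 3]
print(viable)
```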
I started exploring character-level modeling with keras, but didn't pursue it for long: generating anything grammatically passable means assembling combinations of characters (letters and punctuation!) into words, and then stringing those words together in a way that makes grammatical sense. I mainly mention it so I can share this output with you. The seed sentence is "she's a brick and i'm drowning slowly" from Ben Folds:
she's a brick and i'm drowning slowly of coll roghbor!"
"shet's got you, got a 't-lard, threw room
in the carp us
lobber door wrong, for night not you
you wrotens to sett this in a moce
hell ginl we'r are sat
[chormone we gay marry
lited the chen
just me someone court benes
i'm not talk
where they garr, nerild
melow it seems as you've door chen you cree and them
'tweem, i call
you don't know where yo
I could hear the piano melody as I read the first line/seed sentence and then...cue the "record scratch" sound...it turned to a spoken word session.
I quickly moved on to word-level modeling, which eliminates the hurdle of word construction. If you are unfamiliar with what word-level modeling is, I hope this helps:
The model predicts the next word in a sequence given an input sequence of, in this example, 4 words. To simplify matters, consider punctuation to be part of the word it follows.
Next, the model slides over one word and starts with a new 4-word input. (Side note: a model isn't limited to sliding over one word; it can slide over any number. You may read about "sliding windows" of text/pixels, and this is merely one example of that concept.)
Continue to slide the window to generate new input and predict new output. Words that precede the input are designated "past".
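The sliding-window idea above can be sketched in a few lines of Python (a simplified version, not the project's exact preprocessing):

```python
def make_windows(words, seq_len=4):
    """Slide a seq_len-word window over the text; each window is an input
    sequence, and the word right after it is the prediction target."""
    return [(words[i:i + seq_len], words[i + seq_len])
            for i in range(len(words) - seq_len)]

lyric = "she's a brick and i'm drowning slowly".split()
for window, target in make_windows(lyric):
    print(window, "->", target)
# The first window is ["she's", "a", "brick", "and"] with target "i'm".
```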
Have you guessed the song yet?
Trust me, these looked cooler as slides and appeared animated. I could have continued with Ben Folds, but I wanted two artists with very different styles. My colleagues were very helpful with brainstorming ideas.
I experimented with a few NNs, but did not pursue NNs with multiple layers (deep learning) as I did not have enough data to support such a structure. I'll expand on this shortly.
Mildly technical bits ahead:
The input starts off as a sparse vector. For example, if the word representation of the input is "something like a ziploc , but a lip lock \n", it gets tokenized as [278, 30, 22, 1927, 3, 42, 22, 1267, 1910, 1], with each word mapped to an index. However, keras wants a dense vector, which is what the Embedding layer produces. I chose a Bidirectional Long Short-Term Memory (LSTM) layer after learning that RNNs (the LSTM is a special type of RNN) do well on NLP tasks, and that a Bidirectional RNN contains connections in both directions, so the output depends on both past ("memory") and future inputs. That suggested it might be a good tool for factoring in the context around a word. The next layer is a Dropout layer, which randomly removes a percentage of the connections to the Dense layer during training; dropping connections helps prevent overfitting. And finally, we get output text!
something like a ziploc , but a lip lock
want you wrapped around my know that
it ' s all i want is you
( you ' re my one i need somebody
i need you
i need somebody , i - i need somebody
i need somebody i - i need somebody
everyday i bring the sun around
i sweep away the clouds
smile for me ( smile for me )
i was born to be somebody
The output from training and running the Bieber model got its own Slack channel. #humblebrag
Once I had trained models, I was able to supply my own seed sentence (1), the caveat being that each word must belong to both corpora, plug in a model (2), and receive output (3)! Running a trained model locally takes a significant chunk of time, so if you plan on trying this out, put those cloud services to work!
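The generation loop itself can be sketched like this. It's a simplified greedy version that assumes a trained keras model and a fitted `Tokenizer`; my actual code may differ:

```python
import numpy as np

def generate(model, tokenizer, seed, n_words=20, seq_len=4):
    """Repeatedly predict the next word and append it to the running text."""
    words = seed.split()
    for _ in range(n_words):
        # Encode the last seq_len words as the model's input window.
        window = " ".join(words[-seq_len:])
        encoded = np.array(tokenizer.texts_to_sequences([window]))
        # Pick the highest-probability next word (greedy decoding).
        probs = model.predict(encoded, verbose=0)[0]
        words.append(tokenizer.index_word.get(int(np.argmax(probs)), ""))
    return " ".join(words)
```

Sampling from `probs` instead of always taking the argmax would give more varied (and often more interesting) lyrics.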
Generalizing to practical applications, imagine that you need to come up with a speech, conference abstract, or manuscript. Wouldn't it be great if you could type in some keywords or phrases and edit a draft rather than come up with something from scratch? With this output, you can either edit out typos:
Or pick out an interesting topic sentence and run with it:
Taking the second approach, I thought the artist style shone through!
Continuing with the project, it would be nice to train on a larger corpus, model at the genre level, do additional preprocessing to better handle punctuation, and get more sophisticated about generating stanzas and choruses. To check out the work I have done so far, please visit the GitHub repo.