Justin Bieber Helped Me Learn About Neural Networks

sfungphd
Oct 11, 2017

And so did Neil Young.

OK, not the singers themselves, but their song lyrics did. For my final Metis project, I wanted to learn about neural networks (NNs). As a neuroscientist, I found it too good an opportunity to pass up. What I wanted to do was generate new lyrics based on an artist's style. At the highest level: train an NN on a corpus of Artist X's songs, plug in a seed phrase, and receive new text. Here's what I used to accomplish that.

Kaggle is a great source of clean datasets, and I found one that contained over 500,000 songs. Using pandas, I did some data exploration and considered which artist to work with. I wanted to make sure the artist I picked had anywhere from a few dozen to a few hundred songs.
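For a sense of what that exploration can look like, here is a minimal sketch that counts songs per artist and keeps the artists above a threshold. It is not the project's actual code: the tiny DataFrame stands in for the Kaggle file, and the column names `artist`, `song`, and `text`, the helper `artists_with_enough_songs`, and the `min_songs` threshold are all assumptions made for illustration.

```python
import pandas as pd

# Tiny stand-in for the Kaggle lyrics file. The column names "artist",
# "song", and "text" are assumptions, not the dataset's confirmed schema.
songs = pd.DataFrame(
    {
        "artist": ["Artist A", "Artist A", "Artist A", "Artist B", "Artist C", "Artist C"],
        "song": ["s1", "s2", "s3", "s4", "s5", "s6"],
        "text": ["la la", "na na", "oh oh", "hey hey", "do do", "yeah yeah"],
    }
)


def artists_with_enough_songs(df, min_songs):
    """Return song counts per artist, keeping artists with at least min_songs."""
    counts = df.groupby("artist")["song"].count()
    return counts[counts >= min_songs].sort_values(ascending=False)


# Hypothetical threshold; with real data this would be a few dozen or more.
candidates = artists_with_enough_songs(songs, min_songs=2)
print(candidates)
```

With the real dataset, the DataFrame would come from `pd.read_csv` on the downloaded file, and the threshold would be set to however many songs feel like enough to train on.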

I started exploring character-level modeling with Keras, but didn't spend much time on it. Generating grammatically passable output at the character level means the model first has to put together combinations of characters (letters and punctuation!) that form words, and then have those words go together in a way that makes grammatical sense. I mainly mention this so I can share this output with you. The seed sentence is "she's a brick and i'm drowning slowly" from Ben Folds:

she's a brick and i'm drowning slowly of coll roghbor!"

"shet's got you, got a 't-lard, threw room

in the carp us

lobber door wrong, for night not you

you wrotens to sett this in a moce

hell ginl we'r are sat

[chormone we gay marry

lited the chen

alway

if thinke

just me someone court benes

i'm not talk

with reastitht

where they garr, nerild

melow it seems as you've door chen you cree and them

'tweem, i call

you don't know where yo

I could hear the piano melody as I read the first line/seed sentence and then...cue the "record scratch" sound...it turned to a spoken word session.

I quickly moved on to word-level modeling, which eliminates the hurdle of word construction. If you are unfamiliar with what word-level modeling is, I hope this helps:

The model predicts the next word in a sequence given an input sequence of, in this example, 4 words. To simplify matters, consider punctuation to be part of the word it is attached to.

Next, the model slides over one word and starts with new input containing 4 words. (Side note: the window isn't limited to sliding over one word at a time; the step can be any number of words. You may read about "sliding windows" of text/pixels, and this is merely one example of that concept.)

Continue to slide the window to generate new input and predict new output. Words that precede the input are designated "past".
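The sliding window described above can be sketched in a few lines of plain Python. This is an illustration only: the function name `sliding_windows`, the `step` parameter, and the example sentence are assumptions, and splitting on whitespace is used so that punctuation stays attached to its word.

```python
def sliding_windows(words, window_size=4, step=1):
    """Build (input_words, next_word) pairs by sliding a window over the tokens."""
    pairs = []
    # Stop early enough that every window still has a word after it to predict.
    for start in range(0, len(words) - window_size, step):
        context = words[start:start + window_size]
        target = words[start + window_size]
        pairs.append((context, target))
    return pairs


# Whitespace splitting keeps punctuation attached to the word it follows.
words = "the quick brown fox jumps over the lazy dog".split()
pairs = sliding_windows(words, window_size=4, step=1)

for context, target in pairs:
    print(context, "->", target)
```

Each pair is one training example: four input words and the word the model should predict next. Passing a larger `step` slides the window further each time and produces fewer pairs.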

You guessed the song yet?

Trust me, these looked cooler as slides and appeared animated. I could have continued with Ben Folds, but I wanted two artists with very different styles. My colleagues were very helpful with brainstorming ideas.

I experimented with a few NNs, but did not pursue NNs with multiple layers (deep learning) as I did not have enough data to support such a structure. I'll expand on this shortly.

Mildly technical bits ahead:

The input starts off as a sequence of word indices. For example, if the word representation of the input is "something like a ziploc , but a lip lock \n", it gets tokenized as [278, 30, 22, 1927, 3, 42, 22, 1267, 1910, 1], with each word associated with an index. On its own, an index is a sparse representation of a word, and the network needs a dense vector per word, which is what the Embedding layer provides.

I chose a Bidirectional Long Short-Term Memory (LSTM) layer for two reasons. First, RNNs (an LSTM is a special type of RNN) do well with NLP tasks. Second, a Bidirectional RNN contains connections in both directions, meaning the output depends on both past ("memory") and future inputs, which suggested it might be a good tool for factoring in the context around a word.

The next layer is a Dropout layer, which randomly drops a percentage of the connections to the Dense layer during training. The purpose of dropping connections is to prevent overfitting. And finally, we get output text!

Like so:

something like a ziploc , but a lip lock

want you wrapped around my know that

it ' s all i want is you

( you ' re my one i need somebody

i need you

i need somebody , i - i need somebody

i need somebody i - i need somebody

everyday i bring the sun around

i sweep away the clouds

smile for me ( smile for me )

i was born to be somebody
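To make the tokenization and Embedding steps from the technical bits above more concrete, here is a plain-Python sketch of the idea. It is not Keras code and not the project's tokenizer: the helpers `build_vocab`, `encode`, and `make_embedding_table` are hypothetical, the vocabulary is built from a single line so the indices will not match the ones quoted earlier, and the random vectors stand in for values a real Embedding layer would learn during training.

```python
import random


def build_vocab(tokens):
    """Give each distinct token an integer index, starting at 1 (0 is left for padding)."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab) + 1
    return vocab


def encode(tokens, vocab):
    """Replace each token with its integer index."""
    return [vocab[token] for token in tokens]


def make_embedding_table(vocab_size, dim, seed=0):
    """Create one small random vector per index; a trained layer would learn these."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size + 1)]


line = "something like a ziploc , but a lip lock \n"
tokens = line.split(" ")  # the newline survives as its own token
vocab = build_vocab(tokens)
ids = encode(tokens, vocab)

# An embedding lookup is just "use the index to pick a row of the table".
table = make_embedding_table(vocab_size=len(vocab), dim=8)
vectors = [table[i] for i in ids]

print(ids)
print(len(vectors), "vectors of length", len(vectors[0]))
```

Note that both occurrences of "a" get the same index and therefore the same dense vector, which is the point: the index identifies the word, and the table supplies its representation.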

The output from training and running the Bieber model resulted in its own Slack channel. #humblebrag.

Once I had trained models, I was able to supply my own seed sentence (1), plug in a model (2), and receive output (3)! The caveat is that each word in the seed sentence must belong to both corpora. Running a trained model locally takes a significant chunk of time, so if you plan on trying this out, put those cloud services to work!
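The seed-then-generate loop can be sketched as follows. Everything here is illustrative: `check_seed`, `generate`, the toy vocabularies, and especially `toy_predict_next` are hypothetical stand-ins, with the last one taking the place of a trained model's prediction step. The window size of 4 is borrowed from the earlier sliding-window example and is an assumption about generation.

```python
def check_seed(seed_words, *vocabularies):
    """Return the seed words that are missing from at least one vocabulary."""
    return [w for w in seed_words if any(w not in vocab for vocab in vocabularies)]


def toy_predict_next(context):
    """Stand-in for a trained model: pick the next word from a fixed cycle."""
    cycle = ["i", "need", "a", "song"]
    last = context[-1]
    position = cycle.index(last) if last in cycle else -1
    return cycle[(position + 1) % len(cycle)]


def generate(seed_words, predict_next, n_words, window_size=4):
    """Repeatedly predict the next word from the most recent window and append it."""
    output = list(seed_words)
    for _ in range(n_words):
        context = output[-window_size:]
        output.append(predict_next(context))
    return output


# Toy vocabularies standing in for the two artists' corpora.
vocab_a = {"i", "need", "a", "song", "baby"}
vocab_b = {"i", "need", "a", "song", "harvest"}

seed = ["i", "need", "a", "song"]
missing = check_seed(seed, vocab_a, vocab_b)
if missing:
    raise ValueError(f"Seed words not in both corpora: {missing}")

generated = generate(seed, toy_predict_next, n_words=3)
print(" ".join(generated))
```

Swapping `toy_predict_next` for a function that wraps a trained model is what turns this into real text generation; checking the seed against both vocabularies up front is what lets the same seed sentence be fed to either artist's model.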

Generalizing to practical applications, imagine that you need to come up with a speech, conference abstract, or manuscript. Wouldn't it be great if you could type in some keywords or phrases and edit a draft rather than come up with something from scratch? With this output, you can either edit out typos:

Or pick out an interesting topic sentence and run with it:

Taking the second approach, I thought the artist style shone through!

Continuing with the project, it would be nice to model using a larger corpus, perhaps at the genre level, to do additional preprocessing of the texts to better handle punctuation, and to get more sophisticated with generating stanzas and choruses. To check out the work I have done so far, please visit the GitHub repo.

Susan Fung, PhD
