Char-rnn retro-post

As with my previous post, this one is intended to serve as more permanent storage for my earlier posts on the topic at hand: in this case, Andrej Karpathy’s infamous char-RNN.

First introduced to the world in this incredible post that everyone should read, char-RNN is a neural predictive text model that tries to guess the next letter in a sequence given only the letters that came before it. The amazing part is that it works: not only does it learn words, it picks up grammar, syntax, and punctuation along the way.
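
To make that concrete, here’s a minimal Python sketch of the training setup (my own toy illustration, not Karpathy’s actual Lua/Torch code): the network is shown a window of characters and trained to predict the single character that comes next.

```python
# Toy illustration of char-level next-character prediction (not the real char-rnn code).
text = "the quick brown fox"
seq_len = 5

pairs = []
for i in range(len(text) - seq_len):
    context = text[i : i + seq_len]   # the characters the network gets to see
    target = text[i + seq_len]        # the single character it has to guess
    pairs.append((context, target))

print(pairs[0])  # ('the q', 'u')
print(pairs[1])  # ('he qu', 'i')
```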

But you didn’t come here to learn. Or I hope you didn’t. ‘Cause you won’t. That’s the machine’s job.

Just finished training a neural network to try to generate medical terms. Some highlights:
crassysssitectomy
carssimeter
scarnal
dicclomaniferion
dicclometer
diazoline
sucticuloureticular
suspification
tatwacetemes
tatetion
cutaphyll
And many more!

This was my first network trained on my laptop. I don’t remember any of the parameters of the network, but the files are probably still around somewhere. I may look into that and edit those details in.

Training the same neural network on movie titles. It’s predicted to take ~60 hours to run, so I think I may sanitize the inputs, remove duplicates, and try again later, but it’s had some amusing results so far:
American Shorund
Fantasies in Collegiate Pravise
Chill Criler
Star Trek Punctiom: The World’s Moves

And from the latest checkpoint:
Art World Wrestling
Violence Day (special edition)
Deadly Smith: Live at Thirst
Deadly Smith: Dark Brown Home
Iron Mailey: An Mentally Good Day: Legend of the Wolfon
Undergrandpa
Disney Chinese Stars
Swordsmann
History Channel Presents: Murdering Secondary National Pergrimes Systemn

I was really thrilled by the spontaneous appearance of the Deadly Smith series. I think I’m gonna start priming future checkpoints with “Deadly Smith” to expand the franchise.

Best so far: “Wolveringlypektin: Demon Hitler 2”

Oh man, these still make me laugh. I gotta try this data set again. I actually later tried priming the network to get more Deadly Smith movies and got:

Deadly Smith: Male History
Deadly Smith: Sinatra of Darkness
Deadly Smith Secrets: The End of Garfield

I would watch either of those last two, but probably not Male History.

Started training Char-RNN on a list of science textbooks. Some very, very early results [with commentary]:
Nucleir Exeyctherics [Almost spelled “Nuclear” right!]
Operation of Nanolecullines and Fromage [Yes, how do you operate cheese?]
Handbook of Terperation [What is Terperation? Why do I so urgently want to know?]
Scientific Nerves in Medicine
Viroverognapter (Feurth edition) [Can’t wait for the Feifth edition!]
Based Biology [That sounds like a bad idea]
Primate Human Behavioral Systems [Primate humans? This research sounds unethical]
The Generation and Managing Management [Who manages the management?]
The Him Works [this one sounds religious]

So, really good results so far! Gonna let this (and the movie-title one) run overnight and see where it gets in that time.

This is about where I started adding commentary to the output. Also where I discovered that having two GPUs meant I could train two networks at once. Kinda miss that a little.

Training Char-rnn on a list of restaurant names in New York City. Still mostly gibberish, but I wanted to share something hilarious it spat out just now:
THE PAIN ROOM II, 9549 Six Avenue, New York NY

Some more gems include: “Starsucks,” “Frothing Room,” and “New & Weird’s.”

Had to stop the experiment due to overfitting. Don’t think there was quite enough data, and it started just spitting out “New York, NY” over and over.

This is where I learned that I should sanitize my data: because the restaurant database was limited to New York City and included each restaurant’s address, of course the network could do well by spitting out “New York, NY” over and over.

But seriously, I want to eat at New & Weird’s.
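
For posterity, the kind of cleanup I should have done looks something like this. It’s just a rough sketch; the filename and the “name, address, city state” one-entry-per-line format are guesses at what the raw data looked like.

```python
# Rough cleanup sketch -- filename and line format are assumptions, not the real dataset.
names = set()
with open("nyc_restaurants.txt") as f:
    for line in f:
        name = line.split(",")[0].strip()  # keep only the name, drop address and city
        if name:
            names.add(name)                # de-duplicate while we're at it

with open("nyc_restaurants_clean.txt", "w") as f:
    f.write("\n".join(sorted(names)))
```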

Continuing the restaurant theme, training the network on menu items. Almost completely gibberish so far (the French is confusing it), but here are some funny outputs:
Fruits: Jambo
Spaget Mixed
Whole Celery with Toast
Botato
Scotched Soup
Salted Butter, Burgundy
Vegetaelettes
Veal Chicken
Eggs, hen
Coffee Pork
Oysterned beans
Almond Fort
Eggprtz
Broiled Buckler
Fried Crest
Supreme of Fresh Filet of Salmon Sauce Sundae
And more!

I think this is my favorite iteration so far.

I really hit on some gold here. You’ll see later that I’m still harping on this theme, with a bit more firepower though.

More results from Robo-Restaurateur [with commentary]:
Vanilla Canape [Makes sense]
Imported Sword (half-dozen) [Is this swordfish? Or just swords?]
Boiled leg of Maitre d’hotel [Long pig! An example of the network’s confusion with French]
Fried black bass, stewed in cream and shrimp [Some cool seafood]
Chicken broth with Jerry [Who is Jerry?]
Fresh fruit, Mashed [Can I get my fruit un-mashed?]
Bigger Gooby [Bigger… than what?]
Omlette a la tomate [Hey, some correct French!]
Mallard Chaud [Hot duck?]
Creme bamboo amberian sausages [That sounds pretty interesting, actually!]
Cold long island duck [Some New York cuisine!]
Old-fashioned cheese coffee […what?]
Turtle Extra Special [It’d better be special if I’m gonna eat turtle!]
Fried Tomato Juice [How exactly do you fry that?]

Hmm… Maybe I should go back to this dataset. This stuff is kinda great.

First results from the neural songwriter (suggested by Thomas Begley) contain some awesome stuff:
I’m Reading Up
La Popalco Por Ti Pena
Feelings
Here is a Leadred Around my Porche
Some Hills
Revenge
Last Fast Times
Night of Love
Loco Yellow Girl
Silency Voices
Farewell You
The See Your Lover
Burning Interview Is
Down on the Light
Indequenze
Love To Through The Sun
Music (Explicit Remix)
Cool! Don’t Love Me (Feat. Sas Woo)
Get Me Then
It’s Not Right
When I Love It (Remaster)
Soul Change
It Is The Garabelps
Taperings (Explicit)

All in all, I think this is my favorite so far. It helps that I have almost 20 MB of data.

If I remember right, this was one of the ones that was really, really hilarious during the early stages. I’ll have to see if I still have those checkpoints around. This was also where I learned that lots of data => better results.

Let’s compare some RNN movie titles and novel titles:

Movies:
Murder Hammy Scooby
Knife Pointer
Murderbacking
Counterfeit Kill Me Rocks
Night of Shade
Hell’s Vampire (1960)
Black Kids 3
Dawn Punch
The Dark Box
Nobody Lived
Nothing Planet
20002: The Man Who Bounty: the Movie
Star Musketeers
Desire Hunter
Dead Daughter

And so on. The theme here is pretty clear. I appear to have created Angst Bot.

Novels:
Human Essentials
Purple Takes a Source
Revenge of Sex
The Last Experiences
The Book of Dawn
The Coming Jewellery
Worsted: A novel
All the people of the American Sky
Unlock Cat
The Mercy of the Gods
Against the Slave: A Tribute to Harm
Eye for Detail: Messy stories of Science
Poultry Lady
The Healing Paws
Texas Encyclopedia of Every Pollities
The Rise of Sarah Doom

And many others. I’m thinking I might start uploading massive dumps from the neural networks for people to peruse, and possibly starting a blog just for my neural network experimentation so I don’t have to keep pestering people on Facebook about it.

Hey, would you look at that! I finally made that blog!

Also, these networks are still some of the funniest things I’ve ever created. So pleased with these results.

At this point I started trying to create the SCP-RNN, somewhat unsuccessfully. You can see some of my results here. This is when I discovered that the size of a database is not the same as the number of data points, because while there is plenty of text in the SCP object files, there are fewer than 3000 discrete entries.

This is also about the point where I discovered the grid-lstm model, which uses some black magic to make training deeper networks possible. I was hoping for a highway network char-rnn or a residual block char-rnn, but those don’t seem to be around. Maybe I’ll try to whip that up myself once I figure out how the heck they work.

I also created a Google Drive folder where I sometimes put dumps from my networks for people who want large, unedited blocks of output to peruse and find favorites.

Just for kicks, I’m training a neural network on a dictionary, and I have it hallucinating made-up words. Some favorites:

Comprehense (v.) To proclaim; to burden; to interpret from fire
Deforation (n.) The meaning of a member attached to the exploration or extent of the aggregate.
Iridylis (n.) a liger in embryo
Holerwheel (n.) A stone at one end of a very long groove, used as grain, water, blood, etc.
Depravate (v.) To use under, or dissect, as a shelter; to become shipping to.
Piladin (n.) One who, or that which, compasses
Concurrential (a.) Pertaining to a contract or debate
Doltoid (a.) Like a string

It also likes making up new definitions for real words, such as:
Employment (n.) That which is exempt or exempted; Relating to the hand of evil
or:
Lactate (v. t.) To cause to stop by cooling

Ahh, the dictionary bot, hallucinating definitions for real and made up words.

New favorite word from my dictionary-rnn: “Liveless (a) without forbidden meaning.”

Also: Personality (n) the quality or state of being powerful.

There’s a dump from this network in the dumps folder if you want to see some raw output.

So, because I was bored last night, I started training a neural network on the first data I could find, which ended up being restaurant health inspection reports. Without further ado, inspection bot says:
Do not store plumber.
Keep open bags of foods in cold running water
Hold Demon in server and ham to excess of odor
Provide roaches on all floors under cooking equipment to walk on
Remove soda guns from warewashing area handsinks
Provide for storage of all employees.
Some staff observed stored in the hand sink in the front service area.

Hehehe. This one was great. The data was pretty noisy, though, so I didn’t get any other good results out of it.

I’m training a network (6×260 2D grid-lstm) on academic journals, which don’t have quite the level of jargon I’m looking for, but it has still produced some amusing results:
Studio Psychoantility
The Urologa
Water Letter
Carpascate
Civil Rights Industry
American Studies
Kathorcendological review
Biochemical Psychiatry
Microwaves, Literature
Commies! Plant Journal
Architek
Book of the Landscape
Translogollation
Biological Economics

When I created this one, I was trying to find a dataset of research paper titles, and soon enough I did find something similar:

So I’ve found what I was looking for (kind of): a dataset of ~400 MB of research grant awards. The dataset is huge, so it takes forever to do one epoch (one pass through all the data), but I have two checkpoints so far.
The first one just produces thousands of blank lines (oops, didn’t sanitize the data for \n\n\n\n\n\n), but the second is producing some (somewhat nonsensical) results. It’s mostly gibberish (for example, a study titled “Induced Induced In Provide Subjectage Orighom Technology State Eke College Slovity”), but it did produce something sponsored by the “University of U,” which I found amusing.
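
The fix, for the record, is trivial. Here’s a quick sketch of the blank-line cleanup, assuming the dump is one big text file (the filename here is made up):

```python
import re

# Collapse runs of consecutive newlines into a single newline.
# (Filename is illustrative, not the actual dataset file.)
with open("grant_awards.txt") as f:
    text = f.read()

text = re.sub(r"\n{2,}", "\n", text)

with open("grant_awards_clean.txt", "w") as f:
    f.write(text)
```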

Also amusing are the names it produces:
Zamk S. Schic
Woshopkha G. Hoomal
Michen H. Lahe (this guy’s email is apparently “mlahe@st.bust.edu” and sponsored by “University of the Way”. Who is St. Bust?)

I’m hoping to see some better results soon, and probably will, given the huge amount of data. It would be really annoying to have to start over if I picked bad hyperparameters, though, given how long it’s taking. Really looking forward to getting myself a 1080 so I don’t have to work off a pair of laptop 755’s.

There’s a dump from this network if you’re interested. At this point I switched to training with just the titles:

I’ve given the research grant network about 3 days more training now, so here are some favorites:
A Structural Analysis of Cell Angles
The Consequences of Smooth Surfaces in Culture
Florida Authority in Multi-Element Methods
An Experimental Approach to Study of the Velocity Program
Integrated Development of Self-Engineered Geochronology for Exploration of Nanogrants
Visual Quantum Battery of Ridge Properties
Evolution and Recent Advances in Recharged Division technology
laser lattice systems for real time spectroscopic systems
studies of the effects of molecular physics on the structure of an arctic substrate control
the role of a carbon isotope interaction of the earth’s core superconductor
a national investigation of the evolution of molecules
virtual reality analysis with the international
development of a satellite laser system for sixth new internet

and finally, this doozy:
some technologies for the future

There is also a dump for this network.

At this point, I got myself a GTX 1080 and things got interesting. My first dataset? Recipes.

Some excerpts from the recipe-rnn:
EDGY APPLE SALAD
1/2 lb Fresh bacon
1/3 c White wine
2 ts Lard or garlic — chopped
1/4 c Cold water
3 ts Crushed chives
1 c Tin

that’s right– the edgy apple salad contains a cup of tin

a recipe called simply “CHESTNUTS” which contains such ingredients as “dank tarragon,” “chorizo Chocolate Milk,” and “1 lb star knife — uncooked if possible”

this gem:
In a large soup bowl combine the orange juice, bacon, rosemary, cornstarch, chopped chives and Tarragon. Boil 1 or 10 minutes or until tender. Remove from heat and sprinkle with turkey.

And finally, a recipe called “CAKE FIRMAMENTS” that repeatedly instructed to “boil until cooled,” to “peel the water,” to “dissolve flour,” and concluded with “top with whiskey.”

Really want to see how good I can get this one. It’s not *quite* coherent yet.

The rest is history.

Here are the first three dumps from my CHEF-RNN (read: three chapters of my cookbook).

Chapter 1: Low Temperature (0.50)

Definitely the most coherent, but unfortunately this also leads to some predictability. I hope you like salt and pepper, because every recipe contains them, sometimes multiple times. Typically, this is the best place to look for actual instructions, but the ingredient lists tend to be a bit uninteresting.

Chapter 2: Medium Temperature (0.75)

The best balance of the three, I think. It doesn’t have the predictability of the low-temperature sample, but isn’t as chaotic as the high-temperature one. Unfortunately, this makes it a bit less silly than either of the others, but when it shines, it really shines.

Chapter 3: High Temperature (1.00)

Only to be attempted by expert chefs who want to use ingredients such as “GRANDMAS,” “Calories (crushed),” and “Powdered Ice Cream.” These tend to be the most outlandish, which can be hilarious, but they also dissolve into gibberish sometimes.
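
For anyone wondering what the temperature number actually does: at sampling time, the network’s scores for each possible next character get divided by the temperature before being turned into probabilities. Here’s a generic sketch of that knob (not the actual char-rnn sampling code):

```python
import numpy as np

def sample_char(logits, temperature=1.0):
    """Pick the next character index from the network's raw scores.

    Low temperature sharpens the distribution (safer, more repetitive output);
    high temperature flattens it (wilder output, more gibberish).
    """
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # softmax, shifted for numerical stability
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

# e.g. temperature=0.50 for Chapter 1, 0.75 for Chapter 2, 1.00 for Chapter 3
```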

One thing I noticed across the three dumps is that there’s a pretty clear distinction in style between lowercase text and UPPERCASE TEXT. This is because a lowercase character (‘a’, for example) is just as distinct from a lowercase ‘w’ as it is from an uppercase ‘A’, thanks to the one-hot character encoding. (For those who don’t know, this means that ‘a’, for example, would be represented as the vector {1,0,0,0,0,…}, whereas ‘A’ or ‘w’ would be represented as something more like {…,0,0,0,0,1,0,0,0,…}, so that ‘a’ is just as different from ‘A’ as it is from ‘w’.) I think I’m going to try doing a few passes of training on the same dataset with all characters lowercased and see if I can get the loss a little bit lower.
Plus, it might help with the occasional ANGRY-LOOKING RECIPE.
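
Here’s a tiny illustration of what I mean by one-hot encoding (just a toy example with a made-up vocabulary, not my actual preprocessing code):

```python
import numpy as np

# Toy one-hot character encoding; the vocabulary here is illustrative.
vocab = sorted(set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ "))
char_to_idx = {c: i for i, c in enumerate(vocab)}

def one_hot(ch):
    vec = np.zeros(len(vocab))
    vec[char_to_idx[ch]] = 1.0
    return vec

# Every pair of distinct characters differs in exactly two positions,
# so 'a' is no closer to 'A' than it is to 'w':
print(np.sum(one_hot("a") != one_hot("A")))  # 2
print(np.sum(one_hot("a") != one_hot("w")))  # 2
```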

At this point, I moved back to the grid-lstm model, on my new computer, with mixed initial results.

Additional recipes for the so inclined. These were created using a grid-lstm network, which, like highway or residual networks, allows for the training of much deeper networks. This one in particular was 800 neurons wide by 8 layers deep, my largest ever. Yet it didn’t really converge any better than the small network I trained previously… I’ll have to tweak some hyperparameters. I think some normalization would help, maybe a little less dropout (I had it pretty high to prevent overfitting). Of course, there’s still some gold in here.
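
For scale, here’s roughly what a network of that shape looks like if you sketch it as a plain stacked char-LSTM in PyTorch. This isn’t a grid-LSTM, and it isn’t the code I actually trained with; it just shows the knobs I keep talking about (width, depth, dropout):

```python
import torch
import torch.nn as nn

# Plain stacked char-LSTM sketch (NOT a grid-LSTM, and not my actual training code).
# Width and depth mirror the 800x8 network; everything else is illustrative.
class CharLSTM(nn.Module):
    def __init__(self, vocab_size, hidden=800, layers=8, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=layers,
                            dropout=dropout, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.head(out), state  # logits over the next character at each step
```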

If you want to see what came next for our favorite robot chef (other than Chef Watson), check out the very next post.
