Jokebot redux

Jokebot has finished training, and it still has some serious issues. If anything, the issues have gotten even more serious.

DISCLAIMER: The “jokes” the network produces do not represent my views or sense of humor. The data was scraped from reddit and it shows. Also the network has become VERY RUDE, so proceed with caution.

Very low temperature:

What do you call a cow with no legs? A: A pilot, you racist.

What do you call a black man who speaks a piece of course? A small medium at large.

What do you call a black guy who likes to be a company lived in a car? A paragraph and a condom.

What do you call a black man that has a porcupine? A pilot, you racist piece of cake.

What do you call a cow that doesn’t work? A condescending con descending.

What do you call a black guy who can’t see? A small medium at large.

It appears that at this temperature the network has over-fitted a bit. Specifically, most of the answers are “A small medium at large,” or “A pilot, you racist piece of [cheese, cake, shit, etc].” I did laugh at “you racist piece of cake” though.
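For anyone unfamiliar with the temperature knob being turned here: temperature divides the network’s output logits before sampling, so low values make the most probable completions dominate (hence all the repetition above) and high values flatten the distribution toward nonsense. A minimal sketch of the idea (the function name and shape are mine, not char-rnn’s actual code):

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng):
    """Divide the logits by the temperature before softmax-sampling.
    T << 1: the most probable token almost always wins (repetitive output).
    T >> 1: the distribution flattens out (dadaist output)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()                      # for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```

At temperature 0.1 the sampler above picks the argmax almost every time; at 2.0 it will happily pick tokens the network barely believes in.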

Low temperature:

What do you call a short pilot? A pedophile.

Why do black people have such big noses? Because they can’t even.

What do you call a transformer who can’t fly? A baked potato.

What do you call a black man who doesn’t scream at least year? A little shit.

What do you call a bunch of dead babies? A family seat.

What do you call a dead baby flying a plane? A baked potato.

What do you call a deaf prostitute? A broken nose.

What do you call a scary girl with a shit with her brothers? A cock in your mouth

Dear god it’s only gotten worse. What have I created?

Medium temperature:

Why did the dick go to the man in the mud? He was taking a shit!

Why couldn’t the chicken cross the road? It was two-tired.

What do you call a deaf man with no arms and no legs in the middle of the ocean? A pilot, you racist mother.

What’s the difference between a comedian and a gay guy? A hockey player showers after 3 periods.

What do you call a superhero who is going to be a tree? An asshole.

What do you call a fat person who only eats gay men? A semi-chicken

What did the pedophile say to the pirate? Nothing.


But on the plus side, it also created these:

What do you call a woman with an extra leg? A woman

What did the doctor say when he fell out of the closet? Damn

High temperature:

What did the cremate say to the stove? Whoat. Oh, it was out yet.

What do you call a stoner with a bad real paint in your jean? Half of course!

How do you make a blind man organ? With a snowblower.

What do Jewish people with breasts and dumb games have in common? Everyone wants to smell it, but it’s gonna be dead.

What do you call a cow with a pet dog? A space member

What’s the difference between Michael Jackson and a bag of cocaine? one spits and the other is a group of cunning.

What do you call a gun on a wheelchair? A tooth crip.

What do you call a cow with no eyes? The Nemon Roll.

What do you call a chicken coop with a donkey and a white guy? A crustacean!

What do you call two monkeys floating in the middle of the ocean? The Amazon.

What’s a stormtrooper’s favorite sport? Project and Tour Debate

This is where the network got the most laughs. Some of them are just so absurd. It also produced a few of my least favorite “jokes.”

Very high temperature:

What do you call a Mexican with one phone in his arse? No PROCEDO

Which have you call a Graveyard nurse? Shroting me Debatins

What has 3 beans? A Brown.

What’s the difference between 8 out of roux and figure?,You can tuna piano, but you can’t jelly until your mom on your ass.

What do you call a confused asian? Spaghetti

How do you cut an elephant into a snowblower? I’ll tell you tomorrow.

What did the buffalo say to the ground? Nothing. He just came back.

What is Bruce Lee’s favourite food???8? URDUMA

How many average people does it take to change a light bulb?None, it’s still dark dirty.

What did the dumb brothel say?I wooden hanger.

Why did captain say the toaster between her boyfriend?Cause the dick waves pings.

What do you call a cow machine? A cow with cheese.

As you can see, it got a bit dadaist, as is wont to occur.

Posted in Uncategorized


I fixed the problems! It seemed that I needed to update Cutorch, and in order to do that, I had to update CUDA, and in the process I inadvertently uninstalled my graphics drivers. What an adventure. To celebrate, I’ve been training char-rnn on a database of question/answer jokes scraped from reddit. Still early in training, but here are some highlights so far.

DISCLAIMER: The jokes the network produced are not representative of my opinions. The source data came from scraping a subreddit and it shows. The network is… a little offensive sometimes.

Low temperature:

What do you call a man who can’t even live? A star track.

What do you call a group of banana that can’t stand a lot of leaves? An angry banana.

Why do mathematicians have to go to the other to talk to the bar? Because they can go stop the political and store they go to the bar.

What do you call a prostitute with no legs? A pilot of the bar.

What do you call a group of children who goes to a chicken star? A sandwich.

What do you call a chicken that starts a basketball team? A star bang.

What do you call a prostitute that can’t even live? A star to the chimney.

What do you call a black man who has a bar and a redneck in the world?,A person that has a great salad.

For some reason it really likes the answers “A star” and “A pilot,” and it really likes putting prostitutes in the questions.

Raising the temperature a bit, we see:

What do you call a porn star that doesn’t wark? A pencil

What do you call an alligator with no legs? A space mate.

What do you call a prostitute who is going to be a computer? A pot battery

What do you call a group of baby that doesn’t have a bar? Lettuce

Did you hear about the person who got a few bar stars? He had a horse with the shopping story.

There were also a few that started to get a bit lewd, which I guess says something about the data source. Let’s keep cranking up the heat!

What’s the difference between a terrorist and a chickpea? Errrrrrrrrly marks out of a stranded college.

What do you get when you put an elephant in a car? The holly-convention

What do you get when black girls want to pee? 1st light.

Okay, what the hell, jokebot. That got bad fast. And it doesn’t get much better:

What do you call a midget in a prostitute? A cross character.

What do you call an Indian snake fighting his brother? A HAR GUUR NELLAR!

What does a porn star say to a Jewish bank?,Hello Game of a toilet Life.

Did you hear about the cock-worker who was in the statistic on the stool?,He had a man from the weather.


For science, let’s crank the temperature all the way up.

How do you react a hippie? An angry salad.

What is the sound of irony? Osian.

What’s the difference between a Day and a gas bill? Thought in the oven.

Why can’t the chicken take a deud at the main crag countant? At the swseek.

What did the doctor say to the mathematician? Fuck mississippy!

What’s the difference between an alcoholic and a baby? With a portuplage binguins, they’re both tattooed.

Did you know about Pokemon massacre Tunnels? His son makes a tight in the Olympics teaches.

Why doesn’t Usian greet a pothead? He’s always stopped up bunched!

Why won’t Michelle coop continue?,Because a punched people in pedophiles.

This is actually better… just because they make less sense. It clearly has a really twisted sense of humor though. Pokemon massacre Tunnels? WHAT?

This seems to be a pretty clear example of why data is important. I expected most of the jokes to be clean with a few bad ones, but it seems to be the other way around. I’ll keep training to see what happens, mostly because I’m curious.

Posted in char-rnn


While I’ve been working out some issues I’ve been having with torch, I did some training on a database of all Jeopardy questions. Unfortunately, training was cut short by said torch problems, so I’ll have to resume that tonight. Here’s a sampling of my favorites: (read in Alex Trebek’s voice)

SINGERS,$800,’In 1969 this film classical former president says, I read that branch of the park recorded the top of the memoir’,Born in the Martian Empire

MUSEUMS,$600,’His first major commission treats a color’,a political plant

ROCK PRIZE,$400,’In 1992 this season was a Philip Harrison in 1999 in the Mark of the Dance Age in 1996′,Alice

FAMOUS WOMEN,$400,’The best state capital is a distinguished by this film by Elizabeth II’,Shakespeare

VICE PRESIDENTS,$1200,’An example of this senator displays with 100 miles in Mark the Palace Committee on April 1991′,John Hancock

THE AMERICA,$400,’In 1797 this president enters the southernmost word for the same name’,the Standard Sea

PEOPLE FROM PENNSYLVANIA,$2000,’The famous cathedral of this word meaning to hold it to the model of the Roman Empire’, Parthenon

The answer actually matches the question! Kinda! Too bad the category is “people from Pennsylvania.” As you can see, most of the questions tend to be gibberish. I hope that’ll resolve itself with more training or a larger network, though.

Setting the network’s temperature to the lowest possible results in variations on:

A MOVIE SCIENCE,$1000,’The state state of this country is a state where the state is a state in the state’,the Roman Party

THE SILVER SCREEN,$1000,’The state of this country is a state for the state of the state’,Mark Twain

Mostly lots of “The state” and “Mark Twain.” Other common occurrences: “The first president,” “Mariah Carey,” “Marie Antoinette,” “Charles Martin,” and something alternately called “The Band of the Road” and “The Band of the World.”

It did also produce this oddity:

THE SILVER SCREEN,$1000,’This country was a popular president of the state of the Sea of Fame’,The Man With The Brothers

Oooh. The Man With The Brothers is a little spooky. And “the Sea of Fame” sounds cool.

Some of the categories generated at higher temperatures are hilarious, even when the questions start to fall apart. For example:

B__INERS [sic]

And my all-time favorite:

Honestly, that’s probably a real category (as are some of the others, I’m sure), but I don’t care. It genuinely made me laugh.

I’ll train the network for a bit longer tonight and see if results improve.

Posted in Uncategorized

Machine Learning vs. Human Learning

I’ve been a bit busy to run any experiments, unfortunately, but I’ve still been thinking about this quite a bit. Since I haven’t posted in about a month, I figured I’d share one of my motivations for getting into machine learning: it offers a lot of really interesting parallels to human learning. Below I’ve collected a few examples of techniques I’ve picked up that have some surprising connections, and have even given me a bit more insight into how people work.

Learning Rate Annealing

This is a common training technique that I’ve been using quite a bit with Wavenet. The idea of learning rate annealing is that over the course of training, whatever system you’re optimizing learns slower and slower. This suits the nature of many machine learning problems: the learning rate has to start high so the optimizer can move quickly and avoid getting stuck in local minima that might distract from a better solution, but if it stays that high, the optimizer will skate right over the global minimum. Lowering the learning rate as training progresses lets it home in and get more precise as it approaches the answer it’s looking for.

In more plain English, the high training rate at the beginning lets the model quickly get a general sense of how things work, and lowering it as training progresses lets it fill in the details.
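As a concrete sketch of one common schedule (a step-decay function of my own, not any particular framework’s API):

```python
def annealed_lr(base_lr, step, decay_rate=0.97, decay_every=1000):
    """Step-decay annealing: multiply the learning rate by decay_rate
    every decay_every training steps, so early training moves fast and
    late training fine-tunes."""
    return base_lr * decay_rate ** (step // decay_every)
```

Plugging in a base rate of 0.01, the model trains at 0.01 for the first thousand steps and is down to a small fraction of that by step 100,000.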

What’s fascinating about this is that it mirrors something that happens to humans as we age. As we get older, neuroplasticity drops, and it gets harder and harder to learn new skills, change tastes and opinions, or adapt to new environments. While this might seem like a bad thing, since it makes learning harder, it has the added benefit of proofing the system (be it a machine or a human) against outliers. For example: if the first time you see someone drop a rock, it falls, you might conclude that this happens all the time (and you would be correct). As you see more and more objects dropped, this belief solidifies, until one day, someone drops something, and it doesn’t fall. If your pattern recognition were as elastic as it was the day you saw that first rock dropped, you might conclude that dropped objects sometimes float, which is wrong. What’s more likely is that you’ll assume something fishy is going on, and this data point won’t skew your internal model of how dropped objects behave.

Learning rate annealing is what stops the first thing we see that contradicts our worldview from causing us to throw out all of our assumptions up to that point (for better or worse).

Unreliable Parts & Noisy Data

One recurring problem with machine learning models is the tendency to overfit. This essentially means that the model is learning to match patterns that exist in the training data that are not representative of reality. This will create a model that can faithfully reproduce/categorize/recognize the training data, but fails miserably in the wild. There are a lot of ways to avoid this, but one of the most common ones used for neural networks is called dropout.

The idea behind dropout is essentially to randomly disable a certain fraction of the neurons in the network at each training step. Since at any given time any neuron could be down, the network has to learn redundancy, forcing it to create a more robust representation of the data with overlapping roles for the neurons.
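A minimal sketch of the mechanism, assuming NumPy arrays of activations (this is the common “inverted” variant, not any specific library’s implementation):

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=None):
    """Zero each neuron with probability p during training; scale the
    survivors by 1/(1-p) so the expected activation is unchanged
    (inverted dropout). At test time, pass activations through untouched."""
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= p   # True = neuron stays up
    return activations * mask / (1.0 - p)
```

Each forward pass sees a different random sub-network, which is what forces the redundancy described above.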

Of course, the only way to completely avoid overfitting is with more data, but that isn’t always possible. In that situation, one technique that gets used a lot is to multiply the amount of data by taking each training sample and distorting it in some way. For images, this might mean rotating, shifting, or scaling them, adding random noise, or shifting colors; for text-based data, it might involve using a thesaurus to replace words with synonyms, or deliberately including misspellings.

This artificial noise gives the model a larger range of possible environments to interpret, which will make it better at generalizing (though it’s not quite as good as just having more data) and better at interpreting poorly sanitized data, which is also important for working in the wild. In addition, this random artificial noise prevents the model from overfitting to noise patterns present in the training data because the noise is always changing. Even though each individual sample is distorted, the noise averages out in the end and results in a more robust system.
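A sketch of this kind of augmentation for image-like arrays (the particular distortions and magnitudes here are arbitrary choices of mine, for illustration only):

```python
import numpy as np

def augment(image, rng):
    """Return a randomly distorted copy of an H x W training image."""
    out = image.copy()
    if rng.random() < 0.5:                        # random horizontal flip
        out = out[:, ::-1]
    shift = int(rng.integers(-2, 3))              # small horizontal shift
    out = np.roll(out, shift, axis=1)
    out = out + rng.normal(0.0, 0.05, out.shape)  # additive Gaussian noise
    return out
```

Because the distortions are re-rolled every epoch, the model never sees the exact same noise twice, which is what keeps it from memorizing noise patterns.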

These two techniques are so powerful that Google has actually created a piece of hardware they call the “Tensor Processing Unit,” a parallel processing chip, which has “reduced computational precision, which means it requires fewer transistors per operation,” and means that it’s “an order of magnitude better-optimized” than conventional hardware. They’ve embraced dropout and noisy data by simply removing the precision and reliability that are so important to many other types of computation, just packing together noisy, unreliable circuits, and it actually performs better.

This also mirrors life. Biology is noisy, unreliable, and messy. The parts don’t always work, and when they do, they’re not very precise, but for this application, it not only doesn’t matter, it actually helps. This is, in large part, why intelligent life was able to evolve at all. Neural networks are the ideal system for creating intelligence in a noisy, unreliable, ever-changing environment.

Internal Vector Encoding

This is one of the things I find the most fascinating about machine learning. One of the most basic types of neural network, called a Restricted Boltzmann Machine, has only two layers (or three, depending on how you interpret it). One layer acts as the input; it passes through a second, smaller layer, and the network then attempts to recreate the original input. In doing so, the model is trying to figure out the best way to compress the input while still being able to recover the original data. This results in a compressed vector representation that reflects the structure of the input in a more compact way.
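The encode-compress-decode shape can be sketched like this (a toy linear autoencoder forward pass with random, untrained weights, standing in for an actual trained RBM):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 8, 3           # the hidden layer is the bottleneck

W_enc = rng.normal(0, 0.1, (n_visible, n_hidden))
W_dec = rng.normal(0, 0.1, (n_hidden, n_visible))

def reconstruct(x):
    """Squeeze x through the bottleneck, then try to recover it."""
    code = np.tanh(x @ W_enc)        # compressed vector representation
    x_hat = code @ W_dec             # attempted reconstruction
    return code, x_hat

x = rng.normal(size=(1, n_visible))
code, x_hat = reconstruct(x)
loss = ((x - x_hat) ** 2).mean()     # training would minimize this
```

Training pushes the reconstruction loss down, and the 3-dimensional `code` is the compressed representation the text above describes.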

Expanding this simple model with more layers, this can create some really interesting structures. The output doesn’t even have to be in the same form as the input– for example, the input could be English, and the output French, or vice versa. In this case, the internal vector representation stores the meaning of the sentence, independently of the language. What’s really fascinating about this particular example is that it works better the more languages are used. Because the internal representation stays the same, having more languages allows the model to create a better compressed representation using ideas that may not exist in one language to help translate it to another. Translating from English to Chinese is easier if the network can also translate English->French and French->Chinese.

Once again, we see this in humans too. Someone who knows two languages is much more likely to learn a third than someone who knows only one is to learn a second. While it could be argued that this is because of cultural differences that shape a person’s upbringing and let them learn multiple languages while their learning rate is still relatively high, I think there are other factors at work. This is just a personal belief, but I would not be surprised if something similar were happening: knowing multiple languages allows someone to have a more efficient internal representation of ideas and of language in general, as they can integrate aspects of multiple languages into their thought processes. This is the strongest argument I’ve ever been presented with for learning multiple languages.

Posted in Uncategorized

Encouraging results with Wavenet!

After doing some digging into the code and resolving an error that caused the network to devolve into white noise, talking with some other folks about what seemed to work for them, and a whole bunch of hyper-parameter optimization, I’ve had some encouraging results!

For these, I’ve restricted the training to one speaker. Each successive test represents one round of hyper-parameter optimization, and for the last one, I switched the optimizer from Adam with normalization to SGD with momentum.

It is also very interesting to note that the most successful test, test 7, was the smallest of the networks used in these tests and trained for the shortest time: only 26,000 iterations, instead of the 50,000, 100,000, and 150,000 for tests 6a, b, & c. My next test will be to continue training this network with a reduced learning rate to see if I can get it even better, but I’m really happy with these results.

My eventual goal is to get this running with some of my music to see what it spits out.

Posted in Wavenet


So I got wavenet working.

Well, for a limited definition of “working.” I’ve been running some experiments to try to figure out how the model behaves, and unfortunately I don’t seem to have a very good grasp of the underlying theory of the network architecture, because my experiments have gotten progressively worse.

  • The first sample was using the default settings for the implementation I’m using.
  • The second sample was the same except for longer training.
  • The third and fourth were the same as the previous except with a larger network (more layers).
  • For the fifth I changed the activation function and the optimizer to something I thought might work better. (It didn’t.)
  • And the sixth I tried training with l2 normalization on.

To be honest, these results are somewhat disheartening, given that I haven’t really had much success. But I’m really interested in the idea, and I’ve seen some amazing results from other people experimenting with it, so I’m definitely going to stay at it.

Posted in Wavenet

Parameter Experimentation (initialization and merging styles)

While I wait to get my hands on Deepmind’s wavenet, I’ve been experimenting some with the parameters of neural-style.

Here, I have two images that use the same style and content image; in the first, the result is initialized randomly, and in the second, it is initialized from the content image:


The largest difference appears to be the lamp; the style image didn’t have any areas bright enough to fill that in by itself, but initialized from the content image, it can keep that around.

The next was an experiment with merging styles together. First are the results of the individual style images: one has the best texture, one gets the red of the barn, one gets the red sky.


And here’s the combination of all three:


So it looks as though, in the case of multiple style images, in addition to creating an odd fusion of the painting styles, the network will also pull appropriate colors from the different style images to fill in areas the others may not cover (see the blue sky and the red barn).

Posted in Uncategorized

Hallucinating words

After a few failed experiments (namely, training char-rnn on the grant award database only to discover that the file had been corrupted with non-Unicode characters, which caused the loss to explode, and training on the Linux kernel source, which overfitted really badly), I decided to move back to something a bit simpler: the dictionary. My training rig isn’t connected to the internet at the moment, so no dumps yet, but here are some favorites from current checkpoints.

From the best checkpoint:

 This checkpoint actually produced correct definitions for a lot of words, including:

Preponderation (n.) The act of prepositing

Extravagance (n.) The state of being extravagant 

Tattler (n.) One who, or that which, tattles. 

which implies that the network is either representing the language on some deeper level (ha ha, that’s likely), or just overfitting to the data. However, this only seems to occur with words whose definition contains the word itself or another similar word, like the examples above. I’m beginning to wonder if I could do something with a word-to-vec algorithm to make an even better neural dictionary, but for now, I’m just filtering out words that appear in their own definitions.

Temperature 0.1

Station (n.) The act of stating; a statement or condition of any object or position; a statement of the particulars of a person or thing; as, the station of a country or court

Temperature 0.5

Infusorianism (n.) The doctrine of the infusorian ministry

Manure (n.) To make a manufacturer; to seize or be confirmed to the mind 

Temperature 0.75

Confine (v. t.) To interchange or impress as an expression of assignment; to reduce or to indicate or represent; to consent; to disapprove of; as, to constitute the title of the rules of a province or a firearm. 

Endoderm (n.) A white crystalline substance, C0H10, of the acetylene series, found in alkaloids, and having been elongated in the proportion of ordinary odor, in which the phenomena of certain compounds are produced artificially, and is derived from its natural ore, and is now a mixture of granular copper;– also called hexanelin

Encrin (n.) A bishop composed of sensible colors. 

Stick (v. t.) To fix or defeat with a stick

Cloud (n.) A striving in a church; as of men.

Temperature 1.0

Imbreviate (v. t. & i.) To increase a disease, office, or claim

Nipperty (a.) Like a nipple; of or pertaining to the nipping.

Sympathetic (n.) Syphilis; execution

OK wow, that wins for biggest miss.

Encognat (n.) A printed person; an otolith

Smoke (v.) The spot or strap by which swings are driven

Hey that’s not a verb!

Ensifer (n.) A person who held or performs the privileges of the discriminal world itself

Gavash (v. t.) To cause to swing into game

Cloyer (n.) The harsh, uncertain part; any body or degree of obstruction; a handle of screws and judges

Tattlery (n.) A vessel for catching a plate or animal like a strumpet

Levator (n.) One who annoys the occupation of men

It is notable that the validation loss for this network never got very low at all; this is due to the nature of the problem. In most cases, a good deal of the loss can be avoided by capturing the structure of a database. The recipes, for example, all follow the same basic format: a name, categories, ingredients, and then instructions.

This also gives the network a lot of information to work on when it gets to the instructions. A recipe called “fried chicken” will probably contain chicken. Something with flour will probably be baked at some point. And so on. In this way, to make each discrete recipe, it only had to remember the details for as long as the recipe.

In this case, however, that’s just not enough. The network can’t really guess the meaning of a word from its letters except by looking at its structure (prefixes, suffixes, roots, etc.), but it’s only beneficial to remember that information for a very short time. It would be much more effective to hardcode those relationships with some kind of word-to-vec (or char-to-vec?) system that would actually represent the meaning of the word in some abstract way, especially because it could then make sense of all those “wordly: the state of being a word; see wordy” definitions.
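The appeal of a word-to-vec-style representation is that “meaning” becomes geometry: related words get nearby vectors, so similarity is just an angle. A toy illustration with made-up 3-dimensional vectors (real embeddings are learned from data and have hundreds of dimensions):

```python
import numpy as np

# hypothetical hand-written "embeddings", purely for illustration
vec = {
    "king":  np.array([0.9, 0.80, 0.1]),
    "queen": np.array([0.9, 0.75, 0.2]),
    "stick": np.array([0.1, 0.20, 0.9]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

With vectors like these, a network could relate “king” and “queen” even if one of them never appeared in the training definitions.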

But anyway. I’d really like to get the grants database fixed up and running so I can get some real output, and I’m also looking into some other fun things– Google Magenta, for one, and a reinforcement-learning agent that plays Super Mario. Could be neat. 

Posted in Uncategorized

The adventures of Robo-Chef

Since my previous post, I’ve improved the model. How much better? About 22%! That’s pretty exciting. Here, exclusive and brand new, are some highlights and dumps from this 404-star recipe bot.

At temperature=0.1, we get the least creative, most probable recipe the network can come up with:

  title: chocolate dipped chocolate chip cookies
categories: cookies, chocolate
yield: 1 servings

1 c  butter or margarine
1 c  sugar
2    eggs
1 ts vanilla
1 c  flour
1 ts baking soda
1/2 ts salt
1 c  chopped nuts
1 c  chopped nuts

cream butter and sugar until light and fluffy. add eggs, one at a
time, beating well after each addition. add eggs, one at a time, beating
well after each addition. blend in flour mixture. stir in chocolate
chips. spread in greased and floured 9-inch square pan. bake at 350
degrees for 15 minutes. cool completely. cut into squares. serve

This completely legitimate recipe for some kind of cookie might actually work, and that scares me. It also contains a full cup of butter, so… maybe don’t eat them. Assuaging my worries about accidental sentience, the next recipe is:

title: baked stuffed chicken wings
categories: poultry
yield: 4 servings

1 lb beef stew meat
1 tb olive oil
1    onion, chopped
1    clove garlic, minced
1 tb chili powder
1 ts cumin
1 ts cumin
1 ts cumin
1 ts cumin
1 ts cumin
1 ts cumin

And it continues with “1 ts cumin” forever after that point.

Increasing the temperature to 0.5, we see that the list of ingredients is often completely disassociated from the instructions. For example:

     title: chicken noodle casserole
categories: main dish, poultry, main dish
yield: 4 servings

1 c  chicken broth
1 c  fresh parsley; chopped
2 tb butter
2 tb vegetable oil
1    carrot, sliced
1    onion, chopped
2    carrots, cut in chunks
3    garlic cloves, minced
1 ts chili powder
1 ts cumin
1 ts ground cumin
2 ts ground cumin
1 ts ground cumin
1/2 ts ground cumin
1/2 ts cayenne pepper
1/4 ts ground coriander
2    cloves garlic, minced
1/4 c  green pepper, chopped
1 tb peanut oil
2 tb chopped fresh parsley
salt and pepper

1. combine the soy sauce, salt, pepper, celery salt, basil and pepper.
shape into balls and place on a lightly greased cookie sheet. bake at 350
for 30-40 minutes.  the chicken should be soft.  serve with fresh

Dang, this one made me hungry. Add some sticky rice or chicken or something to those and that could be really tasty. But what’s with those ingredients?

We also start to see hints of the absurd creeping in, such as:

[truncated due to really long ingredient list]

cook pasta according to package directions; drain. rinse with cold
water; drain.  combine cornstarch and water in a large bowl and add
to meat mixture.  mix lightly with potato masher. stir in milk and beat
until smooth. pour into shallow baking dish and bake in preheated 350
degree oven for 25 minutes. remove from oven.  cool on wire rack.

Mix… with a potato masher?

      title: cranberry crisp potatoes
categories: vegetables, ethnic
yield: 1 servings

2 lb smoked beef, cubed
salt and pepper
cilantro sprigs
serrano pepper
chopped canned tomatoes

combine all ingredients in a medium bowl.  add the beans to the cooked
pasta and set aside.  place the oil in a 2-quart saucepan and cook
over medium heat for 5 minutes. add the onions and cook for 2
minutes. add the remaining ingredients, except the cheese. cook for
another 5 minutes or until the sauce thickens. serve over rice.

Another one that sounds pretty good, if you assume “all ingredients” refers to the ingredients above, and assuming you have the extra ingredients like the beans and the cooked pasta on hand.

We also see stuff like this:

     title: chinese cabbage & tomato salad
categories: salads, salads
yield: 6 servings

1    carrot, cut in 1 inch pieces
1    onion, chopped
1    garlic clove, chopped
1 ts salt
1/4 ts pepper
1    red bell pepper, seeded
-and chopped
1 tb chopped fresh parsley
2 tb sugar
1 ts salt
1 ts pepper
1/2 ts cayenne pepper
1 ts salt
1 ts sugar
1/2 ts sugar
1 ts sesame oil
1/4 c  cider vinegar
1 tb water
salt and pepper to taste
freshly ground black pepper

place all ingredients in a small saucepan and cook until the mixture
boils and thickens.  remove from heat.  add beans and cook for another
30 minutes.  add remaining ingredients and simmer for a further 5
minutes.  add soy sauce and simmer for about 10 minutes or until
thickened. serve over rice.

Notice how many times sugar and salt show up in that ingredients list?

And also this nonsense:

     title: chinese chicken
categories: chicken, main dish
yield: 4 servings

1 c  crushed corn
1 c  chopped onion
1/2 c  chopped green pepper
1 c  chopped celery
1 c  chopped green pepper
1 c  chopped celery
1 c  chopped onion
1 c  chopped onion
1 c  chopped green pepper
1 c  chopped carrots
1 c  chopped celery
2 c  chopped celery
1 c  chopped celery
1 c  chopped celery
1 c  sliced fresh mushrooms
1 c  chopped onion
1 c  chopped onion
1 c  chopped celery
2 tb fresh lemon juice
2 tb white vinegar
1 ts salt
1 ts ground cumin

place the sausage, chicken in a small container and puree until smooth.
add the chicken stock and bring to a boil.  reduce heat and simmer
for 10 minutes. add the beans, cover and cook for about 20 minutes.
mash the chicken and reduce the heat to medium and cook until
the meat is tender, about 20 minutes. meanwhile, bring the chicken to a
boil and simmer for 5 minutes, stirring occasionally. add the corn,
salt, pepper, sugar and worcestershire sauce, and cook for 1 minute.
add the garlic, salt and pepper and stir for another minute. add the
chicken and continue to stir for a few minutes longer. remove the chicken
from the heat and stir in the salt and pepper. return the corn to
the pot and simmer for about 20 minutes. serve the sauce with the
sauce. serves 6.

How would you even do this? You start by PUREEING SAUSAGE and end by serving the sauce with the sauce. I laughed so hard at this one it hurt.

Moving on to the opposite extreme, temperature = 1.0

Warning: Absurdist Cooking

     title: danish icebox
categories: breads
yield: 10 servings

1 c  plain yogurt
2 tb light soy sauce
1/2 ts salt
1/8 ts freshly ground pepper
1    onion

cut beef into 1″ x 1/4″ rings.  blanch oranges in salt fat oil and
lemon flavour. meanwhile, cut bacon over chicken to 1/2 inch thickness.
carefully scoop chicken breasts off cod that look skins desired with a
doagh cloth masher; cut kabobs into cubes. in batches, combine
cinnamon, cayenne, salt and pepper; stir into the dressing mixture.
bake 20 minutes.

add chopped clams to marinade and garnish with scallions.

Ahh, the Danish Icebox. A classic comfort food from the old world. Those quaint traditions, like cutting the bacon over chicken, which you then scoop off cod. Cod that looks like skins.

     title: fettucine soup
categories: soups, ceideburg 2
yield: 1 servings

2 tb canola oil
3 ea chicken breast halves
5 tb lemon juice
2 tb sugar
1/2 ts curry powder
2 tb soy sauce
3 tb cornstarch
1/3 c  dry white wine
7 ea chick peas, sliced
1 c  chopped green pepper
2 tb salt
1 tb parsley flakes
1    whole alternight bones
2 c  star brown broth
2 oz cream cheese

brown beef in butter. add vegetables and cook on high heat until
browned. increase heat to 315 degrees f. heat 2 teaspoons butter in a
saucepan over medium heat. when hot, add onions, onions, parsley,
bell pepper, basil, parsley, capers and salt. cook for 5
minutes (add parsley to the oil and cook until the wine has been reduced
by half,about 12min.ends the same manner to warm over moderate
heat just until egg mixture just comes to a boil. remove as much of
the cooking liquid and foam in pan.  combine the egg white and sugar in
a small bowl. form into balls and press down to finish cooking. bake
at 375 f for one hour, until the patties pull away from sides of
the pan. serves 6 to 8.

That… is not fettuccine soup. I don’t know what that is, but that unmatched open-paren haunts me). There we go.

      title: apricot-apple deluxe
categories: side dishes, tex-mex, poultry
yield: 12 servings

4 ea egg whites
1/2 c  coconut
2 c  pecan halves
6    shredded horseradish
fresh ripe toasted walnuts*
sliced strawberries
mint extract
fruit preserves:
3/4 ts almond extract
1/2 ts vanilla extract
combine warm orange juice
concentrate, pecans,
whipped, lemon rind

:       preheat oven to 275 degrees f.

in a large bowl, combine the champagne and sugar. melt, stirring
constantly, or a few minutes to combine it well. stir in
margarine and chocolate.  set the bowl over simmering water in a
microwave oven 5 minutes or until the mixture is liquid is evaporated. stir in
the grape-nuts and frost it evenly with melted chocolate and mix
well. pour the mixture into the center of the pudding (using a
syrup), then combine in hot water until smooth. beat vigorously until smooth.

Why is this tex-mex? WHY DOES IT HAVE HORSERADISH? Some questions will never have answers. I love those instructions, though. It feels so avant-garde. You melt sugar into champagne, then add butter and chocolate and evaporate out the liquid in a microwave? I… kinda want to try this one.

Here are some dumps:

I might do more intervals in between at some point– I typically do a 0.6, 0.8, 1.0 spread, but thought I’d space them out for a bit more variety this time.

Next I think I’m going to go back and try to generate some grants again. That sounds like fun.

Posted in char-rnn | Tagged | Leave a comment

Char-rnn retro-post

As with my previous post, this post is intended to serve as more permanent storage for my previous posts regarding the topic at hand– in this case, Andrej Karpathy’s infamous Char-RNN.

First introduced to the world in this incredible post that everyone should read, char-RNN is a neural predictive text model that tries to guess the next letter in a sequence given only the letters that came before it. The amazing part is that it works– not only does it learn words, it picks up grammar, syntax, and punctuation along the way.
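To make the prediction task concrete, here's a toy sketch. This is emphatically not an RNN (no learned state, just bigram counts over a made-up string), but it is trained on the same objective: given the text so far, guess the next character.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count which character follows each character in the training text.
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_char):
    # Greedy prediction: the character most often seen after prev_char.
    return counts[prev_char].most_common(1)[0][0]

model = train_bigram("the theory that the thing thought")
print(predict_next(model, "t"))  # 'h' -- 'h' follows 't' most often here
```

A char-RNN does the same thing, except its "memory" of the preceding text is a learned hidden state rather than a one-character lookup table, which is how it manages grammar and long-range structure.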

But you didn’t come here to learn. Or I hope you didn’t. ‘Cause you won’t. That’s the machine’s job.

Just finished training a neural network to try to generate medical terms. Some highlights:
And many more!

This was my first network trained on my laptop. I don’t remember any of the parameters of the network, but the files are probably still around somewhere. I may look into that and edit those details in.

Training the same neural network on movie titles. Predicted to take ~60 hours to run, so I think I may sanitize inputs and remove duplicates and try again later, but it’s had some amusing results so far:
American Shorund
Fantasies in Collegiate Pravise
Chill Criler
Star Trek Punctiom: The World’s Moves

And from the latest checkpoint:
Art World Wrestling
Violence Day (special edition)
Deadly Smith: Live at Thirst
Deadly Smith: Dark Brown Home
Iron Mailey: An Mentally Good Day: Legend of the Wolfon
Disney Chinese Stars
History Channel Presents: Murdering Secondary National Pergrimes Systemn

I was really thrilled by the spontaneous appearance of the Deadly Smith series. I think I’m gonna start priming future checkpoints with “Deadly Smith” to expand the franchise.

Best so far: “Wolveringlypektin: Demon Hitler 2″

Oh man, these still make me laugh. I gotta try this data set again. I actually later tried priming the network to get more Deadly Smith movies and got:

Deadly Smith: Male History
Deadly Smith: Sinatra of Darkness
Deadly Smith Secrets: The End of Garfield

I would watch either of those last two, but probably not Male History.

Started training Char-RNN on a list of science textbooks. Some very, very early results [with commentary]
Nucleir Exeyctherics [Almost spelled “Nuclear” right!]
Operation of Nanolecullines and Fromage [Yes, how do you operate cheese?]
Handbook of Terperation [What is Terperation? Why do I so urgently want to know?]
Scientific Nerves in Medicine
Viroverognapter (Feurth edition) [Can’t wait for the Feifth edition!]
Based Biology [That sounds like a bad idea]
Primate Human Behavioral Systems [Primate humans? This research sounds unethical]
The Generation and Managing Management [Who manages the management?]
The Him Works [this one sounds religious]

So, really good results so far! Gonna let this (and the movie-title one) run overnight and see where it gets in that time.

This is about where I started adding commentary to the output. Also where I discovered that having two GPUs meant I could train two networks at once. Kinda miss that a little.

Training Char-rnn on a list of restaurant names in New York City. Still mostly gibberish, but I wanted to share something hilarious it spat out just now:
THE PAIN ROOM II, 9549 Six Avenue, New York NY

Some more gems include: “Starsucks,” “Frothing Room,” and “New & Weird’s.”

Had to stop the experiment due to overfitting. Don’t think there was quite enough data, and it started just spitting out “New York, NY” over and over.

This is where I learned that I should sanitize my data: because the restaurant database was limited to New York City and contained the restaurants' addresses, of course the network could drive its loss down by spitting out "New York, NY" over and over.

But seriously, I want to eat at New & Weird’s.

Continuing the restaurant theme, training the network on menu items. Almost completely gibberish so far (the french is confusing it) but here are some funny outputs:
Fruits: Jambo
Spaget Mixed
Whole Celery with Toast
Scotched Soup
Salted Butter, Burgundy
Veal Chicken
Eggs, hen
Coffee Pork
Oysterned beans
Almond Fort
Broiled Buckler
Fried Crest
Supreme of Fresh Filet of Salmon Sauce Sundae
And more!

I think this is my favorite iteration so far.

I really hit on some gold here. You’ll see later that I’m still harping on this theme, with a bit more firepower though.

More results from Robo-Restaurateur [with commentary]:
Vanilla Canape [Makes sense]
Imported Sword (half-dozen) [Is this swordfish? Or just swords?]
Boiled leg of Maitre d’hotel [Long pig! An example of the network’s confusion with french]
Fried black bass, stewed in cream and shrimp [Some cool seafood]
Chicken broth with Jerry [Who is jerry?]
Fresh fruit, Mashed [Can I get my fruit un-mashed?]
Bigger Gooby [Bigger… than what?]
Omlette a la tomate [Hey, some correct french!]
Mallard Chaud [Hot duck?]
Creme bamboo amberian sausages [That sounds pretty interesting, actually!]
Cold long island duck [Some New York cuisine!]
Old-fashioned cheese coffee […what?]
Turtle Extra Special [It’d better be special if I’m gonna eat turtle!]
Fried Tomato Juice [How exactly do you fry that?]

Hmm… Maybe I should go back to this dataset. This stuff is kinda great.

First results from the neural songwriter (suggested by Thomas Begley) contains some awesome stuff:
I’m Reading Up
La Popalco Por Ti Pena
Here is a Leadred Around my Porche
Some Hills
Last Fast Times
Night of Love
Loco Yellow Girl
Silency Voices
Farewell You
The See Your Lover
Burning Interview Is
Down on the Light
Love To Through The Sun
Music (Explicit Remix)
Cool! Don’t Love Me (Feat. Sas Woo)
Get Me Then
It’s Not Right
When I Love It (Remaster)
Soul Change
It Is The Garabelps
Taperings (Explicit)

All-in-all, I think this is my favorite so far. It helps that I have almost 20 MB of data.

If I remember right this was one of the ones that was really, really hilarious during the early stages. I’ll have to see if I still have those checkpoints around. This was also where I learned that lots of data => better results.

Let’s compare some RNN movie titles and novel titles:

Murder Hammy Scooby
Knife Pointer
Counterfeit Kill Me Rocks
Night of Shade
Hell’s Vampire (1960)
Black Kids 3
Dawn Punch
The Dark Box
Nobody Lived
Nothing Planet
20002: The Man Who Bounty: the Movie
Star Musketeers
Desire Hunter
Dead Daughter

And so on. The theme here is pretty clear. I appear to have created Angst Bot.

Human Essentials
Purple Takes a Source
Revenge of Sex
The Last Experiences
The Book of Dawn
The Coming Jewellery
Worsted: A novel
All the people of the American Sky
Unlock Cat
The Mercy of the Gods
Against the Slave: A Tribute to Harm
Eye for Detail: Messy stories of Science
Poultry Lady
The Healing Paws
Texas Encyclopedia of Every Pollities
The Rise of Sarah Doom

And many others. I'm thinking I might start uploading massive dumps from the neural networks for people to peruse, and possibly starting a blog just for my neural network experimentation so I don't have to keep pestering people on Facebook about it.

Hey, would you look at that! I finally made that blog!
Also these networks are still some of the funniest things I've ever created. So pleased with these results.

At this point I started trying to create the SCP-RNN, somewhat unsuccessfully. You can see some of my results here. This is when I discovered that the size of a database is not the same as the quantity of data points: while there is plenty of text in the SCP object files, there are fewer than 3000 discrete entries.

This is also about the point I discovered the grid-lstm model, which uses some black magic to make training deeper networks possible. I was hoping for a highway-network char-rnn or a residual-block char-rnn, but those don't seem to be around. Maybe I'll try to whip one up myself once I figure out how the heck they work.

I also created a Google Drive folder where I sometimes put dumps from my networks for people who want large, unedited blocks of output to peruse and find favorites.

Just for kicks, I’m training a neural network on a dictionary, and I have it hallucinating made-up words. Some favorites:

Comprehense (v.) To proclaim; to burden; to interpret from fire
Deforation (n.) The meaning of a member attached to the exploration or extent of the aggregate.
Iridylis (n.) a liger in embryo
Holerwheel (n.) A stone at one end of a very long groove, used as grain, water, blood, etc.
Depravate (v.) To use under, or dissect, as a shelter; to become shipping to.
Piladin (n.) One who, or that which, compasses
Concurrential (a.) Pertaining to a contract or debate
Doltoid (a.) Like a string

It also likes making up new definitions for real words, such as:
Employment (n.) That which is exempt or exempted; Relating to the hand of evil
Lactate (v. t.) To cause to stop by cooling

Ahh, the dictionary bot, hallucinating definitions for real and made up words.

New favorite word from my dictionary-rnn: "Liveless (a) without forbidden meaning."

Also: Personality (n) the quality or state of being powerful.

There’s a dump from this network in the dumps folder if you want to see some raw output.

Because I was bored last night, I started training a neural network on the first data I could find, which ended up being restaurant health inspection reports. So, without further ado, inspection bot says:
Do not store plumber.
Keep open bags of foods in cold running water
Hold Demon in server and ham to excess of odor
Provide roaches on all floors under cooking equipment to walk on
Remove soda guns from warewashing area handsinks
Provide for storage of all employees.
Some staff observed stored in the hand sink in the front service area.

Hehehe. This one was great. The data was pretty noisy, though, so I didn’t get any other good results out of it.

I’m training a network (6×260 2D grid-lstm) on academic journal titles, which don’t have quite the level of jargon I’m looking for, but it has still produced some amusing results:
Studio Psychoantility
The Urologa
Water Letter
Civil Rights Industry
American Studies
Kathorcendological review
Biochemical Psychiatry
Microwaves, Literature
Commies! Plant Journal
Book of the Landscape
Biological Economics

When I created this one, I was trying to find a dataset of research paper titles, and soon enough I did find something similar:

So I’ve found what I was looking for (kind of): a dataset of ~400 MB of research grant awards. The dataset is huge, so it takes forever to do one epoch (one pass through all the data), but I have two checkpoints so far.
The first one just produces thousands of blank lines (oops, didn’t sanitize data for \n\n\n\n\n\n) but the second is producing some (somewhat nonsensical) results. It’s mostly gibberish (for example, a study titled “Induced Induced In Provide Subjectage Orighom Technology State Eke College Slovity”) but it did produce something sponsored by the “University of U,” which I found amusing.
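The blank-line problem is the kind of thing a quick pre-processing pass catches. Here's a minimal sketch (the `sanitize` helper is my own hypothetical name, not part of char-rnn) that collapses runs of blank lines and drops exact duplicate lines:

```python
import re

def sanitize(text):
    # Collapse runs of blank lines so the network can't cheaply lower its
    # loss by emitting '\n' forever...
    text = re.sub(r"\n{2,}", "\n", text)
    # ...and drop exact duplicate lines, keeping first occurrences in order.
    seen, out = set(), []
    for line in text.split("\n"):
        if line not in seen:
            seen.add(line)
            out.append(line)
    return "\n".join(out)

print(sanitize("a\n\n\n\nb\na\nc"))  # a, b, c on separate lines
```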

Also amusing are the names it produces:
Zamk S. Schic
Woshopkha G. Hoomal
Michen H. Lahe (this guy’s email is apparently “” and sponsored by “University of the Way”. Who is St. Bust?)

I’m hoping to see some better results soon, and probably will, given the huge amount of data. It would be really annoying to have to start over if I picked bad hyperparameters, though, given how long it’s taking. Really looking forward to getting myself a 1080 so I don’t have to work off a pair of laptop 755’s.

There’s a dump from this network if you’re interested. At this point I switched to training with just the titles:

I’ve given the research grant network about 3 days more training now, so here are some favorites:
A Structural Analysis of Cell Angles
The Consequences of Smooth Surfaces in Culture
Florida Authority in Multi-Element Methods
An Experimental Approach to Study of the Velocity Program
Integrated Development of Self-Engineered Geochronology for Exploration of Nanogrants
Visual Quantum Battery of Ridge Properties
Evolution and Recent Advances in Recharged Division technology
laser lattice systems for real time spectroscopic systems
studies of the effects of molecular physics on the structure of an arctic substrate control
the role of a carbon isotope interaction of the earth’s core superconductor
a national investigation of the evolution of molecules
virtual reality analysis with the international
development of a satellite laser system for sixth new internet

and finally, this doozy:
some technologies for the future

There is also a dump for this network.

At this point, I got myself a GTX 1080 and things got interesting. My first dataset? Recipes.

Some excerpts from the recipe-rnn:
1/2 lb Fresh bacon
1/3 c White wine
2 ts Lard or garlic — chopped
1/4 c Cold water
3 ts Crushed chives
1 c Tin

that’s right– the edgy apple salad contains a cup of tin

a recipe called simply “CHESTNUTS” which contains such ingredients as “dank tarragon,” “chorizo Chocolate Milk,” and “1 lb star knife — uncooked if possible”

this gem:
In a large soup bowl combine the orange juice, bacon, rosemary, cornstarch, chopped chives and Tarragon. Boil 1 or 10 minutes or until tender. Remove from heat and sprinkle with turkey.

And finally, a recipe called “CAKE FIRMAMENTS” that repeatedly instructed to “boil until cooled,” to “peel the water,” to “dissolve flour,” and concluded with “top with whiskey.”

Really want to see how good I can get this one. It’s not *quite* coherent yet.

The rest is history.

Here are the first three dumps from my CHEF-RNN (read: three chapters of my cookbook).

Chapter 1: Low temperature (0.50)

Definitely the most coherent, but unfortunately this also leads to some predictability. I hope you like salt and pepper, because every recipe contains them, sometimes multiple times. Typically, this is the best place to look for actual instructions, but the ingredient lists tend to be a bit uninteresting.

Chapter 2: Medium Temperature (0.75)

The best balance of the three, I think. It doesn’t have the predictability of the low-temperature sample, but isn’t as chaotic as the high temperature one. Unfortunately, this makes it a bit less silly than either of the others, but when it shines, it really shines.

Chapter 3: High Temperature (1.00)

Only to be attempted by expert chefs who want to use ingredients such as “GRANDMAS,” “Calories (crushed),” and “Powdered Ice Cream.” These tend to be the most outlandish, which can be hilarious, but also can dissolve into gibberish sometimes.
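For anyone curious what temperature actually does during sampling: the network assigns a score (a logit) to each possible next character, and sampling divides those logits by the temperature before the softmax. Low temperatures sharpen the distribution toward the most likely character (predictable), and high temperatures flatten it (chaotic). A minimal sketch with made-up logits:

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=random):
    # Divide logits by the temperature, then softmax. T < 1 sharpens the
    # distribution toward the argmax; T > 1 flattens it toward uniform.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting distribution.
    return rng.choices(range(len(probs)), weights=probs)[0]

# Made-up logits over a tiny three-character "vocabulary": at T=0.1 index 1
# (the argmax) wins almost every time; at T=2.0 the other indices show up often.
logits = [1.0, 3.0, 0.5]
print(sample_with_temperature(logits, 0.1))
```

This is why the low-temperature chapter keeps falling back on salt and pepper, while the high-temperature one reaches for Powdered Ice Cream.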

One thing I noticed across the three dumps is that there’s a pretty clear distinction in style between lowercase text and UPPERCASE TEXT. This is because, thanks to the one-hot character encoding, a lowercase ‘a’ is just as distinct from a lowercase ‘w’ as it is from an uppercase ‘A’. (For those who don’t know: one-hot encoding means ‘a’ would be represented as a vector like {1,0,0,0,…}, whereas ‘A’ or ‘w’ would be represented as something like {…,0,0,1,0,0,…}, with the 1 in a different position for each character, so ‘a’ is exactly as different from ‘A’ as it is from ‘w’.) I think I’m going to try doing a few passes of training on the same dataset with all characters lowercased and see if I can get the loss a little bit lower.
Plus, it might help with the occasional ANGRY-LOOKING RECIPE.
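The one-hot point is easy to verify: every pair of distinct one-hot vectors differs in exactly two positions, so no pair of characters is "closer" than any other. A minimal sketch over a tiny made-up alphabet:

```python
def one_hot(char, alphabet):
    # Each character becomes a vector with a single 1 at its own index.
    return [1 if c == char else 0 for c in alphabet]

def hamming(u, v):
    # Number of positions where the two vectors differ.
    return sum(a != b for a, b in zip(u, v))

alphabet = "abwABW"  # tiny mixed-case alphabet, just for illustration
a, w, A = (one_hot(c, alphabet) for c in "awA")
# Any two distinct one-hot vectors differ in exactly 2 positions, so the
# input encoding gives case no special notion of similarity.
print(hamming(a, w), hamming(a, A))  # 2 2
```

Lowercasing the data collapses 'a' and 'A' into one symbol, which shrinks the vocabulary and removes that artificial distinction entirely.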

At this point, I moved back to the grid-lstm model, on my new computer, to mixed initial results.

Additional recipes for those so inclined. These were created using a grid-lstm network, which, similarly to highway or residual networks, allows for the training of much deeper networks. This one in particular was 800 neurons wide by 8 layers deep, my largest ever. Yet, it didn’t really converge any better than the small network I trained previously… I’ll have to tweak some hyperparameters. I think some normalization would help, maybe a little less dropout (I had it pretty high to prevent overfitting). Of course, there’s still some gold in here.

If you want to see what came next for our favorite robot chef other than Chef Watson, check out the very next post.

Posted in char-rnn | Tagged , , , , | Leave a comment