In exploring the recipe dataset, I decided to have some fun with Word2vec, an algorithm originally created at Google. For the layperson: the algorithm looks at the contexts in which each word appears and learns a vector for every word, such that words appearing in similar contexts end up with similar vectors. On the recipe dataset, this means that, for example, the vectors for vodka and cognac are very close together, as are wheat and rye, chocolate and butterscotch, etc.
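To make "context" a bit more concrete, here is a minimal sketch of how a skip-gram style model pairs each word with its neighbors. The two recipe lines, the window size, and the helper names (`context_pairs`, `neighbors`) are all made up for illustration. This is not the actual training code, just the windowing idea.

```python
# Toy illustration of the "context window" idea behind word2vec.
# The recipe lines and the window size are invented for this example.

def context_pairs(tokens, window=2):
    """Yield (center, neighbor) pairs for words within `window` positions."""
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def neighbors(word, sentences, window=2):
    """Collect every word that appears in the context window of `word`."""
    found = set()
    for tokens in sentences:
        for center, neighbor in context_pairs(tokens, window):
            if center == word:
                found.add(neighbor)
    return found


recipes = [
    "add a shot of vodka and stir".split(),
    "add a shot of cognac and stir".split(),
]

vodka_context = neighbors("vodka", recipes)
cognac_context = neighbors("cognac", recipes)

print(sorted(vodka_context))
print(sorted(cognac_context))
```

In this toy data, vodka and cognac show up surrounded by exactly the same words, which is the kind of signal that pushes their learned vectors close together.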
What’s really neat, though, is that the learned vectors support arithmetic: you can add and subtract them to form word analogies. Here are a few examples (read a – b + c = d as “b is to a as c is to d”):
pie – pizza + calzone = blintz
That makes sense! Never would have thought of that, to be honest.
banana – plantain + apple = blueberry
I guess an apple is just a big blueberry. Who knew?
candy – marshmallow + coffee = espresso
I guess that makes sense. Weird though.
fish – tuna + chocolate = candy
Ok, tuna is a type of fish, chocolate is a type of candy. I guess I’ll let that one slide.
coffee – tea + lemon = orange
tea – coffee + lemon = lime
That’s interesting. It seems to think coffee is sweeter than tea.
coffee – knife + spoon = expresso
Interesting. In addition to the marshmallow of candy, it’s also the spoon of cutlery.
rasin – grape + fish = offal
Wow, ok, I guess it doesn’t like raisins.
brie – cheese + candy = meringues (closely followed by “fondant”)
Makes sense. Fancy, soft, light.
ribbon – bar + dome = tented
Let’s try the classic word2vec analogy:
king – man + woman = bruce
What. The next closest option is “retired.”
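For anyone curious how an a – b + c query turns into a word (and a "next closest" word), here is a rough sketch. It builds the target vector, then ranks every other word by cosine similarity to it. The 3-dimensional vectors are hand-made and purely hypothetical. Real word2vec vectors are learned and much larger, and real libraries may differ in details such as normalizing the vectors first.

```python
import math

# Hand-made toy vectors, purely hypothetical. The three dimensions are
# loosely "seafood-ness", "sweetness", and "is a general category".
vectors = {
    "fish":      [1.0, 0.0, 1.0],
    "tuna":      [1.0, 0.0, 0.0],
    "salmon":    [1.0, 0.0, 0.1],
    "chocolate": [0.0, 1.0, 0.0],
    "caramel":   [0.0, 1.0, 0.2],
    "candy":     [0.0, 1.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors, topn=3):
    """Rank words by similarity to vec(a) - vec(b) + vec(c).

    The three query words are excluded from the candidates, since they
    would otherwise tend to dominate the ranking.
    """
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [
        (word, cosine(target, vec))
        for word, vec in vectors.items()
        if word not in (a, b, c)
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:topn]


# fish - tuna + chocolate, i.e. "tuna is to fish as chocolate is to ?"
for word, score in analogy("fish", "tuna", "chocolate", vectors):
    print(word, round(score, 3))
```

With these toy vectors, "candy" comes out on top, with the runners-up listed after it, which is where answers like “retired” come from in the real model.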
I’m going to continue experimenting with this. I’ve also been getting some really good results with the chef-rnn, so I’ll get back to you with more of that soonish as well.