A very common problem in natural language processing is sentiment analysis: automatically determining whether a given input has an overall positive or negative tone. For example, a positive string might be “I really enjoyed this movie!” whereas a negative input might be “it was boring, vapid, and dull.”
My idea is essentially this: given recent advances in machine translation, it should be possible to translate between sentiments (as opposed to between languages). The difficulty presented itself almost immediately: there is no dataset available with a one-to-one mapping between negative and positive sentiments. There are models for unpaired machine translation, but they’re still relatively unproven.
My first implementation was a fairly simple rule-based approach: try to remove or add inverters (the word “not,” for example) and replace words with their antonyms until the sentiment changes appropriately. This worked well for very simple cases, but just wasn’t smart enough to capture more complex relationships (for example, it loved to translate “good” to “evil,” even when “bad” would make a lot more sense). My new implementation takes a different approach, using (and abusing) a model loosely adapted from Daniele Grattarola’s Twitter Sentiment CNN.
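To make the rule-based idea concrete, here is a minimal sketch of it, not the original implementation. The `ANTONYMS`, `LEXICON`, and `INVERTERS` tables and the toy `score` function are all hypothetical stand-ins (a real version would presumably use a thesaurus and a trained sentiment scorer), and this version only removes inverters rather than also adding them.

```python
# Hypothetical toy tables; stand-ins for a real thesaurus and sentiment scorer.
ANTONYMS = {"good": "bad", "bad": "good", "love": "hate", "hate": "love",
            "boring": "exciting", "exciting": "boring"}
LEXICON = {"good": 1, "love": 1, "exciting": 1, "enjoyed": 1,
           "bad": -1, "hate": -1, "boring": -1}
INVERTERS = {"not", "never"}


def score(tokens):
    """Toy sentiment score: sum of word polarities, where an inverter
    flips the polarity of the next sentiment-bearing word."""
    total, sign = 0, 1
    for tok in tokens:
        if tok in INVERTERS:
            sign = -sign
        elif tok in LEXICON:
            total += sign * LEXICON[tok]
            sign = 1
    return total


def flip_sentiment(tokens, want_positive):
    """Edit the tokens until the toy score has the requested sign."""
    tokens = list(tokens)

    def reached_target():
        s = score(tokens)
        return s > 0 if want_positive else s < 0

    # Step 1: remove inverters, left to right, until the sentiment flips.
    i = 0
    while i < len(tokens) and not reached_target():
        if tokens[i] in INVERTERS:
            del tokens[i]
        else:
            i += 1

    # Step 2: swap remaining words for their antonyms until it flips.
    for i, tok in enumerate(tokens):
        if reached_target():
            break
        if tok in ANTONYMS:
            tokens[i] = ANTONYMS[tok]
    return tokens


print(flip_sentiment("this is not good".split(), want_positive=True))
print(flip_sentiment("i love this movie".split(), want_positive=False))
```

The weakness described above is visible even here: whatever the antonym table says is what you get, regardless of context.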
I used the aclImdb dataset, a set of reviews scraped from the Internet Movie Database (IMDb), split into four parts of ~12,500 reviews each: positive training, negative training, positive test, and negative test. Movie reviews work very well for this problem because they are essentially already annotated with the sentiment in the form of the user’s star rating for the film.
In pre-processing, I split the reviews into sentences to reduce the length of each input, and convert each sentence into a sequence of word vectors (in my experiments, I used the pretrained 300-dimensional GoogleNews word2vec vectors). Unfortunately, due to the size of the input when converted into 300-dimensional vectors, I frequently ran out of memory during training. To mitigate this, I only load the million most common words from the GoogleNews-vectors-negative300 file.
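A rough sketch of that pre-processing step might look like the following. The function names and the regex-based splitter are my own illustrative choices, not the original code, and `vectors` can be any mapping from word to NumPy array. With gensim, one way to get such a mapping while loading only part of the file is `KeyedVectors.load_word2vec_format(path, binary=True, limit=1_000_000)`; this reads only the first million entries, which I am assuming are stored roughly in frequency order.

```python
import re

import numpy as np


def split_sentences(review):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Lowercase and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_to_vectors(sentence, vectors, dim):
    """Look up each known word; words missing from `vectors` are skipped.

    Returns an array of shape (num_known_words, dim).
    """
    rows = [vectors[w] for w in tokenize(sentence) if w in vectors]
    if not rows:
        return np.zeros((0, dim), dtype=np.float32)
    return np.stack(rows).astype(np.float32)


# Toy 3-dimensional "embeddings" standing in for the real 300-d vectors.
toy_vectors = {"it": np.ones(3), "dull": np.zeros(3)}
for sent in split_sentences("I really enjoyed this movie! It was dull."):
    print(sent, sentence_to_vectors(sent, toy_vectors, dim=3).shape)
```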
The model is based on a set of one-dimensional convolutions over the series of word vectors, followed by a max pooling layer, ReLU, and a fully-connected layer. This is trained as a standard sentiment classifier, learning to predict the sentiment of a given input sentence.
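For readers who prefer code to prose, here is a minimal NumPy sketch of the forward pass of such a classifier. It is a simplification under my own assumptions: a single filter width rather than a set of them, a single sigmoid output, made-up parameter names and sizes, and no training loop.

```python
import numpy as np


def init_params(dim, n_filters=8, width=3, seed=0):
    """Small random parameters; the sizes here are arbitrary illustrations."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, (n_filters, width, dim)),  # conv filters
        "b": np.zeros(n_filters),                             # conv biases
        "v": rng.normal(0.0, 0.1, n_filters),                 # dense weights
        "c": 0.0,                                             # dense bias
    }


def forward(x, p):
    """x: (T, dim) word vectors of one sentence, with T >= filter width.

    Returns the predicted probability that the sentence is positive.
    """
    k = p["W"].shape[1]
    # All length-k windows of consecutive word vectors: (T - k + 1, k, dim).
    windows = np.stack([x[t:t + k] for t in range(len(x) - k + 1)])
    # 1-D convolution over time: one activation per window and filter.
    conv = np.einsum("tkd,fkd->tf", windows, p["W"]) + p["b"]
    pooled = conv.max(axis=0)           # max pooling over time
    hidden = np.maximum(pooled, 0.0)    # ReLU
    logit = hidden @ p["v"] + p["c"]    # fully-connected layer
    return 1.0 / (1.0 + np.exp(-logit))


params = init_params(dim=5)
sentence = np.random.default_rng(1).normal(size=(7, 5))  # 7 words, 5-d vectors
print(forward(sentence, params))
```

Max pooling over time is what lets the same weights handle sentences of different lengths.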
At sampling time, however, we do something different. We run the input sentence through the classifier as normal, but we give the classifier a different target sentiment. We then find the gradient of the classifier’s loss with respect to the input word vectors. This may be familiar to anyone who’s implemented Google’s Deep Dream algorithm or worked with adversarial images. In essence, this gives us the direction in which we should perturb the input vectors to cause the largest change in the sentiment. Additionally, the magnitude of the gradient for a given word roughly corresponds to how much that word contributes to the sentiment (and therefore, how important it is to change).
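As an illustration of this step, the sketch below hand-derives the gradient of a binary cross-entropy loss with respect to the input word vectors, for a tiny conv, max-pool, ReLU, dense classifier like the one described above. The parameter names, the single filter width, and the sigmoid output are assumptions of mine; in practice a deep learning framework would compute this gradient automatically.

```python
import numpy as np


def input_gradient(x, W, b, v, c, target):
    """Loss and d(loss)/d(x) for one sentence.

    x: (T, D) word vectors; W: (F, k, D) filters; b: (F,); v: (F,); c: scalar.
    target: desired sentiment, 1.0 for positive or 0.0 for negative.
    """
    T, _ = x.shape
    F, k, _ = W.shape

    # Forward pass: convolution, max pooling over time, ReLU, dense layer.
    windows = np.stack([x[t:t + k] for t in range(T - k + 1)])
    conv = np.einsum("tkd,fkd->tf", windows, W) + b
    idx = conv.argmax(axis=0)            # winning window per filter
    pooled = conv[idx, np.arange(F)]
    hidden = np.maximum(pooled, 0.0)
    logit = hidden @ v + c
    # Binary cross-entropy written on the logit for numerical stability.
    loss = np.logaddexp(0.0, logit) - target * logit

    # Backward pass.
    prob = 1.0 / (1.0 + np.exp(-logit))
    d_logit = prob - target
    d_pooled = d_logit * v * (pooled > 0)  # through the dense layer and ReLU
    grad = np.zeros_like(x)
    for f in range(F):
        # Only the window that won the max pool receives gradient.
        grad[idx[f]:idx[f] + k] += d_pooled[f] * W[f]
    return loss, grad


def word_importance(grad):
    """Per-word gradient magnitude: a rough measure of each word's influence."""
    return np.linalg.norm(grad, axis=1)


rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                   # 6 words, 4-d toy vectors
W = 0.5 * rng.normal(size=(5, 3, 4))
b = 0.5 * rng.normal(size=5)
v = 0.5 * rng.normal(size=5)
loss, grad = input_gradient(x, W, b, v, 0.1, target=1.0)
print(word_importance(grad))
```

Stepping each word vector along `-grad` lowers the loss for the chosen target, and `word_importance` ranks the words by how strongly the classifier reacts to them. Note that with max pooling, words outside every winning window get exactly zero gradient.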
We hit another problem here. The space of possible words is discrete, but the word vector space is continuous (and very high-dimensional, and thus sparsely populated). How can we be sure that these gradients are moving towards an actual word? To be honest, I’m not entirely sure. My first approach was to take multiple gradient steps, but this appeared to find minima in the sentiment loss that didn’t correspond to any actual word in the vocabulary. My second approach was to extend the gradient out as a ray from the original word and find the word vectors closest to this line. This worked a good deal better (specifically, it captures the “hate” <-> “love” relationship), but it still isn’t perfect: we still need a heuristic to select which of the proposed word replacements to use, which in the end makes this method little better than the rule-based approach from my initial implementation.
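The ray search can be sketched as follows. This is an illustrative version under my own assumptions, not the original code: the distance is plain Euclidean distance to the ray, `direction` is taken to be the negative loss gradient for the word being replaced, and words that project onto or behind the starting point (including the original word itself) are discarded.

```python
import numpy as np


def nearest_words_to_ray(origin, direction, vocab_vectors, vocab_words, top_n=5):
    """Rank vocabulary words by distance to the ray origin + t * direction, t >= 0.

    origin: (D,) vector of the word being replaced.
    direction: (D,) direction to move in (assumed: negative loss gradient).
    vocab_vectors: (N, D) matrix of candidate word vectors.
    Returns a list of (word, distance_to_ray, t) tuples, closest first.
    """
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("direction must be non-zero")
    d = direction / norm

    offsets = vocab_vectors - origin
    # Position of each word's projection along the ray, clamped to t >= 0.
    t = np.clip(offsets @ d, 0.0, None)
    closest = origin + t[:, None] * d
    dist = np.linalg.norm(vocab_vectors - closest, axis=1)

    # Discard words that are not ahead of the origin along the ray.
    dist[t <= 0] = np.inf
    order = np.argsort(dist)[:top_n]
    return [(vocab_words[i], float(dist[i]), float(t[i]))
            for i in order if np.isfinite(dist[i])]


# Toy 2-d vocabulary; the coordinates are invented for illustration.
words = ["hate", "love", "like", "table", "cat"]
vecs = np.array([[0.0, 0.0], [2.0, 0.1], [1.0, 0.5], [-1.0, 0.0], [0.5, 3.0]])
print(nearest_words_to_ray(vecs[0], np.array([1.0, 0.0]), vecs, words))
```

The function returns several candidates per word, which is exactly where the heuristic comes in: something still has to choose among them.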
The biggest realization I came to was that when mapping a discrete space to a continuous space, the meaning of the intermediate values is not always intuitive. This is what we see when we simply perform gradient descent on the word vectors: the sentiment converges very nicely, but the resulting vectors have barely changed from their original values. In computer vision this is an interesting result, as it typically produces an “adversarial image”: an image that fools the classifier into misclassifying it with very high confidence while remaining indistinguishable from the original to a human. However, since we are hoping for some of the words to converge to different discrete values in the word space, it is less than ideal here.
Additionally, one unanticipated disadvantage of the lack of paired data was the inability to mathematically verify the accuracy of the translations: there was no ground truth to translate to.
One thought I’ve had is to try something similar to CycleGAN, which performs “image translation” in an unpaired fashion through a combination of GAN loss and “reconstruction loss.” However, this still introduces problems, as we cannot easily calculate gradients of the sentiment loss through the discretization into the word space.
It’s a tricky problem, but if anyone has any ideas, I’m interested.