Making Robots Happy

Machine learning is super-cool. I’ve spent some free time over the last year trying to build a good way to analyze sentiment on Twitter. I started by picking out positive and negative words, but that didn’t work too well, especially without any corpus of positive and negative words to draw on. So, before winter break last semester, I wrote a tool that uses a naive Bayes classifier to do the sentiment analysis.

Which sounds pretty intense if you’re not into CS, but it really isn’t.

Basically, it learns from tweets that people have already labeled as positive or negative. It then guesses the label for a new tweet based on how similar it is, overall, to those old tweets. There’s a lot more to it than that, of course (for the technically inclined, I took a lot of ideas from this guy’s suggestions, which were AWESOME), but that’s basically how it works.
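For the curious, here’s roughly what that idea looks like in code. This is a minimal sketch and not my actual tool: it assumes the only features are lowercase words split on whitespace, it uses add-one smoothing so unseen words don’t zero everything out, and the four training tweets and the function names (tokenize, train, classify) are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: features are just lowercase, whitespace-separated words.
    return text.lower().split()


def train(labeled_tweets):
    """Count words per label from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training tweets
    for text, label in labeled_tweets:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_tweets = sum(label_counts.values())

    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_tweets)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing keeps unseen words from
            # driving the probability to zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, purely for illustration.
TRAINING = [
    ("i love this so much", "positive"),
    ("what a great day", "positive"),
    ("i hate this so much", "negative"),
    ("what a terrible day", "negative"),
]

word_counts, label_counts = train(TRAINING)
print(classify("love this great day", word_counts, label_counts))
print(classify("hate this terrible day", word_counts, label_counts))
```

With that toy data, the first tweet should come out positive and the second negative. A real version needs far more labeled tweets and smarter handling of things like punctuation, hashtags, and negation.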

So, here’s an example. It gets weird data points from time to time, but it’s mostly pretty accurate. I chose to look at SOPA sentiment on Twitter, so I expected pretty negative results.