Expanding from Binary to Multi-Label Text Classification

Matt Schwartz
May 24, 2020

When we started SocialSentiment.io we focused entirely on sentiment in our analysis of social media posts. Using thousands of real posts as training inputs, our neural networks produced predictions as floating point numbers between 0 and 1. The model scored each post as leaning negative (toward 0) or positive (toward 1). We also trained with neutral posts, which led to predictions around 0.5. These predictions are aggregated along with post popularity to calculate social sentiment scores for publicly traded stocks.
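To make the aggregation step concrete, here is a minimal sketch of a popularity-weighted average. It is an illustration under stated assumptions, not our production code: the `ScoredPost` type, the `popularity` weight (imagine likes plus shares), and the fallback to a neutral 0.5 when there is nothing to weigh are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ScoredPost:
    """Hypothetical container for one scored post."""
    sentiment: float   # model prediction in [0, 1]; 0 = negative, 1 = positive
    popularity: float  # assumed non-negative weight, e.g. likes + shares


def aggregate_sentiment(posts: Iterable[ScoredPost], neutral: float = 0.5) -> float:
    """Return the popularity-weighted mean sentiment of the given posts.

    Falls back to `neutral` when the total weight is zero (an assumption
    made for this sketch).
    """
    posts = list(posts)
    total_weight = sum(p.popularity for p in posts)
    if total_weight == 0:
        return neutral
    weighted_sum = sum(p.sentiment * p.popularity for p in posts)
    return weighted_sum / total_weight


if __name__ == "__main__":
    sample = [ScoredPost(sentiment=0.9, popularity=3), ScoredPost(sentiment=0.1, popularity=1)]
    print(round(aggregate_sentiment(sample), 3))
```

In this sketch a popular positive post pulls the score well above 0.5 even when a less popular negative post is present, which is the basic idea behind weighting by popularity.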

The downside of this simple approach is that off-topic posts are sometimes included, and the model does not always score them as neutral. This week we transitioned from our original binary (or really trinary) model to a multi-label text classification model. Not only does this allow us to train on and ignore off-topic social media posts, it also helps us expand into labeling various topics or categories of posts, all within one machine learning model.
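The difference between the two model types can be sketched in a few lines. A multi-label model typically emits one independent score per label rather than a single number, so a post can be both "positive" and "about stock value", or simply "off-topic". The label names, the 0.5 threshold, and the helper functions below are assumptions chosen for illustration; they are not the labels or code we actually use.

```python
import math
from typing import Sequence, Set

# Hypothetical label set, for illustration only.
LABELS = ["positive", "negative", "off_topic", "stock_value", "product_opinion"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_labels(logits: Sequence[float], threshold: float = 0.5) -> Set[str]:
    """Turn one raw score per label into the set of labels that apply.

    Each label is decided independently, which is what makes this
    multi-label rather than a single-choice classification.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold}


def is_on_topic(labels: Set[str]) -> bool:
    """Posts tagged off_topic are excluded from sentiment aggregation."""
    return "off_topic" not in labels


if __name__ == "__main__":
    labels = predict_labels([2.0, -3.0, -4.0, 1.5, -2.0])
    print(sorted(labels), is_on_topic(labels))
```

Because every label gets its own score, filtering out off-topic posts becomes a simple membership check instead of hoping the model lands near 0.5.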

We are now able to train on and actively ignore social media posts that do not represent people's opinions of a stock or a company's products. Our stock sentiment graphs are therefore more accurate and better represent the overall sentiment toward the companies we watch on social media.

We're looking forward to gathering more categories of information, particularly from Twitter posts. We can now distinguish posts about stock value, typically from stock traders and analysts, from complaints or fan posts about a company's products. Both are valuable in analyzing a company, of course, but in different ways. We're experimenting with the various text classifications we are able to achieve today with our machine learning algorithms, and we hope to find the most useful ways to share them with you in the future.