Binary Classification: Twitter sentiment analysis

December 24, 2016
This experiment demonstrates the use of the Execute R Script, Filter-Based Feature Selection, and Feature Hashing modules to train a text sentiment classification engine.
In this article, we'll explain how to build an experiment for sentiment analysis using Microsoft Azure Machine Learning Studio. Sentiment analysis is a special case of text mining that is increasingly important in business intelligence and social media analysis. For example, sentiment analysis of user reviews and tweets can help companies monitor public sentiment about their brands, or help consumers who want to identify opinion polarity before purchasing a product.

This experiment demonstrates the use of the Feature Hashing, Execute R Script, and Filter-Based Feature Selection modules to train a sentiment analysis engine. We use a data-driven machine learning approach instead of a lexicon-based approach, as the latter is known to have high precision but low coverage compared to an approach that learns from a corpus of annotated tweets. The hashed features are used to train a Two-Class Support Vector Machine (SVM) model, and the trained model is used to predict the opinion polarity of unseen tweets.

The output predictions can be aggregated over all the tweets containing a certain keyword, such as a brand, celebrity, product, or book name, to find the overall sentiment around that keyword. The experiment is generic enough that you could use this framework to solve any text classification task, given a reasonable amount of labeled training data.
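To make the two less familiar steps concrete, here is a minimal Python sketch of feature hashing and of aggregating predictions by keyword. It is an illustration under stated assumptions, not the implementation used by the Studio modules. The function names, the MD5-based hash, the `n_bits` and `ngram` parameters, and the 1 = positive / 0 = negative label encoding are all hypothetical choices made for this example.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def hash_features(text, n_bits=10, ngram=2):
    """Map a tweet to a fixed-length count vector using the hashing trick.

    Every n-gram (up to `ngram` words) is hashed to one of 2**n_bits
    buckets, so no vocabulary has to be stored. Distinct n-grams may
    collide in the same bucket; that is the usual trade-off of hashing.
    The MD5-based hash is an assumption chosen only because it is
    deterministic across runs.
    """
    size = 2 ** n_bits
    vector = [0] * size
    tokens = tokenize(text)
    for n in range(1, ngram + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % size
            vector[index] += 1
    return vector


def aggregate_sentiment(tweets, predictions, keyword):
    """Return the share of positive predictions among tweets that mention
    `keyword`, or None if no tweet mentions it.

    Assumes predictions are encoded as 1 (positive) and 0 (negative).
    """
    keyword = keyword.lower()
    labels = [p for t, p in zip(tweets, predictions) if keyword in tokenize(t)]
    if not labels:
        return None
    return sum(labels) / len(labels)
```

The vectors returned by `hash_features` could be fed to any binary classifier, and `aggregate_sentiment` could then summarize that classifier's predictions for a single brand or product name. Here "Contoso" is a placeholder brand used only for illustration:

```python
tweets = ["Love my Contoso phone", "Contoso support is awful", "Nice weather"]
predictions = [1, 0, 1]  # hypothetical classifier output
share_positive = aggregate_sentiment(tweets, predictions, "Contoso")
```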