Training Experiment for Twitter sentiment analysis

May 22, 2016
This experiment demonstrates the use of the Execute R Script, Feature Selection, and Feature Hashing modules to train a text sentiment classification engine.
In this article, we'll explain how to build an experiment for sentiment analysis using [Microsoft Azure Machine Learning Studio](http://studio.azureml.net). **Sentiment analysis** is a special case of text mining that is increasingly important in business intelligence and social media analysis. For example, sentiment analysis of user reviews and tweets can help companies monitor public sentiment about their brands, or help consumers who want to identify opinion polarity before purchasing a product.

This experiment demonstrates the use of the **Feature Hashing**, **Execute R Script** and **Filter-Based Feature Selection** modules to train a sentiment analysis engine. We use a data-driven machine learning approach instead of a lexicon-based approach, as the latter is known to have high precision but low coverage compared to an approach that learns from a corpus of annotated tweets. The hashed features are used to train a model using the Two-Class Support Vector Machine (SVM), and the trained model is used to predict the opinion polarity of unseen tweets. In this experiment we also add a "select column transform" to the output of the filter-based feature selector, so that the same feature set is used for both training and prediction.

The output predictions can be aggregated over all the tweets containing a certain keyword, such as a brand, celebrity, product, or book name, to find the overall sentiment around that keyword. The experiment is generic enough that you could use this framework to solve any text classification task, given a reasonable amount of labeled training data.

![](http://neerajkh.blob.core.windows.net/images/TwitterSentimentTrainingExperimentCapture.png)
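
To make the idea of hashing features and then reusing one selected feature set more concrete, here is a minimal Python sketch. It is not the code behind the Studio modules: the MD5-based hash, the bucket count, the difference-of-means column score, and all function names (`hash_features`, `select_columns`, `project`) are assumptions chosen for illustration, and the tweets are made-up examples.

```python
import hashlib
import re

NUM_BUCKETS = 1024  # assumed vector size, not the module's actual setting


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def hash_features(text, num_buckets=NUM_BUCKETS):
    """Map a tweet to a fixed-length count vector using the hashing trick."""
    vec = [0] * num_buckets
    for tok in tokenize(text):
        # MD5 is a stand-in hash: unlike built-in hash(), it is stable across runs.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % num_buckets
        vec[idx] += 1
    return vec


def select_columns(vectors, labels, k):
    """Pick the k columns whose mean count differs most between classes.

    This is a simple stand-in for a filter-based scoring method.
    """
    pos = [v for v, y in zip(vectors, labels) if y == 1]
    neg = [v for v, y in zip(vectors, labels) if y == 0]
    scores = []
    for j in range(len(vectors[0])):
        mean_pos = sum(v[j] for v in pos) / len(pos)
        mean_neg = sum(v[j] for v in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), j))
    # Sort by score (descending), breaking ties by column index.
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return sorted(j for _, j in top)


def project(vec, columns):
    """Keep only the selected columns, in a fixed order."""
    return [vec[j] for j in columns]


# Hypothetical labeled tweets (1 = positive, 0 = negative).
train_tweets = [
    "I love this phone, great battery",
    "What a great movie, loved it",
    "Terrible service, I hate waiting",
    "Worst update ever, hate it",
]
train_labels = [1, 1, 0, 0]

train_vectors = [hash_features(t) for t in train_tweets]
selected = select_columns(train_vectors, train_labels, k=8)
train_matrix = [project(v, selected) for v in train_vectors]

# At prediction time, reuse the SAME column list learned during training.
unseen = project(hash_features("I hate this terrible battery"), selected)
```

The point of `selected` is the same as the "select column transform" in the experiment: the column list is computed once from the training data and then applied unchanged to unseen tweets, so the classifier always sees features in the same positions.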