Sentiment Analysis: Daily Tweet
Daily tweets about feelings, collected using Azure Stream Analytics (ASA) and analyzed for various sentiments using multiple R packages.
The sections below outline the process of extracting tweets, cleaning them, evaluating a sentiment score for each tweet, and calculating the 10 emotional attributes from the Syuzhet package, using various Azure components.
- Data
The Twitter dataset was collected daily using an ASA job with Event Hub as the source and Blob storage as the sink. The collected dataset was stored in Blob storage in CSV format and downloaded using Azure Storage Explorer.
Various Twitter handles were used to collect sentiments for each day over a period of 1 week. Each day, an ASA job was run over a period of 1 hour to collect the data. For example, for Monday, tweets for handles such as Monday, MondayMorning, mondaythoughts, MondayMood, and MondayMotivation were collected for analysis.
- Retrieve Data
Over 25k tweets were collected. The following Azure Stream Analytics Query Language (ASAQL) query was used for the Twitter stream:
*SELECT [CreatedAt], [Topic], [SentimentScore], [Author], [Text] FROM
EventHubInputName*
Additional features were added to make it easier to select data in the R scripts.
- Prepare Data
No data preparation was done in Azure Machine Learning (AML).
- Preprocess Data
None of the AML modules for feature selection or preprocessing were used.
- Algorithm
To study the various sentiments expressed, the following R packages were used:
*SentimentAnalysis: uses analyzeSentiment
Syuzhet: uses get_nrc_sentiment, which applies the NRC emotion lexicon
tm: various functions were used to remove stopwords, URLs, and whitespace, as well as to restrict the text to a particular encoding for this analysis.*
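The cleaning and lexicon-lookup steps described above can be illustrated with a small sketch. This is not the R code used in the project; it is a Python approximation that assumes ASCII as the target encoding and also drops punctuation and digits. The stopword list, the toy lexicon, and the function names `clean_tweet` and `count_emotions` are hypothetical and much smaller than what tm and the NRC lexicon provide.

```python
import re

# Hypothetical mini stopword list; tm's English stopword list is much larger.
STOPWORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "on", "in", "i", "so"}

# Hypothetical toy lexicon in the spirit of the NRC word-emotion lexicon.
TOY_LEXICON = {
    "happy": {"joy", "positive"},
    "love": {"joy", "trust", "positive"},
    "tired": {"sadness", "negative"},
    "hate": {"anger", "disgust", "negative"},
}

# The 8 emotions and 2 sentiments reported by get_nrc_sentiment.
CATEGORIES = ["anger", "anticipation", "disgust", "fear", "joy",
              "sadness", "surprise", "trust", "negative", "positive"]

URL_RE = re.compile(r"https?://\S+")


def clean_tweet(text):
    """Remove URLs, non-ASCII characters, stopwords, and extra whitespace."""
    text = URL_RE.sub(" ", text)
    # Assumption: restrict to ASCII by dropping anything that cannot be encoded.
    text = text.encode("ascii", "ignore").decode("ascii")
    # Keep only runs of letters/apostrophes; this also strips punctuation and digits.
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


def count_emotions(cleaned):
    """Count how many words of the cleaned tweet fall into each category."""
    counts = {category: 0 for category in CATEGORIES}
    for word in cleaned.split():
        for category in TOY_LEXICON.get(word, ()):
            counts[category] += 1
    return counts


cleaned = clean_tweet("I love   Monday mornings! https://t.co/abc \u2615")
emotions = count_emotions(cleaned)
```

Each tweet ends up with one count per category, which is the shape of table that makes day-by-day comparison straightforward.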
- Results
Two Execute R Script modules were used to calculate and plot the results using the above packages. Each plot compares the sentiments across the days of the week.
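As a rough illustration of the per-day comparison, the sketch below averages the `SentimentScore` column by weekday. It is a Python approximation, not the R script used in the modules. The sample rows are made up, and the ISO 8601 format assumed for `CreatedAt` may differ from the actual export; only the column names come from the query above.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Made-up sample rows using the column names from the ASA query.
SAMPLE_CSV = """CreatedAt,Topic,SentimentScore,Author,Text
2018-03-05T09:15:00,MondayMotivation,4,user_a,Ready for the week
2018-03-05T09:20:00,MondayMood,0,user_b,So tired today
2018-03-06T10:00:00,Tuesday,2,user_c,Just another day
"""


def mean_score_by_day(csv_file):
    """Return the mean SentimentScore for each weekday found in the file."""
    totals = defaultdict(lambda: [0.0, 0])  # day -> [sum of scores, tweet count]
    for row in csv.DictReader(csv_file):
        # Assumption: CreatedAt is an ISO 8601 timestamp.
        created = datetime.fromisoformat(row["CreatedAt"])
        day = DAY_NAMES[created.weekday()]
        totals[day][0] += float(row["SentimentScore"])
        totals[day][1] += 1
    return {day: total / count for day, (total, count) in totals.items()}


result = mean_score_by_day(io.StringIO(SAMPLE_CSV))
```

The resulting day-to-mean mapping is the kind of summary a bar chart per day would be drawn from.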