# Track sentiment of tweets from the live Twitter stream API
Describes the changes made to add sentiment analysis of tweets to a C++ application.
## The result
This demonstrates how Azure Machine Learning was integrated into an existing application to analyze the sentiment of live tweets. The code for the application is on [GitHub](https://github.com/kirkshoop/twitter).
## Sentiment analysis of tweets
Sentiment analysis evaluates a piece of text to determine its sentiment. In this case the result indicates whether the tweet is positive, negative or neutral in tone. Associating the sentiment of tweets with the words in those tweets allows the app to show the history of positive vs. negative usage of each word.
## Azure Machine Learning Twitter sentiment sample
The Cortana Intelligence Gallery has this [experiment](https://gallery.cortanaintelligence.com/Experiment/Binary-Classification-Twitter-sentiment-analysis-4) that describes how it builds a model to evaluate the sentiment of tweets. The experiment has a detailed explanation of how the sentiment model is built and the parameters used for the transforms and algorithms used to generate the model.
> This experiment demonstrates the use of the __Feature Hashing__, __Execute R Script__ and __Filter-Based Feature Selection__ modules to train a sentiment analysis engine. We use a data-driven machine learning approach instead of a lexicon-based approach, as the latter is known to have high precision but low coverage compared to an approach that learns from a corpus of annotated tweets. The hashing features are used to train a model using the __Two-Class Support Vector Machine__ (SVM), and the trained model is used to predict the opinion polarity of unseen tweets.
I actually used the two experiments in this [collection](https://gallery.cortanaintelligence.com/Collection/Twitter-Sentiment-Analysis-Collection-1) to create my model and web service. The experiments in this collection have some updates that made them easier to use.
## Training a model
The first experiment ([Training Experiment for Twitter sentiment analysis](https://gallery.cortanaintelligence.com/Experiment/Training-Experiment-for-Twitter-sentiment-analysis-2)) in the collection is used to train the model.
> An Azure Machine Learning Workspace is required to use the experiment and create a web service. Paid and free Workspaces are available. The copy process will prompt for sign-in and the creation of a workspace.
- Copy this experiment into an Azure Machine Learning Workspace by clicking the 'Open in Studio' link in the [Training Experiment for Twitter sentiment analysis](https://gallery.cortanaintelligence.com/Experiment/Training-Experiment-for-Twitter-sentiment-analysis-2).
- The training experiment must be 'Run' to create the model.
- Once the run completes, the model and column transform results must be saved in the workspace. These are used by the second experiment.
    - Right-click the 'Train Model' node and select 'Trained Model' > 'Save As Trained Model'.
    - Right-click the 'Select Columns Transform' node and select 'Columns selection transformation' > 'Save As Transform'.
## Creating a web service from the model
The second experiment ([Predictive Experiment for Twitter sentiment analysis](https://gallery.cortanaintelligence.com/Experiment/Predictive-Experiment-for-Twitter-sentiment-analysis-3)) in the collection is used to create a web service for the model.
This experiment refers to the saved model and transform from the training experiment.
- Copy this experiment into an Azure Machine Learning Workspace by clicking the 'Open in Studio' link in the [Predictive Experiment for Twitter sentiment analysis](https://gallery.cortanaintelligence.com/Experiment/Predictive-Experiment-for-Twitter-sentiment-analysis-3).
- The predictive experiment must be 'Run' before a web service can be deployed.
- Once the run completes, deploy the web service.
## Using the web service
The [Web service portal](https://services.azureml.net) lists deployed web services.
When viewing the deployed web service there will be a 'Test' tab and a 'Consume' tab.
The Test tab will allow arbitrary text to be sent to the web service. These requests are made from the browser, so the browser debugger can be used to observe the request and response.
The Consume tab displays the url and key that will be used to make calls to the web service. These will be used in the application.
## Making calls to the web service
The application uses [rxcpp](https://github.com/Reactive-Extensions/RxCpp) and [libcurl](https://curl.haxx.se/libcurl/) to make HTTP requests.
The key for the web service is added to the 'Authorization' header as a 'Bearer' auth token. The JSON body to send is produced from an array of strings containing the tweets. Before a tweet is added to the body, every character that is not an ASCII letter or digit is replaced with a space.
```cpp
// Headers: JSON content, with the web service key as a Bearer token.
std::map<string, string> headers;
headers["Content-Type"] = "application/json";
headers["Authorization"] = "Bearer " + key;

// Start from an empty request body and add one entry per tweet.
auto body = json::parse(R"({"Inputs":{"input1":[]},"GlobalParameters":{}})");

// Despite its name, this matches every character that is not an ASCII letter or digit.
static const regex nonascii(R"([^A-Za-z0-9])");

auto& input1 = body["Inputs"]["input1"];
for(auto& t : text) {
    auto ascii = regex_replace(t, nonascii, " ");
    input1.push_back({{"tweet_text", ascii}});
}

// Defer creating the request until something subscribes.
return observable<>::defer([=]() -> observable<string> {
    return factory.create(http_request{url, "POST", headers, body.dump()}) |
        rxo::map([](http_response r){
            return r.body.complete;
        }) |
        merge(worker);
}) |
subscribe_on(worker);
```
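To make the shape of the request body concrete, here is a minimal standalone sketch that builds the same body using only the standard library. It is not the application's code: `sanitize` and `make_body` are hypothetical names, and the body is assembled by hand instead of with the JSON library used above. Hand assembly is only reasonable here because the sanitized text contains nothing but letters, digits and spaces, so no JSON escaping is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: replace every character that is not an ASCII letter
// or digit with a space, using the same pattern as the application code.
std::string sanitize(const std::string& tweet) {
    static const std::regex not_alnum(R"([^A-Za-z0-9])");
    std::string clean = std::regex_replace(tweet, not_alnum, " ");
    // Each match is one character replaced by one space, so length is kept.
    assert(clean.size() == tweet.size());
    return clean;
}

// Hypothetical helper: assemble the request body by hand.
// Safe without escaping only because sanitize() leaves letters, digits, spaces.
std::string make_body(const std::vector<std::string>& tweets) {
    std::string body = R"({"Inputs":{"input1":[)";
    for (std::size_t i = 0; i < tweets.size(); ++i) {
        if (i != 0) {
            body += ",";
        }
        body += R"({"tweet_text":")" + sanitize(tweets[i]) + R"("})";
    }
    body += R"(]},"GlobalParameters":{}})";
    return body;
}
```

Calling `make_body({})` yields the same empty template that the application parses before it adds tweets. Note that punctuation, emoji and other non-ASCII text are all flattened to spaces before the model sees them.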
## Batching calls to the web service
Making a separate http request per Tweet is excessive. Here is code that uses [rxcpp](https://github.com/Reactive-Extensions/RxCpp) to make calls for more than one tweet at a time.
The following records Tweets into a vector for 500ms and then ignores empty vectors.
```cpp
buffer_with_time(milliseconds(500), tweetthread) |
filter([](const vector<Tweet>& tws){ return !tws.empty(); }) |
```
This code uses [range-v3](https://github.com/ericniebler/range-v3) on the vector to extract the text of each tweet.
```cpp
vector<string> text = tws |
    ranges::view::transform([](Tweet tw){
        auto& tweet = tw.data->tweet;
        return tweettext(tweet);
    });
```
`sentimentrequest()` creates the http request to the web service.
```cpp
sentimentrequest(
    poolthread,
    factory,
    settings["SentimentUrl"].get<string>(),
    settings["SentimentKey"].get<string>(),
    text) |
```
[range-v3](https://github.com/ericniebler/range-v3) is used on the http response to pair up each tweet with its sentiment.
```cpp
auto combined = ranges::view::zip(response["Results"]["output1"], tws);
```
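The batching and pairing steps above can be simulated without rxcpp or range-v3. The sketch below is a plain standard-library stand-in, not the application's code. `TimedTweet`, `batch_by_window` and `pair_with_sentiment` are hypothetical names. It also makes simplifying assumptions: tweets arrive sorted by time with non-negative timestamps, and windows are aligned to multiples of the window length instead of to the moment of subscription.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the app's Tweet type: arrival time and text only.
struct TimedTweet {
    long long arrival_ms;
    std::string text;
};

// Group tweets into fixed windows of window_ms. A window that received no
// tweets never produces a batch, which mirrors filtering out empty vectors.
// Assumes tweets are sorted by arrival_ms and timestamps are non-negative.
std::vector<std::vector<TimedTweet>> batch_by_window(
        const std::vector<TimedTweet>& tweets, long long window_ms) {
    assert(window_ms > 0);
    std::vector<std::vector<TimedTweet>> batches;
    long long current_window = -1;
    for (const auto& tweet : tweets) {
        const long long window = tweet.arrival_ms / window_ms;
        if (batches.empty() || window != current_window) {
            batches.emplace_back();  // first tweet of a new window
            current_window = window;
        }
        batches.back().push_back(tweet);
    }
    return batches;
}

// Pair each tweet with the sentiment at the same position, stopping at the
// shorter of the two sequences, as a zip does.
std::vector<std::pair<std::string, TimedTweet>> pair_with_sentiment(
        const std::vector<std::string>& sentiments,
        const std::vector<TimedTweet>& batch) {
    std::vector<std::pair<std::string, TimedTweet>> combined;
    const std::size_t count = std::min(sentiments.size(), batch.size());
    for (std::size_t i = 0; i < count; ++i) {
        combined.emplace_back(sentiments[i], batch[i]);
    }
    return combined;
}
```

The pairing relies on the web service returning one result per input, in the same order as the tweets were sent. That is the assumption the zip in the application makes as well.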
## Tracking word sentiment
Sentiment information for each tweet is nice. The next step is to count how many times each word appeared in a negative tweet and how many times the same word appeared in a positive tweet.
The word 'love' is consistently shown to be in the top 2 words across all tweets. After adding this code the app shows that 'love' occurs 8x more often in positive tweets than in negative ones.
Other notable words:
- 'work' is 3x more likely to be used in a negative tweet
- 'wanna' is 8x more likely to be used in a negative tweet
- 'thank' is 3x more likely to be used in a positive tweet
## Collecting sentiment per word
Adding maps to the Model that hold counts of positive tweets and negative tweets that contain each word was simple.
This code was added to the `Reducer` produced when a sentiment call completes.
```cpp
for (auto& word: get<1>(b).data->words) {
    if (sentiment["Sentiment"] == "negative") ++m.data->negativewords[word];
    if (sentiment["Sentiment"] == "positive") ++m.data->positivewords[word];
}
```
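The counting step is small enough to try in isolation. The following is a minimal sketch using only the standard library. `WordSentiment` and `record` are hypothetical names standing in for the app's Model and Reducer, and the sentiment is passed as a plain string where the app reads it from the JSON response.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the part of the Model that holds the counts.
struct WordSentiment {
    std::map<std::string, int> positivewords;
    std::map<std::string, int> negativewords;
};

// Count every word of one tweet under that tweet's sentiment.
// Any other label, such as "neutral", leaves both maps untouched.
void record(WordSentiment& counts,
            const std::string& sentiment,
            const std::vector<std::string>& words) {
    for (const auto& word : words) {
        if (sentiment == "negative") {
            ++counts.negativewords[word];
        } else if (sentiment == "positive") {
            ++counts.positivewords[word];
        }
    }
}
```

Because `operator[]` on a `std::map` inserts a zero for a missing key before the increment, no separate initialization is needed the first time a word is seen.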
## Display sentiment per word
Showing the actual counts for the top words is good. This code does that using the [Dear ImGui](https://github.com/ocornut/imgui) library, and also displays a ratio that shows the relative frequency of the word between positive and negative tweets. The relative frequency was used to derive the data about 'wanna' above.
```cpp
auto positive = m.positivewords[w.word];
ImGui::TextColored(positivecolor, " +%4.d,", positive); ImGui::SameLine();
auto negative = m.negativewords[w.word];
ImGui::TextColored(negativecolor, " -%4.d", negative); ImGui::SameLine();
if (negative > positive) {
    ImGui::TextColored(negativecolor, " -%6.2fx", negative / std::max(float(positive), 0.001f)); ImGui::SameLine();
} else {
    ImGui::TextColored(positivecolor, " +%6.2fx", positive / std::max(float(negative), 0.001f)); ImGui::SameLine();
}
```
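The ratio arithmetic can be separated from the drawing code. This is a small sketch of that calculation, assuming integer counts. `relative_frequency` is a hypothetical name, and returning a signed value is a choice made for this example: the app shows the direction through the text color and prefix instead.

```cpp
#include <algorithm>
#include <cassert>

// Returns how many times more often a word appears on its dominant side.
// A negative result means the word leans negative, otherwise it leans positive.
// The 0.001 floor matches the display code and avoids dividing by zero.
float relative_frequency(int positive, int negative) {
    assert(positive >= 0 && negative >= 0);
    const float min_denominator = 0.001f;
    if (negative > positive) {
        return -(negative / std::max(static_cast<float>(positive), min_denominator));
    }
    return positive / std::max(static_cast<float>(negative), min_denominator);
}
```

One consequence of the floor is that a word seen on only one side produces a very large ratio (the count multiplied by roughly 1000), so such values are better read as "only on this side" than as a real multiple.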
## Composition
It took very little effort to turn the Azure Machine Learning sample experiments into a web service that evaluates the sentiment of tweets.
It also took very little code to create a new HTTP request for all the tweets that arrived in a 500ms window, split the sentiment results out, and pair each result with its tweet.
The result is that the tweets are displayed when they arrive and then marked with the sentiment when it arrives later. The effect is quite pleasing to watch.
[rxcpp](https://github.com/Reactive-Extensions/RxCpp) provided an easy way to express the coordination between the arriving tweets and the arriving sentiments and [range-v3](https://github.com/ericniebler/range-v3) provided an easy way to manipulate the vectors. Each contributed a powerful abstraction that made a complex problem easy.