Text Classification: Compare Built-in Text Analytics modules vs Custom R Script
The goal was to try to reproduce the [Text Classification experiment by the AzureML Team][1] using the more recent modules in the Text Analytics category, such as **Preprocess Text** and **Extract N-Gram Features from Text**, and to compare the results.
N-grams TF Model
-------------------
***Custom R Script***
- *AUC* - **0.836**
- *F1 Score* - **0.762**
- *Accuracy* - **0.758**
***Built-in Module***
- *AUC* - **0.857**
- *F1 Score* - **0.785**
- *Accuracy* - **0.782**
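
For context, here is roughly what the Custom R Script side of this model looks like — a minimal sketch of unigram + bigram term-frequency extraction using the `tm` and `NLP` packages. The sample documents and tokenizer are illustrative assumptions, not the actual script from the gallery experiment:

```r
# Minimal sketch: n-gram (unigram + bigram) TF features with tm/NLP.
# Sample documents are made up; this is not the gallery experiment's
# actual Custom R Script.
library(tm)
library(NLP)

docs <- c("the movie was great",
          "the movie was terrible",
          "great acting and a great plot")

corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)

# Tokenizer that emits both unigrams and bigrams
UniBigramTokenizer <- function(x)
  unlist(lapply(ngrams(words(x), 1:2), paste, collapse = " "),
         use.names = FALSE)

# Document-term matrix with raw term-frequency weighting (tm's default)
dtm <- DocumentTermMatrix(corpus,
                          control = list(tokenize = UniBigramTokenizer))
tf <- as.data.frame(as.matrix(dtm))
head(tf)
```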
Unigram TF-IDF Model
--------------------
***Custom R Script***
- *AUC* - **0.822**
- *F1 Score* - **0.755**
- *Accuracy* - **0.745**
***Built-in Module***
- *AUC* - **0.745**
- *F1 Score* - **0.692**
- *Accuracy* - **0.673**
As you can see, in the Unigram TF-IDF Model the result of the Built-in Module is much worse than that of the Custom R Script. I think the reason lies in the parameters I set in the **Extract N-Gram Features from Text** module. Does anyone know how to make this model work better (which parameters should I change)?
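
For comparison, here is a minimal sketch of the unigram TF-IDF side in R with the `tm` package (again with made-up sample documents, not the gallery experiment's actual script):

```r
# Minimal sketch: unigram TF-IDF features with the tm package.
# Sample documents are made up; this is not the gallery experiment's
# actual Custom R Script.
library(tm)

docs <- c("the movie was great",
          "the movie was terrible",
          "great acting and a great plot")

corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))

# TF-IDF weighting; note that tm's weightTfIdf normalizes term
# frequencies by document length by default (normalize = TRUE)
dtm <- DocumentTermMatrix(corpus,
                          control = list(weighting = weightTfIdf))
tfidf <- as.data.frame(as.matrix(dtm))
head(tfidf)
```

One thing worth checking is whether the built-in module's *Weighting function* and normalization settings match this: if the two pipelines scale their feature vectors differently, the downstream learner will see quite different inputs, which could account for part of the gap.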
[1]: https://gallery.cortanaintelligence.com/Collection/Text-Classification-Template-1