Text Classification: Compare Built-in Text Analytics modules vs Custom R Script

June 1, 2017
The goal was to try to reproduce the [Experiment of Text Classification by the AzureML Team][1] using the recent modules in the Text Analytics category, such as **Preprocess Text** and **Extract N-Gram Features from Text**, and compare the results.

N-grams TF Model
----------------

***Custom R Script***

- *AUC* - **0.836**
- *F1 Score* - **0.762**
- *Accuracy* - **0.758**

***Built-in Module***

- *AUC* - **0.857**
- *F1 Score* - **0.785**
- *Accuracy* - **0.782**

Unigram TF-IDF Model
--------------------

***Custom R Script***

- *AUC* - **0.822**
- *F1 Score* - **0.755**
- *Accuracy* - **0.745**

***Built-in Module***

- *AUC* - **0.745**
- *F1 Score* - **0.692**
- *Accuracy* - **0.673**

As you can see, in the Unigram TF-IDF Model the result of the Built-in Module is much worse than that of the Custom R Script. I think the reason lies in the parameters I filled in for the **Extract N-Gram Features from Text** module. Does anyone know how to make the model work better (which parameter should I change)?

[1]: https://gallery.cortanaintelligence.com/Collection/Text-Classification-Template-1
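For context on why the weighting choice in **Extract N-Gram Features from Text** matters so much, here is a minimal sketch of unigram TF-IDF in Python. It uses the common `log(N / df)` IDF variant; whether this matches the module's "Weighting Function" setting is an assumption, so results can diverge if the module applies a different normalization.

```python
import math
from collections import Counter

def tfidf_features(docs):
    """Compute unigram TF-IDF vectors for a list of tokenized documents.

    TF is the raw term count in a document; IDF is log(N / df).
    NOTE: this IDF variant is an assumption for illustration -- the
    built-in module's exact weighting function may differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    # TF-IDF per document: raw count times inverse document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors

corpus = [
    "the movie was great".split(),
    "the movie was terrible".split(),
    "the acting was great".split(),
]
features = tfidf_features(corpus)
# Terms appearing in every document (e.g. "the") get IDF = log(3/3) = 0,
# so they contribute nothing; rarer, more discriminative terms dominate.
```

Because the weighting drives which terms the classifier actually sees, a mismatch between the module's weighting function (or its vocabulary/frequency cutoffs) and the R script's TF-IDF computation could plausibly explain the gap in the table above.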