Compare Multi-class Classifiers: Letter recognition
This sample demonstrates how to compare multiple multi-class classifiers using the letter recognition dataset.
##Compare Multi-class Classifiers: Letter Recognition
This sample demonstrates how to create multiclass classifiers and evaluate and compare the performance of multiple models.
##Data
For this experiment, we use the letter image recognition data from the [UCI repository](http://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition). The first column is the label, which identifies each row as one of 26 letters, A-Z. The remaining 16 columns are feature columns. The dataset contains 20,000 instances. A description of the data and other details can be found at [http://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.names](http://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.names).
![][image1]
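Outside of Azure ML Studio, the column layout described above (one letter label followed by 16 feature columns) can be read with a few lines of code. The following is a minimal Python sketch, not part of the experiment itself. The function name `parse_letter_rows` and the two sample rows are hypothetical, the feature values are made up for illustration, and the features are assumed to be integers.

```python
import csv
import io

# Illustrative rows in the layout described above: a letter label followed
# by 16 integer features. The values are made up for this sketch.
SAMPLE = """A,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0
B,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
"""


def parse_letter_rows(text):
    """Return (labels, features) parsed from comma-separated rows."""
    labels, features = [], []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        if len(row) != 17:
            raise ValueError(f"expected 17 columns, got {len(row)}")
        labels.append(row[0])                       # first column: the letter
        features.append([int(v) for v in row[1:]])  # remaining 16 columns
    return labels, features


labels, features = parse_letter_rows(SAMPLE)
```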
For this experiment, we used the **Split** module to randomly divide the dataset using an 80-20 ratio of training data to test data. Then we trained multiple models using the **Train Model** module with the training dataset as input.
![][image2] ![][image3] ![][image4]
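The random 80-20 split can be approximated outside the designer as shown below. This is a minimal Python sketch and makes no claim about how the **Split** module is implemented. The function name `split_rows`, the `seed` parameter, and the use of row indices in place of real rows are all assumptions made for illustration.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) by train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable splits
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Row indices stand in for the 20,000 instances of the dataset.
train, test = split_rows(range(20000))
```

With 20,000 instances, this yields 16,000 training rows and 4,000 test rows.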
##Models
We decided to compare four _multi-class classification algorithms_ provided in Azure ML Studio: **Multiclass Neural Network**, **Multiclass Decision Jungle**, **Multiclass Logistic Regression**, and **Multiclass Decision Forest**.
![][image5]
Azure ML Studio also provides a module called **One-vs-All Multiclass**, which can use any binary classifier as an input to solve a multi-class classification problem, based on the [one-vs-all](http://en.wikipedia.org/wiki/Multiclass_classification#One-vs.-rest) method. Therefore, as the fifth model for comparison in our letter recognition task, we used a binary classification module, **Two-Class Support Vector Machine**, and connected it to the **One-vs-All Multiclass** module.
![][image6]
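The one-vs-all idea can be sketched in a few lines: train one binary model per class, with that class as the positive label and all other classes as negative, then predict the class whose model gives the highest score. The Python sketch below is illustrative only. The `OneVsAll` wrapper and the toy `CentroidBinary` scorer are hypothetical names; the centroid scorer is a deliberately simple stand-in for a real binary learner such as a support vector machine, and nothing here reflects the internals of the Azure ML Studio modules.

```python
def _mean(points):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _sqdist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


class CentroidBinary:
    """Toy binary scorer (stand-in for an SVM): a higher score means the
    point is closer to the positive centroid than to the negative one."""

    def fit(self, X, y):
        self.pos_c = _mean([x for x, t in zip(X, y) if t == 1])
        self.neg_c = _mean([x for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        return _sqdist(x, self.neg_c) - _sqdist(x, self.pos_c)


class OneVsAll:
    """Wrap any binary learner exposing fit(X, y) and score(x)."""

    def __init__(self, make_binary):
        self.make_binary = make_binary

    def fit(self, X, labels):
        self.models = {}
        for c in sorted(set(labels)):
            # Relabel: the current class is positive, every other is negative.
            y = [1 if label == c else 0 for label in labels]
            self.models[c] = self.make_binary().fit(X, y)
        return self

    def predict(self, x):
        # Choose the class whose binary model is most confident.
        return max(self.models, key=lambda c: self.models[c].score(x))


X = [[0, 0], [0, 1], [10, 0], [10, 1], [0, 10], [1, 10]]
labels = ["A", "A", "B", "B", "C", "C"]
model = OneVsAll(CentroidBinary).fit(X, labels)
```

For the 26-letter task this approach trains 26 binary models, one per letter.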
##Results
We used the **Score Model** module on each combination of a trained model and the test data, and then used the **Evaluate Model** module to compute the confusion matrix of results. To view the confusion matrix, just right-click the output port of the **Evaluate Model** module and select **Visualize**. Because **Evaluate Model** has two input ports, you can compare the confusion matrices for two different models side-by-side, by connecting the optional second port to the output of another **Score Model** module.
![][image7]
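For readers who want to see what the confusion matrix contains, the following Python sketch builds one from paired actual and predicted labels. It is an illustration under stated assumptions, not the **Evaluate Model** implementation: the function name `confusion_matrix`, the convention of rows as actual classes and columns as predicted classes, and the small label lists are all chosen for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Return a matrix where entry [i][j] counts rows whose actual class is
    classes[i] and whose predicted class is classes[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Made-up labels for illustration only.
actual = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "B", "C", "C"]
matrix = confusion_matrix(actual, predicted, ["A", "B", "C"])
```

Correct predictions fall on the diagonal; every off-diagonal entry is a misclassification.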
Next, we used custom R code in an **Execute R Script** module to compute the _macro precision_ and _macro recall_ for the individual models. We used a series of **Add Rows** modules to combine those results. The output of the final **Add Rows** module shows the completed results dataset.
From these results, we can see that the model created by using **Multiclass Decision Forest** has the best macro precision, while the model created by using **Multiclass Neural Network** has the best macro recall (although only marginally better than the decision forest model).
![][image8]
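Macro precision and macro recall are the unweighted averages of the per-class precision and recall values. The experiment computes them with R code that is not reproduced here; the Python sketch below shows the same calculation under a few assumptions: the input is a confusion matrix with rows as actual classes and columns as predicted classes, a class with no predictions or no instances contributes 0, and the function name and sample matrix are hypothetical.

```python
def macro_precision_recall(matrix):
    """Return (macro precision, macro recall) for a square confusion matrix
    whose rows are actual classes and whose columns are predicted classes."""
    n = len(matrix)
    precisions, recalls = [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column total
        actual_k = sum(matrix[k])                          # row total
        # Assumption: an empty denominator counts as 0 for that class.
        precisions.append(tp / predicted_k if predicted_k else 0.0)
        recalls.append(tp / actual_k if actual_k else 0.0)
    return sum(precisions) / n, sum(recalls) / n


# Made-up 3-class matrix for illustration only.
example = [[2, 1, 0],
           [0, 1, 1],
           [0, 0, 1]]
macro_p, macro_r = macro_precision_recall(example)
```

Because every class carries equal weight, a rarely occurring letter influences the macro averages as much as a common one.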
We used **Visualize** on the output of **Evaluate Model** to review the confusion matrix. However, if we want to obtain the raw data to work with the prediction results, we can use an **Execute R Script** module to pass the same data through without any modification and visualize it using the module's output port.
In the experiment, we did this only for the branch containing the **Multiclass Neural Network** model, to illustrate how you can see the prediction results as counts. These numbers can then be used to compute other metrics, such as **micro precision** and **micro recall**.
![][image9]
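As an illustration of how such counts can be used, the Python sketch below computes micro precision and micro recall by pooling true positives, false positives, and false negatives across all classes. The function name and the sample matrix are hypothetical, and the matrix is assumed to have rows as actual classes and columns as predicted classes. When every instance has exactly one actual and one predicted label, each misclassification is one false positive and one false negative, so both micro metrics equal the overall accuracy.

```python
def micro_precision_recall(matrix):
    """Return (micro precision, micro recall) from a square matrix of counts
    whose rows are actual classes and whose columns are predicted classes."""
    n = len(matrix)
    tp = sum(matrix[k][k] for k in range(n))
    # False positives: predicted as class k but actually another class.
    fp = sum(matrix[r][k] for k in range(n) for r in range(n) if r != k)
    # False negatives: actually class k but predicted as another class.
    fn = sum(matrix[k][c] for k in range(n) for c in range(n) if c != k)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up 3-class counts for illustration only.
counts = [[2, 1, 0],
          [0, 1, 1],
          [0, 0, 1]]
micro_p, micro_r = micro_precision_recall(counts)
```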
If we scroll to the right side of the visualization panel, we can also see these metrics for each class: **Average Log Loss**, **Precision**, and **Recall**.
![][image10]
<!-- Images -->
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/18/data.PNG
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/18/read_split_graph.PNG
[image3]:http://az712634.vo.msecnd.net/samplesimg/v1/18/read.PNG
[image4]:http://az712634.vo.msecnd.net/samplesimg/v1/18/split.PNG
[image5]:http://az712634.vo.msecnd.net/samplesimg/v1/18/models.PNG
[image6]:http://az712634.vo.msecnd.net/samplesimg/v1/18/svm.PNG
[image7]:http://az712634.vo.msecnd.net/samplesimg/v1/18/conf_mat.PNG
[image8]:http://az712634.vo.msecnd.net/samplesimg/v1/18/macro_avgs.PNG
[image9]:http://az712634.vo.msecnd.net/samplesimg/v1/18/conf_mat_num.PNG
[image10]:http://az712634.vo.msecnd.net/samplesimg/v1/18/per_class_metrics.PNG