Sample 7: Train, Test, Evaluate for Multiclass Classification: Letter Recognition Dataset

October 2, 2014
Sample experiment that uses multiclass classification to predict the letter category as one of the 26 capital letters in the English alphabet.
# Multiclass Classification: Letter Recognition

This experiment demonstrates how to build a multiclass classification model for letter recognition using Azure ML Studio.

## Workflow

Below is a screenshot of the entire experiment in Azure ML.

![screenshot_of_experiment]()

There are five basic steps to build this experiment:

- [Step 1: Get Data](#anchor-1)
- [Step 2: Pre-process Data](#anchor-2)
- [Step 3: Define Features](#anchor-3)
- [Step 4: Train Model](#anchor-4)
- [Step 5: Score and Evaluate Model](#anchor-5)

------------------------------------------

<a name="anchor-1"></a>
## Data

We will use the letter recognition data from the UCI Machine Learning Repository. In this dataset, a set of 20,000 unique letter images was generated by randomly distorting pixel images of the 26 uppercase letters from 20 different commercial fonts. The parent fonts represent a full range of character types, including script, italic, serif, and Gothic. Each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. The features of each of the 20,000 characters were summarized in terms of 16 primitive numerical attributes (statistical moments and edge counts), which were then scaled to fit into a range of integer values from 0 through 15. The objective is to identify each black-and-white rectangular pixel display as one of the 26 capital letters in the English alphabet.

[P. W. Frey and D. J. Slate. "Letter Recognition Using Holland-Style Adaptive Classifiers". Machine Learning, Vol. 6, No. 2, March 1991]

To access this dataset, drag the **Reader** module onto the experiment canvas. This module specifies the data source for an experiment.

![screenshot_of_experiment]()

For this experiment, select the option **Web URL via HTTP** and provide the URL and data format.

![screenshot_of_experiment]()

The uploaded data contains 20,000 rows and 17 columns.
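To get a feel for the file layout before wiring it into the Studio, the rows below show the dataset's format: the class label in the first column, followed by 16 integer attributes. This is a minimal stdlib-only sketch; the three sample rows are illustrative stand-ins in the same format, not an excerpt guaranteed to match the hosted file.

```python
import csv
import io

# Illustrative rows in the UCI letter-recognition format: the label in Col1,
# followed by 16 integer attributes scaled to the range 0-15.
raw = """T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
D,4,11,6,8,6,10,6,2,6,10,3,7,3,7,3,9"""

rows = list(csv.reader(io.StringIO(raw)))
labels = [r[0] for r in rows]                       # Col1: the letter class
features = [[int(v) for v in r[1:]] for r in rows]  # Col2-Col17: attributes

# Sanity checks matching the dataset description.
assert all(len(f) == 16 for f in features)             # 16 numeric attributes
assert all(0 <= v <= 15 for f in features for v in f)  # scaled to 0-15
```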
`Col1` contains the label, which indicates which of the 26 classes (the letters A-Z) is represented by the numerical attributes in columns 2-17.

<a name="anchor-2"></a>
## Data Preparation

In general, when preparing data for an experiment, you might need to perform tasks such as selecting a subset of relevant columns from the entire dataset, changing data types, converting columns to categorical or continuous variables, and handling missing values. Additional tasks might include normalizing the data or binning values, if necessary. In this experiment, the uploaded data is already in a good format, so no pre-processing is needed.

<a name="anchor-3"></a>
## Define Features

Defining a good set of features for a predictive model requires experimentation and knowledge about the problem at hand. Some features are better for predicting the target than others. Also, some features are strongly correlated with other features, so they will not add much new information to the model and can be removed. When building a model, we might use all the available features in the dataset, or only a subset. For this experiment, we will use all 16 columns (Col2-Col17) that contain the features of the 20,000 characters.

<a name="anchor-4"></a>
## Train Model

The next step is to train a model. First, split the data into training and testing sets. Select and drag the **Split** module onto the experiment canvas and connect it to the second input port of the **Train Model** module. Set **Fraction of rows in the first output dataset** to 0.5. This way, we'll use 50% of the data to train the model and the other 50% for testing.

![screenshot_of_experiment]()

> **Tip**: By changing the **Random seed** parameter, you can produce different random samples for training and testing. This parameter controls the seeding of the pseudo-random number generator.

In this experiment we will use two different algorithms.
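The seeded 50/50 split performed by the **Split** module can be sketched in a few lines outside the Studio UI. This is a minimal stdlib-only sketch, not the module's actual implementation; the integer list stands in for the 20,000 dataset rows.

```python
import random

def split_rows(rows, fraction=0.5, seed=0):
    """Shuffle rows with a fixed seed, then split into train/test partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)    # seeded -> reproducible split
    cut = int(len(rows) * fraction)
    return rows[:cut], rows[cut:]

data = list(range(20))                   # stand-in for the dataset rows
train, test = split_rows(data, fraction=0.5, seed=42)

assert len(train) == len(test) == 10
# The same seed reproduces the same partition; a different seed changes it.
assert split_rows(data, fraction=0.5, seed=42)[0] == train
```

Changing `seed` here plays the same role as the module's **Random seed** parameter: it selects which rows land in each half.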
Drag the following modules into the experiment workspace: **Multiclass Decision Jungle**, and **Two-Class Support Vector Machine** connected to the **One-vs-All Multiclass** module. Connect the output of each algorithm module to the input port of its **Train Model** module, along with the training data.

![screenshot_of_experiment]()

We are ready to run the experiment. The result is a trained classification model that can be used to score new samples and make predictions.

<a name="anchor-5"></a>
## Score and Evaluate the Model

Now that we've trained the model, we can use it to score the other 50% of the data and see how well the model classifies new data. Add the **Score Model** module to the experiment canvas, and connect its left input port to the output of the **Train Model** module. Connect its right input port to the test data output (right port) of the **Split** module.

Run the experiment and view the output of the **Score Model** module by clicking the output port and selecting **Visualize**. The output shows the scored labels and the probabilities for each of the 26 classes ("A"-"Z").

Finally, to test the quality of the results, drag the **Evaluate Model** module to the experiment canvas, and connect its left input port to the output of the **Score Model** module. There are two input ports because the **Evaluate Model** module can be used to compare two models. This lets us compare the performance of the two algorithms we applied: **Multiclass Decision Jungle**, and **Two-Class Support Vector Machine** used together with the **One-vs-All Multiclass** module.

Run the experiment and view the output of the **Evaluate Model** module by clicking the output port and selecting **Visualize**. The following diagram shows the resulting statistics for the model.
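The **One-vs-All Multiclass** module turns any two-class learner into a 26-class one: it trains one binary scorer per letter and predicts the class whose scorer responds most strongly. The sketch below illustrates that reduction scheme only; as an assumption for brevity, each per-class "scorer" is a nearest-centroid distance rather than the Studio's two-class SVM.

```python
# One-vs-all reduction sketch: one scorer per class, argmax at predict time.
# The centroid-distance scorer is a stand-in for a real two-class learner.
def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def one_vs_all_fit(X, y):
    """Fit one per-class model (here: the class centroid) per distinct label."""
    classes = sorted(set(y))
    return {c: centroid([x for x, lab in zip(X, y) if lab == c])
            for c in classes}

def one_vs_all_predict(model, x):
    """Score x against every class model; the highest score wins."""
    def score(c):
        m = model[c]
        return -sum((a - b) ** 2 for a, b in zip(x, m))  # closer = higher
    return max(model, key=score)

X = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 0], [6, 0]]
y = ["A", "A", "B", "B", "C", "C"]
model = one_vs_all_fit(X, y)

assert one_vs_all_predict(model, [0, 0.5]) == "A"
assert one_vs_all_predict(model, [10, 10.5]) == "B"
```

Swapping the centroid scorer for a trained two-class SVM gives exactly the combination the experiment uses: 26 binary SVMs, one per letter, combined by argmax.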
![screenshot_of_experiment]()

## Results

Each column of the confusion matrix represents the instances in a predicted class, while each row represents the instances in an actual class. In this confusion matrix, the recall for class A is 85.1%: the model correctly classifies class A 85.1% of the time, whereas class A is misclassified as class G 0.5% of the time. The matrix also shows that the model does a poor job of classifying the letters G and O.
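The per-class recall shown in the matrix is just the diagonal entry divided by its row total (rows being actual classes). A minimal sketch, using a toy 3-class matrix rather than the experiment's real counts:

```python
# Recall for a class = correctly predicted instances of that class
# divided by all actual instances of that class (diagonal / row sum,
# with rows as actual classes). The counts below are illustrative only.
def recall_per_class(matrix, labels):
    out = {}
    for i, lab in enumerate(labels):
        row_total = sum(matrix[i])
        out[lab] = matrix[i][i] / row_total if row_total else 0.0
    return out

labels = ["A", "G", "O"]
conf = [
    [851,   5, 144],   # actual A: 851 of 1000 predicted as A -> recall 0.851
    [ 30, 900,  70],   # actual G
    [ 10,  60, 930],   # actual O
]
recalls = recall_per_class(conf, labels)

assert abs(recalls["A"] - 0.851) < 1e-9
```

With the real 26x26 matrix from **Evaluate Model**, the same computation reproduces each percentage shown in the Studio's visualization.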