Compare Binary Classifiers

September 2, 2014
Sample experiment that shows how to compare the performance of multiple learning algorithms.
# Comparing binary classifiers

This experiment shows how to compare the performance of different binary classifiers.

![Experiment graph][experiment]

## Dataset and problem description

We use the 'Adult Census Income' dataset, with 'income' as the label column. This column indicates whether a person's income exceeds $50K/yr. The dataset is prepopulated in your workspace and was originally downloaded from the [UCI Repository](http://archive.ics.uci.edu/ml/datasets/Adult).

## Data preprocessing

We replace all missing values with "0" using the **Missing Value Scrubber** module. We then use the **Split** module to randomly split the dataset into training and test sets.

## Comparison of classifiers

We compare 4 binary classifiers: **Two-Class Averaged Perceptron**, **Two-Class Bayes Point Machine**, **Two-Class Decision Jungle** and **Two-Class Locally-Deep Support Vector Machine**. The comparison is done by performing the following steps:

1. 3-fold cross-validation over the training set
2. finding the best hyperparameters of each learning algorithm
3. training each learning algorithm over the training set using the best hyperparameter values found in the previous step
4. scoring the test set
5. computing accuracy over the test set

The first three steps are done by the **Sweep Parameters** module. By default, the **Sweep Parameters** module in cross-validation mode partitions the training set into 10 folds. To change the number of folds we use the **Partition and Sample** module with the following parameters:

![Partition and Sample parameters][partition]

We connect the partitioned training set to the 'training dataset' input of **Sweep Parameters**. Since we use **Sweep Parameters** in cross-validation mode, we leave the module's right output unconnected. We use the default values of **Sweep Parameters**, which find the hyperparameters that optimize accuracy.

The fourth and fifth steps in the above list are done by **Score Model** and **Evaluate Model** respectively. **Evaluate Model** computes many metrics; we extract the Accuracy metric using the **Project Columns** module. Finally, we use **Add Rows** and **Execute R Script** to combine the results of all learners into a single column and to add a column with the names of the algorithms. The R code used in **Execute R Script** is given below:

```r
dataset <- maml.mapInputPort(1)
Algorithm <- c("Averaged Perceptron", "Bayes Point Machine", "Decision Jungle", "Locally-Deep SVM")
data.set <- cbind(Algorithm, dataset)
maml.mapOutputPort("data.set")
```

*To compare the performance of the classifiers according to another metric (e.g. AUC), we need to change the 'Metric for measuring performance for classification' parameter to that metric. We also need to change **Project Columns** to project to that metric and update the R code in **Execute R Script** with the name of the new metric.*

## Results

The final output of the experiment is the left output of the last **Execute R Script** module:

![Results][results]

We conclude that among these 4 algorithms, Decision Jungle has the best accuracy on the Adult Census Income dataset.

<!-- Images -->
[partition]:http://az712634.vo.msecnd.net/samplesimg/v1/17/partition.PNG
[experiment]:http://az712634.vo.msecnd.net/samplesimg/v1/17/experiment.PNG
[results]:http://az712634.vo.msecnd.net/samplesimg/v1/17/results.PNG
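
## Appendix: prototyping the same steps in R

If you want to prototype the preprocessing outside the Studio modules, the sketch below shows equivalent steps in plain R. It is an illustration only, not what the experiment runs: the data frame name `adult`, the 70/30 split fraction, and the random seed are assumptions rather than settings taken from the experiment.

```r
# Illustration of the preprocessing described above, in plain R (the experiment
# itself uses the Missing Value Scrubber and Split modules instead).
# Assumption: 'adult' is a data frame holding the Adult Census Income dataset,
# with categorical columns read as character vectors.

# Replace every missing value with 0 (coerced to the string "0" in character columns).
adult[] <- lapply(adult, function(col) replace(col, is.na(col), 0))

# Randomly split into training and test sets; the 70/30 fraction and the seed
# are assumed here, the real fraction is configured on the Split module.
set.seed(42)
train_idx <- sample(nrow(adult), size = round(0.7 * nrow(adult)))
training  <- adult[train_idx, ]
test      <- adult[-train_idx, ]
```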
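The sweep-and-evaluate loop (steps 1 through 5 above) can be sketched in R as well. The four Two-Class learners used in the experiment are Azure ML modules and are not available as open-source R packages, so the sketch substitutes an `rpart` decision tree with a small grid over its `cp` parameter purely to show the mechanics: 3-fold cross-validation to pick a hyperparameter, retraining on the full training set, and accuracy on the held-out test set. The grid values and variable names are assumptions.

```r
# Sketch of the sweep-and-evaluate workflow (steps 1-5 above) with a stand-in
# learner; the actual experiment sweeps the four Azure ML two-class modules.
library(rpart)

# Assumption: 'training' and 'test' come from the preprocessing sketch above,
# with 'income' as the label column.
training$income <- factor(training$income)
test$income     <- factor(test$income, levels = levels(training$income))

# Mean accuracy of k-fold cross-validation for one hyperparameter value.
cv_accuracy <- function(data, cp, k = 3) {
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  mean(sapply(seq_len(k), function(i) {
    fit  <- rpart(income ~ ., data = data[folds != i, ], method = "class", cp = cp)
    pred <- predict(fit, data[folds == i, ], type = "class")
    mean(pred == data$income[folds == i])
  }))
}

# Steps 1-2: 3-fold cross-validation over the training set to pick the best value.
grid    <- c(0.001, 0.005, 0.01, 0.05)   # assumed grid for rpart's cp parameter
best_cp <- grid[which.max(sapply(grid, function(cp) cv_accuracy(training, cp)))]

# Step 3: retrain on the whole training set with the best hyperparameter value.
final_fit <- rpart(income ~ ., data = training, method = "class", cp = best_cp)

# Steps 4-5: score the test set and compute accuracy.
test_pred <- predict(final_fit, test, type = "class")
accuracy  <- mean(test_pred == test$income)
```

To compare several learners the same way, this block would be repeated per learner (mirroring the four Sweep Parameters branches in the experiment) and the resulting accuracies combined into one table, as the **Add Rows** and **Execute R Script** modules do above.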