Model Parameter Optimization: Sweep Parameters
This experiment demonstrates how to use the 'Sweep Parameters' module to fine-tune a model's parameters during training.
#Model Parameter Optimization: Sweep Parameters#
In this experiment we will use the **Sweep Parameters** module to fine-tune a classification model and select the parameters that lead to the best classification accuracy. The experiment compares two support vector machine (SVM) classifiers: one whose parameters are chosen by the **Sweep Parameters** module, and one that uses the default parameters of the **Two-Class Support Vector Machine** module and is trained with the **Train Model** module.
Training a support vector machine, like many other classifiers, requires specifying several parameters, and each set of parameter values produces a different model instance. Parameter sweeping automatically trains models with different sets of parameters, for example by trying a number of random combinations of values, or by using grid search, in which every combination of values within specified ranges is tried. The set of parameters that leads to the best-performing model can then be selected and used for the final training and scoring of a dataset.
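As a point of reference, the sketch below shows the two sweep strategies outside of Studio, using scikit-learn's grid and randomized searches as analogues of the **Sweep Parameters** module. The classifier, parameter names, and value ranges are illustrative assumptions, not the module's actual settings.

```python
# Minimal sketch of grid vs. random parameter sweeps using scikit-learn
# (an analogue of the Sweep Parameters module; values are illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)   # stand-in data
param_grid = {"C": [0.01, 0.1, 1.0], "max_iter": [100, 1000]}

# Grid sweep: every combination of parameter values is tried.
grid = GridSearchCV(LinearSVC(), param_grid, scoring="accuracy", cv=3).fit(X, y)

# Random sweep: only a fixed number of random combinations is tried.
rand = RandomizedSearchCV(LinearSVC(), param_grid, n_iter=4,
                          scoring="accuracy", cv=3, random_state=0).fit(X, y)

print("grid best:  ", grid.best_params_)
print("random best:", rand.best_params_)
```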
##Data##
The data used in this experiment is a subset of the 1994 Census database, restricted to working adults over the age of 16 with an adjusted income index greater than 100. The dataset is publicly available on the [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/Adult).
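If you want to inspect a comparable copy of the data in Python, the hedged sketch below loads it with pandas; the UCI file URL and column names are assumptions, since the experiment itself uses the built-in **`Adult Census Income Binary Classification`** dataset in Studio.

```python
# Sketch of loading the UCI Adult data directly (assumed URL and column names).
import pandas as pd

columns = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]
adult = pd.read_csv(
    "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
    names=columns, skipinitialspace=True,
)
print(adult["income"].value_counts())   # "income" is the label column used below
```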
##Create the Experiment##
1. Drag and drop the **`Adult Census Income Binary Classification`** dataset from the Saved Datasets group.
2. Add a **Split** module to create training and test sets, and set the value of **Fraction of rows in the first output dataset** to 0.7. This will create a training set that contains 70% of the data.
3. Add a **Two-Class Support Vector Machine** module. This will initialize an SVM classifier. Set the **Create trainer mode** property to **Parameter Range** so that different parameter values can be tried during training. You can either specify a list of values explicitly or use the range builder to specify a range. We will use the range builder for the **Number of iterations** parameter and try 4 values in the range 1-100. For the **Lambda** parameter, we will use the default list of values, as shown in the figure below.
![][image1]
4. Add another **Split** module to hold out a proportion of the training set and use it for validation. Set the split fraction property to 0.6 (for example).
5. Add a **Sweep Parameters** module. Set the sweeping mode to **Entire grid** to train using all combinations of the parameter values specified in the SVM module. Launch the column selector and choose the **`income`** column for the **Label column** property (the label is the value we are trying to predict).
![][image2]
6. Connect the untrained SVM module and the two outputs of the second **Split** module to the input ports of the **Sweep Parameters** module, in order from left to right.
7. Add a **Score Model** module and connect the trained best model from the **Sweep Parameters** module to the left input port. Connect the test dataset (the second output of the first **Split** module) to the right input port.
8. Add an **Evaluate Model** module to evaluate the performance of the classifier on the test set. Connect the **Score Model** output to one of the inputs.
9. To compare the results with those of another classifier that doesn't make use of parameter sweeping, add a **Two-Class Support Vector Machine** module with default settings, and a **Train Model** module to train it. You can use the same training data as input for **Train Model**. Make sure to choose the **`income`** column for the **Label column** property.
10. Next, connect another **Score Model** module, and use the test dataset to evaluate the accuracy of the untuned SVM model. Note that the **Evaluate Model** module you already added can take in two scored datasets. By connecting the scored results from the untuned model, you can easily compare the results of the two SVM models, as sketched in code below.
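For readers who prefer code, the steps above can be approximated in scikit-learn. This is only an analogue of the Studio modules, not an exact reproduction: the feature matrix, parameter names, and value ranges are assumptions.

```python
# End-to-end sketch of the experiment's flow: a 70/30 split, a grid sweep for
# one SVM, a default-parameter SVM, and scoring of both on the same test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import LinearSVC

# Stand-in data; in the experiment this is the Adult census dataset.
X, y = make_classification(n_samples=2000, random_state=0)

# Step 2: 70% of the rows go to training, the rest to the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, random_state=0)

# Steps 3-6: sweep the SVM over all parameter combinations; GridSearchCV's
# internal cross-validation plays the role of the second Split module.
param_grid = {"C": [0.01, 0.1, 1.0], "max_iter": [100, 1000]}
swept = GridSearchCV(LinearSVC(), param_grid, scoring="accuracy", cv=3).fit(X_train, y_train)

# Steps 9-10: an untuned SVM trained with default parameters, for comparison.
default_svm = LinearSVC().fit(X_train, y_train)

# Steps 7, 8, and 10: score both models on the held-out test set.
print("swept SVM accuracy:  ", swept.score(X_test, y_test))
print("default SVM accuracy:", default_svm.score(X_test, y_test))
```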
The following diagram shows the complete experiment:
![][image3]
##Results##
The following diagram shows the ROC chart for both models: the model with the optimal parameters found by **Sweep Parameters**, and the model trained with the default parameters. We can see a significant improvement from using the parameter sweep.
![][image4]
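A similar AUC comparison can be computed in Python, continuing from the sketch above; the snippet below reuses the `swept` and `default_svm` models and the same test split, so it is not standalone.

```python
# Compare the two models' areas under the ROC curve on the held-out test set.
from sklearn.metrics import roc_auc_score

for name, model in [("swept SVM", swept.best_estimator_), ("default SVM", default_svm)]:
    margins = model.decision_function(X_test)     # margin scores drive the ROC curve
    print(name, "AUC:", roc_auc_score(y_test, margins))
```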
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/27/svmParams.PNG
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/27/sweepParams.PNG
[image3]:http://az712634.vo.msecnd.net/samplesimg/v1/27/main.PNG
[image4]:http://az712634.vo.msecnd.net/samplesimg/v1/27/sweepResultsCurve.PNG