Compare Regressors

September 2, 2014
This sample demonstrates how to train and compare multiple regression models in Azure ML Studio, including Bayesian linear regression, neural network regression, boosted decision tree regression, linear regression, and decision forest regression.
# Compare Multiple Regressors

This sample demonstrates how to train and compare multiple regression models.

## Data

In this experiment, we use the Boston Housing data set from the UCI repository. We use a **Reader** module to read the data in ARFF format from [tunedit.org](http://tunedit.org/download/UCI/numeric/housing.arff). This data set contains 506 samples with 13 features and 1 target. All features except CHAS are numerical (CHAS is a dummy variable taking the values 0 and 1).

![][image0]

## Data Processing

We use the **Metadata Editor** module to rename the last column to "Target". We then use the **Split** module to randomly split the whole data set into a training set and a test set of equal size.

## Model

In this experiment, we compare the following regression models:

- **Bayesian Linear Regression**
- **Neural Network Regression**
- **Boosted Decision Tree Regression**
- **Linear Regression**
- **Decision Forest Regression**

![][image1]

For each model, we use the **Train Model** module to train the model, then the **Score Model** module to make predictions on the test data set; based on those predictions, we apply the **Evaluate Model** module to compute the regression performance of each model. The image above shows the workflow shared by all regression models.

For **Bayesian Linear Regression** and **Decision Forest Regression**, the output of **Evaluate Model** consists of the following 6 metrics:

1. Negative Log Likelihood
2. Mean Absolute Error
3. Root Mean Squared Error
4. Relative Absolute Error
5. Relative Squared Error
6. Coefficient of Determination

For **Neural Network Regression**, **Boosted Decision Tree Regression**, and **Linear Regression**, the negative log likelihood is not computed, so only the remaining 5 metrics are produced.

### Combining Multiple Regression Performance Metrics

We combine and compare all regression performance metrics by using the **Execute R Script** and **Add Rows** modules.
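For reference, the five metrics shared by all models follow directly from the predictions and the true targets. The sketch below (plain Python, outside Azure ML Studio; the function name and inputs are illustrative, not part of the experiment) shows how each metric is defined:

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute the five metrics that Evaluate Model reports for every model.

    Illustrative only -- inside the experiment these values come from the
    Evaluate Model module, not from user code.
    """
    n = len(y_true)
    mean_true = sum(y_true) / n
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    sq_err = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    abs_dev = [abs(t - mean_true) for t in y_true]   # deviation of a trivial
    sq_dev = [(t - mean_true) ** 2 for t in y_true]  # mean-only predictor

    mae = sum(abs_err) / n                 # Mean Absolute Error
    rmse = math.sqrt(sum(sq_err) / n)      # Root Mean Squared Error
    rae = sum(abs_err) / sum(abs_dev)      # Relative Absolute Error
    rse = sum(sq_err) / sum(sq_dev)        # Relative Squared Error
    r2 = 1.0 - rse                         # Coefficient of Determination
    return {"MAE": mae, "RMSE": rmse, "RAE": rae, "RSE": rse, "R2": r2}
```

The relative metrics (RAE, RSE) normalize the error against a baseline that always predicts the mean target, which is why the coefficient of determination is simply one minus the relative squared error.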
The **Evaluate Model** module produces a table with a single row that contains the metrics. In the **Execute R Script** module, we extract the regression performance metrics and manually add the corresponding model name. The R code in **Execute R Script** for the **Bayesian Linear Regression** model is:

```r
dataset <- maml.mapInputPort(1)

# Add the algorithm name to the data frame
data.set <- data.frame(Algorithm = 'Bayesian Linear Regression')
data.set <- cbind(data.set, dataset[2:6])

maml.mapOutputPort("data.set")
```

Note that we ignore the negative log likelihood and extract the remaining 5 metrics. For the **Neural Network Regression**, **Boosted Decision Tree Regression**, and **Linear Regression** models, we add the model name and extract all metrics. The following is the R code in **Execute R Script** for **Linear Regression**:

```r
dataset <- maml.mapInputPort(1)

# Add the algorithm name to the data frame
data.set <- data.frame(Algorithm = 'Linear Regression')
data.set <- cbind(data.set, dataset[1:5])

maml.mapOutputPort("data.set")
```

Each **Execute R Script** module in this experiment produces a table with a single row containing the model name and its metrics. Finally, we use several **Add Rows** modules to combine all regression performance metrics into one table.

## Results

The final results of the experiment, obtained by right-clicking the results data set output of the last **Add Rows** module, are:

![][image2]

The first column is the name of the machine learning algorithm used to generate each model, and the remaining 5 columns are the computed regression performance metrics. In this experiment, **Boosted Decision Tree Regression** achieves the best Root Mean Squared Error (RMSE).

<!-- Images -->
[image0]:http://az712634.vo.msecnd.net/samplesimg/v1/19/whole_exp19.png
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/19/regression_work_flow.png
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/19/regression_results.png
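The tag-and-stack pattern above (each **Execute R Script** prepends a model name to a one-row metrics table, then **Add Rows** concatenates the rows) can be sketched outside Azure ML Studio in plain Python. The metric values below are hypothetical placeholders, not results from the experiment:

```python
# Illustrative sketch, not Azure ML code: each "Execute R Script" step tags a
# one-row metrics table with its model name; "Add Rows" then stacks the rows.

def tag_row(algorithm, metrics):
    """Prepend the algorithm name to a dict of metrics (one table row)."""
    row = {"Algorithm": algorithm}
    row.update(metrics)
    return row

# "Add Rows" analogue: one row per model, stacked into a single table.
# MAE/RMSE values are made-up placeholders for illustration only.
rows = [
    tag_row("Bayesian Linear Regression", {"MAE": 3.4, "RMSE": 4.9}),
    tag_row("Linear Regression", {"MAE": 3.6, "RMSE": 5.2}),
]

# Reading the combined table: pick the model with the lowest RMSE,
# mirroring how the final results table is compared by eye.
best = min(rows, key=lambda r: r["RMSE"])
```

Putting the model name in the first column before stacking is what makes the final combined table self-describing: once the rows are concatenated, each metric row still identifies which algorithm produced it.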