Enhanced model evaluation with multiple performance metrics for regression analysis

May 10, 2018

An Enhanced Evaluate Model module that integrates 22 performance metrics and can be used with both Azure built-in models and R script models.
**Experiment Highlights**

The purpose of this experiment was to develop an Enhanced Evaluate Model module, based on the R language, that improves on the built-in Azure MLS Evaluate Model module. Specifically:

- The Enhanced Evaluate Model module evaluates regression models with 22 performance metrics (compared to five in the built-in Azure module).
- The Enhanced Evaluate Model module can evaluate models implemented in the R language with Azure “Create R Model” (a capability not currently available in Azure) and can combine those evaluations with evaluations of the Azure built-in regression models.

Details of this experiment are described in: Botchkarev, A. (2018). Evaluating Performance of Regression Machine Learning Models Using Multiple Error Metrics in Azure Machine Learning Studio (May 12, 2018). Available at SSRN: http://ssrn.com/abstract=3177507

The performance metrics implemented in the Enhanced Evaluate Model module, listed in alphabetical order of the metric abbreviation:

| Metric Abbreviation | Metric Name |
| --- | --- |
| CoD | Coefficient of Determination |
| GMRAE | Geometric Mean Relative Absolute Error |
| MAE | Mean Absolute Error |
| MAPE | Mean Absolute Percentage Error |
| MASE | Mean Absolute Scaled Error |
| MdAE | Median Absolute Error |
| MdAPE | Median Absolute Percentage Error |
| MdRAE | Median Relative Absolute Error |
| ME | Mean Error |
| MPE | Mean Percentage Error |
| MRAE | Mean Relative Absolute Error |
| MSE | Mean Squared Error |
| NRMSE_mm | Normalized Root Mean Squared Error (normalized to the difference between maximum and minimum of the actual data) |
| NRMSE_sd | Normalized Root Mean Squared Error (normalized to the standard deviation of the actual data) |
| RAE | Relative Absolute Error |
| RMdSPE | Root Median Square Percentage Error |
| RMSE | Root Mean Squared Error |
| RMSPE | Root Mean Square Percentage Error |
| RSE | Relative Squared Error |
| sMAPE | Symmetric Mean Absolute Percentage Error |
| SMdAPE | Symmetric Median Absolute Percentage Error |
| SSE | Sum of Squared Error |

**Overview of the experiment**

The focus of this experiment is the 'Enhanced Evaluate Model' module. However, for easier understanding of the module's operation, and for potential reuse, it is shown in a typical Azure regression experiment workflow: input data, initialize model, train model, score model, evaluate model. A component of a previously published experiment is used as a sample: Botchkarev, A. (2018). Revision 2 Integrated tool for rapid assessment of multi-type regression machine learning models. Experiment in Microsoft Azure Machine Learning Studio. Azure AI Gallery. https://gallery.azure.ai/Experiment/Revision-2-Integrated-tool-for-rapid-assessment-of-multi-type-regression-machine-learning-models

Details of that experiment are described in: Botchkarev, A. (2018). Evaluating Hospital Case Cost Prediction Models Using Azure Machine Learning Studio. arXiv:1804.01825 [cs.LG].

**How to use the model in your experiment**

1. Use the input from 'Score Model' (dataset1) to create vectors (one-column data frames) for the actual and predicted values. Check the column names by visualizing the output of 'Score Model'. The name of the predicted-value column may differ depending on which regression model is used, e.g. 'Scored Label Mean' for Decision Forest Regression or 'Scored Labels' for Linear Regression. Rename the column holding the target/label values ('Cost' in the example) to 'Actual'. Similarly, rename the column holding the predicted values ('Predicted Cost' in the example) to 'Predicted'. See the hints in the R script of the 'Enhanced Evaluate Model', and the sketch after these steps.
2. Copy 'Enhanced Evaluate Model', paste it into your experiment canvas, and connect the input of this module to the output of 'Score Model'.
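The fragment below is a minimal sketch of this pattern, assuming the standard Execute R Script interface of Azure ML Studio (maml.mapInputPort / maml.mapOutputPort) and the example column names used above. It computes only a representative subset of the 22 metrics, using textbook formulas; the R script inside the published module is the authoritative source for the exact definitions.

```r
# Minimal sketch (illustrative only; not the published module's code).
dataset1 <- maml.mapInputPort(1)  # output of 'Score Model'

# Rename the label and prediction columns to the names the script expects.
# 'Cost' and 'Scored Labels' are examples; adjust to your dataset and model.
names(dataset1)[names(dataset1) == "Cost"]          <- "Actual"
names(dataset1)[names(dataset1) == "Scored Labels"] <- "Predicted"

a <- dataset1$Actual
p <- dataset1$Predicted

MAE  <- mean(abs(a - p))                          # Mean Absolute Error
MdAE <- median(abs(a - p))                        # Median Absolute Error
MSE  <- mean((a - p)^2)                           # Mean Squared Error
RMSE <- sqrt(MSE)                                 # Root Mean Squared Error
MAPE <- 100 * mean(abs((a - p) / a))              # Mean Absolute Percentage Error
RAE  <- sum(abs(a - p)) / sum(abs(a - mean(a)))   # Relative Absolute Error
CoD  <- 1 - sum((a - p)^2) / sum((a - mean(a))^2) # Coefficient of Determination

# Three-column output table; estimates rounded to four decimal places,
# matching the output format described below.
results <- data.frame(
  Abbreviation = c("MAE", "MdAE", "MSE", "RMSE", "MAPE", "RAE", "CoD"),
  Metric = c("Mean Absolute Error", "Median Absolute Error",
             "Mean Squared Error", "Root Mean Squared Error",
             "Mean Absolute Percentage Error", "Relative Absolute Error",
             "Coefficient of Determination"),
  Estimate = round(c(MAE, MdAE, MSE, RMSE, MAPE, RAE, CoD), 4)
)

maml.mapOutputPort("results")
```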
**Output**

Right-click the left output port (1) of the Enhanced Evaluate Model module and select the Visualize option from the drop-down menu. The output is presented in a table with three columns: metric abbreviation, metric full name, and numerical estimate. Another option is to attach a Convert to CSV module to the Enhanced Evaluate Model module and download the output file. Note that results are rounded to four digits after the decimal point.

**Concluding Remarks**

Note that the operation of the module assumes that all data preparation has been done: there are no NA (missing) data elements, and all data are numeric. Note also that some metrics perform division and may return error messages if the denominator is equal or close to zero; a hypothetical pre-check illustrating both caveats appears at the end of this page.

Note that the module is publicly shared for information and training purposes only. All efforts were taken to make this module error-free. However, we do not guarantee the correctness, reliability, and completeness of the material, and the module is provided "as is", without warranty of any kind, express or implied. Any user is acting entirely at their own risk.

Note that the views, opinions, and conclusions expressed in this document are those of the author alone and do not necessarily represent the views of the author’s current or former employers.
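The check below makes the data-preparation caveats concrete. It is a hypothetical helper (not part of the published module) that could be run on the scored dataset, with the renamed 'Actual' and 'Predicted' columns, before evaluation.

```r
# Hypothetical pre-check (illustrative only; not part of the published module).
check_inputs <- function(df) {
  stopifnot(!anyNA(df$Actual), !anyNA(df$Predicted))          # no missing values
  stopifnot(is.numeric(df$Actual), is.numeric(df$Predicted))  # numeric data only
  # Percentage-based metrics divide by the actual values; warn when any
  # denominator is at or near zero.
  if (any(abs(df$Actual) < 1e-8)) {
    warning("Actual values at or near zero: MAPE, MPE, RMSPE and related metrics may fail or be unstable.")
  }
  invisible(df)
}
```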