Predicting if a Customer Is Satisfied or Dissatisfied
**Summary**
This experiment serves as a tutorial on building a classification model using Azure ML. We use the customer satisfaction dataset to build a model that predicts whether a customer is satisfied or dissatisfied with their banking experience.
**Data**
This version of the customer satisfaction dataset can be retrieved from the Kaggle website, specifically its “train” data (3.34 MB). The train data contains 76,020 rows, each representing product data related to a customer, such as credit products, cash, savings, and total customer assets. The challenge with this dataset is that the columns are anonymized: most hold numeric values with no indication of what each value represents. The dataset contains 371 columns. The target column, labeled “TARGET”, equals 1 for unsatisfied customers and 0 for satisfied customers.
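As a quick sanity check on the figures above, the row count, column count, and TARGET distribution can be read straight from the CSV. The following is a minimal sketch using only the Python standard library. The inline sample is made up and stands in for the real train file; only `var3` and `TARGET` are column names taken from this write-up, and `ID`, `feature_a`, and the `summarize` helper are hypothetical.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample standing in for the Kaggle train file. Only "var3" and
# "TARGET" come from the write-up; the other names and all values are hypothetical.
SAMPLE_CSV = """ID,var3,feature_a,TARGET
1,2,0.0,0
2,2,300.5,0
3,7,12.0,1
4,2,0.0,0
"""

def summarize(csv_file):
    """Return (row count, column count, TARGET value counts) for a CSV file object."""
    reader = csv.DictReader(csv_file)
    target_counts = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        target_counts[int(row["TARGET"])] += 1
    return n_rows, len(reader.fieldnames), dict(target_counts)

n_rows, n_cols, target_counts = summarize(io.StringIO(SAMPLE_CSV))
print(n_rows, n_cols, target_counts)
```

To run it against the downloaded file, pass an opened file handle (for example `open(path, newline="")`) instead of the `io.StringIO` sample.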
**Data Prep**
• Applying Edit Metadata to all columns in the dataset, setting the data type to integer and leaving the categorical setting unchanged
• Using Clip Values with ClipPeaksAndSubpeaks on the var3 column only
• Using Filter Based Feature Selection to identify the features in the dataset with the greatest predictive power
• Using Normalize Data to rescale numeric data, constraining dataset values to a standard range
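The clipping, feature selection, and normalization steps above can be sketched outside Azure ML as follows. This is an illustration, not the modules' implementation. The write-up does not say which clip thresholds, feature scoring method, or normalization method were used, so the thresholds (0 and 100), Pearson correlation, and min-max scaling are all assumptions here. The toy data and the `feature_a` and `feature_b` columns are made up.

```python
import math

def clip(values, lower, upper):
    """Clip both peaks (above upper) and subpeaks (below lower) to the thresholds."""
    return [min(max(v, lower), upper) for v in values]

def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def top_k_features(columns, target, k):
    """Rank columns by absolute Pearson correlation with the target; keep the top k names."""
    scores = {name: abs(pearson(vals, target)) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Made-up toy data; thresholds and k are assumptions, not values from the experiment.
columns = {
    "var3": [2, 2, -500, 2, 900, 2],
    "feature_a": [0.0, 1.0, 5.0, 0.5, 6.0, 1.5],
    "feature_b": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
}
target = [0, 0, 1, 0, 1, 0]

columns["var3"] = clip(columns["var3"], lower=0, upper=100)   # clip var3 only
selected = top_k_features(columns, target, k=2)               # filter-based selection
normalized = {name: min_max(columns[name]) for name in selected}
print(selected)
```

The constant column `feature_b` carries no information about the target, so it scores zero and is dropped by the selection step.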
**Algorithm Selection**
We chose a Two-Class Logistic Regression and a Two-Class Neural Network. We used separate Train Model and Score Model modules for each algorithm, so that both were trained separately on the same dataset. The algorithms were trained with their default settings. Both models' performance was evaluated and compared using a single Evaluate Model module.
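To make the train-then-score pattern concrete, here is a minimal sketch of the logistic regression side only, written with plain batch gradient descent in standard-library Python. It is not the optimizer or the default settings used by the Azure ML module, and the neural network is not sketched; it would be trained and scored the same way on the same data. The toy rows, the learning rate, the epoch count, and the function names are all assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row: p(y=1 | x) minus the true label.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def score(rows, weights, bias):
    """Return the predicted probability of class 1 for each row."""
    return [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) for x in rows]

# Made-up, already normalized toy data (1 = unsatisfied, 0 = satisfied).
rows = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(rows, labels)
probabilities = score(rows, weights, bias)
print([round(p, 2) for p in probabilities])
```

Keeping training and scoring as separate functions mirrors the separate Train Model and Score Model modules: any second model that exposes the same two steps can be scored on the same rows and handed to one shared evaluation routine.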
**Results**
The models performed differently: Two-Class Logistic Regression reached ~0.75 ROC AUC and the Two-Class Neural Network ~0.677 ROC AUC. Since Two-Class Logistic Regression achieved the higher overall ROC AUC, we decided to continue with logistic regression, and this is where the experiment concludes. Users can take this experiment and tweak the parameters of either algorithm to achieve higher performance.
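For readers unfamiliar with the metric, ROC AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition for two sets of scores on the same labels. The labels and scores are made up and are not the experiment's outputs, and the pairwise loop is quadratic, so it is meant for illustration rather than for the full dataset.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and scores for two hypothetical models evaluated on the same rows.
labels = [0, 0, 1, 0, 1, 1]
model_a_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]
model_b_scores = [0.5, 0.6, 0.4, 0.3, 0.7, 0.2]

auc_a = roc_auc(labels, model_a_scores)
auc_b = roc_auc(labels, model_b_scores)
better = "model_a" if auc_a > auc_b else "model_b"
print(round(auc_a, 3), round(auc_b, 3), better)
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the two models above are compared.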