## Binary Classification: Customer relationship prediction
This experiment shows how to do predictions related to **Customer Relationship Management (CRM)** using binary classifiers.
## Data
The data used for this experiment is from KDD Cup 2009. The dataset has 50000 rows and 230 feature columns. The task is to predict **churn**, **appetency** and **up-selling** using these features. Please refer to the [**KDD Cup 2009 website**](http://www.sigkdd.org/kdd-cup-2009-customer-relationship-prediction) for further details about the data and the task.
## Model
The complete experiment graph is given below.
![][image1]
First, we do some simple data processing.
* The raw dataset contains many missing values. We use the **Clean Missing Data** module to replace the missing values with 0.
![][image2] ![][image3]
* The customer features and the corresponding **churn**, **appetency**, and **up-selling** labels are in different datasets. We use the **Add Columns** module to append the label columns to the feature columns. The first column *Col1* is the label column and the rest of the columns *Var1, Var2, ...* are the feature columns.
![][image4]
* We split the dataset into train and test sets using the **Split** module.
Then we use the **Two-Class Boosted Decision Tree** binary classifier with default parameters to build the prediction models. We build one model per task, i.e. to predict **up-selling**, **appetency**, and **churn**.
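The same pipeline can be sketched outside the designer. The following is a minimal pandas/scikit-learn approximation of the steps above, using a small synthetic dataset in place of the KDD Cup files (the column names *Col1* and *Var1, Var2, ...* follow the text; the data, sizes, and the choice of `GradientBoostingClassifier` as a stand-in for the boosted decision tree module are illustrative assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the feature dataset: 10 feature columns
# with randomly injected missing values.
features = pd.DataFrame(rng.normal(size=(1000, 10)),
                        columns=[f"Var{i + 1}" for i in range(10)])
features = features.mask(rng.random(features.shape) > 0.8)

# Synthetic stand-in for one label dataset (e.g. up-selling).
labels = pd.Series(rng.integers(0, 2, size=1000), name="Col1")

# Clean Missing Data: replace missing values with 0.
features = features.fillna(0)

# Add Columns: label column first, then the feature columns.
data = pd.concat([labels, features], axis=1)

# Split: divide the dataset into train and test sets.
train, test = train_test_split(data, test_size=0.3, random_state=0)

# Boosted decision trees with default parameters; one such model
# would be trained per task (churn, appetency, up-selling).
model = GradientBoostingClassifier()
model.fit(train.drop(columns="Col1"), train["Col1"])
scores = model.predict_proba(test.drop(columns="Col1"))[:, 1]
```

The predicted scores in `scores` are what a threshold is later applied to when computing precision, recall, and F1.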
##Results
The performance of the model on the test set can be seen by visualizing the output of the **Evaluate Model** module. For the up-selling task, the **ROC** curve shows that the model does better than a random model, and the **area under the curve (AUC)** is 0.857. At threshold 0.5, the **precision** is 0.663, **recall** is 0.463, and **F1 score** is 0.545.
![][image5]
We can move the threshold slider and see how different metrics change for the binary classification task. In the following figure we see the metrics for threshold 0.7.
![][image6]
We can make similar observations for the other tasks.
<!-- Images -->
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/2/expt_graph.PNG
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/2/raw_features.PNG
[image3]:http://az712634.vo.msecnd.net/samplesimg/v1/2/scrubbed_features.PNG
[image4]:http://az712634.vo.msecnd.net/samplesimg/v1/2/label_and_features.PNG
[image5]:http://az712634.vo.msecnd.net/samplesimg/v1/2/upselling_evaluation.PNG
[image6]:http://az712634.vo.msecnd.net/samplesimg/v1/2/upselling_evaluation_0.7.PNG