# Binary Classification: Direct Marketing
This experiment demonstrates how to use **binary classifiers** to predict customer response to a direct mailing campaign based on historical data.
## Data
The dataset contains 64,000 records, each having nine features and three response (or label) columns.
The three responses that can be predicted are as follows:
- **visit** - Denotes that a customer visited the store after the marketing campaign.
- **conversion** - Indicates that a customer purchased something.
- **spend** - The amount that was spent.
For this experiment, we predict whether a customer will visit the store after the marketing campaign, as indicated by the value in the *visit* column. This is an example of an unbalanced classification problem, because visits account for only about 2% of the records.
## Model
First, we performed some simple data processing (sketched in code after the figure below):
- The *segment* variable is converted to categorical using the **Metadata Editor** module.
- The *history_segment* column is removed because it is a simplification of the *history* column.
- The label columns *conversion* and *spend* are removed because they are not used in this experiment.
- The resulting data is split into training and testing datasets, as shown in the following figure.
![][image1]
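For readers who want to reproduce these steps outside Azure ML, here is a minimal pandas/scikit-learn sketch of the same preprocessing. The file name and the 50/50 split ratio are assumptions; only the column names come from the dataset described above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the campaign data; the file name is hypothetical.
data = pd.read_csv("direct_marketing.csv")

# Mark the *segment* variable as categorical (the Metadata Editor step).
data["segment"] = data["segment"].astype("category")

# Drop *history_segment* (a simplification of *history*) and the two
# label columns we are not predicting, *conversion* and *spend*.
data = data.drop(columns=["history_segment", "conversion", "spend"])

# Split into training and testing sets. The 50/50 ratio is an assumption;
# stratifying on *visit* preserves the ~2% positive rate in both halves.
train, test = train_test_split(
    data, test_size=0.5, stratify=data["visit"], random_state=42
)
```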
From the training data, we built two models, one using the **Two-Class Boosted Decision Tree** algorithm and the other using the **Two-Class Support Vector Machine** algorithm. To determine the optimal parameters for each classifier, we used the **Sweep Parameters** module, which applies a *random sweep* to find the parameter settings that give the best performance on the test set.
The following graphic shows the overall experiment workflow:
![][image2]
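As a rough analogue of **Sweep Parameters**, scikit-learn's `RandomizedSearchCV` performs a random sweep over parameter distributions. Continuing the sketch above; the parameter ranges, the iteration count, and the use of cross-validated AUC as the sweep metric are illustrative assumptions, not the settings of the Azure ML module:

```python
from scipy.stats import loguniform, randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# One-hot encode categorical features so both learners accept the matrix.
X_train = pd.get_dummies(train.drop(columns=["visit"]))
y_train = train["visit"]

# One random sweep per learner, standing in for Two-Class Boosted Decision
# Tree and Two-Class Support Vector Machine plus Sweep Parameters.
sweeps = {
    "boosted_trees": RandomizedSearchCV(
        GradientBoostingClassifier(),
        {
            "n_estimators": randint(20, 200),
            "learning_rate": loguniform(0.01, 0.5),
            "max_depth": randint(2, 8),
        },
        n_iter=20, scoring="roc_auc", cv=3, random_state=0,
    ),
    "svm": RandomizedSearchCV(
        SVC(probability=True),  # probability=True enables predict_proba
        {"C": loguniform(0.01, 100)},
        n_iter=20, scoring="roc_auc", cv=3, random_state=0,
    ),
}

for name, sweep in sweeps.items():
    sweep.fit(X_train, y_train)
    print(name, sweep.best_params_)
```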
## Results
We found that the **Two-Class Boosted Decision Tree** model had better accuracy. The following graphic shows the lift chart for the model.
![][image3]
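A lift curve of this kind can be computed directly from the scored probabilities. Continuing the sketch above (this is our own illustration of the standard lift calculation, not what the evaluation module does internally):

```python
import numpy as np

# Encode the test set with the same columns as the training matrix.
X_test = pd.get_dummies(test.drop(columns=["visit"])).reindex(
    columns=X_train.columns, fill_value=0
)
y_test = test["visit"].to_numpy()

# Score the test set with the tuned boosted-trees model.
proba = sweeps["boosted_trees"].predict_proba(X_test)[:, 1]

# Lift: positives captured in the top-scored fraction of the population,
# relative to random targeting (a lift of 1.0 equals random targeting).
order = np.argsort(-proba)
captured = np.cumsum(y_test[order]) / y_test.sum()
population = np.arange(1, len(y_test) + 1) / len(y_test)
lift = captured / population
```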
The values for accuracy, precision, and recall were similar for both classifiers when compared at a threshold of 0.15, which is approximately where the F-score peaks for both learners. Therefore, to determine whether the slightly higher precision of the **Two-Class Boosted Decision Tree** model was significant, we used a custom R script to calculate McNemar's statistic.
The sequence of modules at the end of the experiment shows how we computed the predicted labels at a threshold of 0.15 before calculating McNemar's statistic. (By default, the **Score Model** module uses a threshold of 0.5.)
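In the experiment this calculation is done with a custom R script; an equivalent in Python, continuing the sketch above (the 0.15 threshold comes from the text, everything else is our stand-in), might look like this:

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Convert each model's scored probabilities to labels at the 0.15
# threshold instead of Score Model's default of 0.5.
svm_proba = sweeps["svm"].predict_proba(X_test)[:, 1]
trees_correct = (proba >= 0.15) == y_test
svm_correct = (svm_proba >= 0.15) == y_test

# McNemar's test compares the two classifiers on their disagreements:
# the cases that exactly one of the models labels correctly.
table = [
    [np.sum(trees_correct & svm_correct), np.sum(trees_correct & ~svm_correct)],
    [np.sum(~trees_correct & svm_correct), np.sum(~trees_correct & ~svm_correct)],
]
print(mcnemar(table, exact=False, correction=True).pvalue)
```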
From this we learned that the McNemar p-value is close to zero, indicating that the improvement is significant. Therefore, the **Two-Class Boosted Decision Tree** model is the one we would use in any scoring workflow.
![][image4]
<!-- Images -->
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/8/split.PNG
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/8/modelGraph.PNG
[image3]:http://az712634.vo.msecnd.net/samplesimg/v1/8/perf.PNG
[image4]:http://az712634.vo.msecnd.net/samplesimg/v1/8/analysis.PNG