# Binary Classification: Direct Marketing

September 2, 2014
This experiment predicts whether a customer will visit the store after a direct marketing campaign, using a boosted decision tree and a support vector machine.
This experiment demonstrates how to use **binary classifiers** to predict customer response to a direct mailing campaign based on historical data.

## Data

The dataset contains 64,000 records, each with nine features and three response (or label) columns. The three responses that can be predicted are:

- **visit** - Denotes that a customer visited the store after the marketing campaign.
- **conversion** - Indicates that a customer purchased something.
- **spend** - The amount that was spent.

In this experiment we predict whether a customer will visit the store after the marketing campaign, as indicated by the ***visit*** column. This is an example of an imbalanced classification problem, because visits make up only about 2% of the data.

## Model

First, we did some simple data processing:

- The *segment* variable is converted to categorical, using the **Metadata Editor** module.
- The *history_segment* column is removed because it is a simplification of the *history* column.
- The label columns *conversion* and *spend* are removed because they are not used in this experiment.
- The resulting data is split into training and testing datasets, as shown in the following figure.

![][image1]

From the training data we built two models, one using the **Two-Class Boosted Decision Tree** module and the other using the **Two-Class Support Vector Machine** module. To find good parameters for each classifier, we used the **Sweep Parameters** module, which applies a **random sweep** over each algorithm's parameter space and keeps the settings that give the best performance on the test set. (A sketch of the random-sweep idea appears in the appendix below.)

The following graphic shows the overall experiment workflow:

![][image2]

## Results

The **Two-Class Boosted Decision Tree** model had better accuracy. The following graphic shows the lift chart for that model.

![][image3]

Accuracy, precision, and recall were similar for both classifiers when compared at a threshold of 0.15, which is approximately where the F-score peaks for both learners. Therefore, to determine whether the slightly higher precision of the **Two-Class Boosted Decision Tree** model was significant, we used a custom R script to compute McNemar's test statistic. The sequence of modules at the end of the experiment shows how we computed the scored labels at a threshold of 0.15 before calculating the statistic. (By default, the **Score Model** module uses a threshold of 0.5.) The resulting McNemar p-value is close to zero, indicating that the improvement is significant, so the **Two-Class Boosted Decision Tree** model is the one we would use in any scoring workflow. (A sketch of this computation also appears in the appendix below.)

![][image4]

<!-- Images -->
[image1]: http://az712634.vo.msecnd.net/samplesimg/v1/8/split.PNG
[image2]: http://az712634.vo.msecnd.net/samplesimg/v1/8/modelGraph.PNG
[image3]: http://az712634.vo.msecnd.net/samplesimg/v1/8/perf.PNG
[image4]: http://az712634.vo.msecnd.net/samplesimg/v1/8/analysis.PNG
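## Appendix: Code sketches

The experiment itself is built from Azure ML Studio modules, so the sketches below are not the experiment's code. They are minimal R illustrations of the two techniques described above; the data, parameter ranges, and the use of `rpart` as a stand-in learner are all assumptions made for illustration.

### Random parameter sweep

This sketch mimics what a random sweep does: sample random hyperparameter combinations, evaluate each on held-out data, and keep the best. AUC is used as the metric because the *visit* label is highly imbalanced.

```r
library(rpart)
set.seed(42)

## Simulated, imbalanced binary data (roughly 2% positives, as in the text)
n  <- 5000
x1 <- rnorm(n); x2 <- rnorm(n)
visit <- rbinom(n, 1, plogis(-4.5 + 1.2 * x1 + 0.8 * x2))
d  <- data.frame(visit = factor(visit), x1, x2)

## Train/test split
idx   <- sample(n, 0.7 * n)
train <- d[idx, ]; test <- d[-idx, ]

## AUC via the Mann-Whitney rank formula (base R only)
auc <- function(score, label) {
  r  <- rank(score)
  n1 <- sum(label == 1); n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## Random sweep: sample parameter combinations, keep the best performer
best <- list(auc = -Inf)
for (i in 1:20) {
  cp    <- runif(1, 1e-4, 0.05)   # complexity parameter
  depth <- sample(2:10, 1)        # maximum tree depth
  fit   <- rpart(visit ~ x1 + x2, data = train,
                 control = rpart.control(cp = cp, maxdepth = depth))
  score <- predict(fit, test)[, "1"]   # predicted probability of class "1"
  a     <- auc(score, as.integer(as.character(test$visit)))
  if (a > best$auc) best <- list(auc = a, cp = cp, maxdepth = depth)
}
best
```

In the actual experiment the sweep is applied to the boosted decision tree and SVM learners by the **Sweep Parameters** module itself; `rpart` is used here only as a convenient, self-contained substitute.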
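### Thresholding at 0.15 and McNemar's test

The custom R script used in the experiment is not included in this description, so the following is a minimal sketch of the computation it performs, with simulated scores standing in for the two **Score Model** outputs. It re-labels predictions at the 0.15 threshold and runs base R's `mcnemar.test` on the two models' patterns of correct and incorrect predictions.

```r
set.seed(7)

## Simulated scored probabilities standing in for the two Score Model outputs
n      <- 2000
actual <- rbinom(n, 1, 0.02)   # ~2% visit rate, as in the dataset
bdt    <- plogis(rnorm(n, mean = ifelse(actual == 1,  0.0, -4)))  # boosted tree
svm    <- plogis(rnorm(n, mean = ifelse(actual == 1, -0.5, -4)))  # SVM

## Score Model labels at 0.5 by default; re-label at the 0.15 threshold
bdt_label <- as.integer(bdt >= 0.15)
svm_label <- as.integer(svm >= 0.15)

## 2x2 table of correctness; the off-diagonal cells count the cases where
## exactly one model is correct, which is what McNemar's test compares
tab <- table(BDT = bdt_label == actual, SVM = svm_label == actual)
mcnemar.test(tab)
```

A small p-value, as reported for the real experiment, indicates that the two models' error patterns on the same test cases differ significantly.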