KDD Cup 2015 Sample Experiment: Customer Churn Prediction (High)

June 23, 2015
This is the complete version of the sample experiment for KDD Cup 2015. It takes 2 hours to complete and scores AUC=0.873 on the public leaderboard.
Each year KDD Cup brings the data science community together to compete for a coveted spot on its leaderboard. This year's challenge, KDD Cup 2015, asks participants to predict the likelihood of a student dropping out of the MOOC platform XuetangX. This is a typical customer churn problem, and understanding customer churn is a top priority not just for MOOC platforms but for almost all businesses. We publish this experiment as a starting point for anyone planning to enter the KDD Cup competition. You can use it as a baseline and build on it in the Azure ML Studio environment, either by dragging and dropping from its extensive set of built-in algorithms or by adding your own custom R and Python scripts.
This experiment is a more complex version of the experiment KDD Cup 2015: Customer Churn Prediction (Low). The two experiments differ in complexity, performance, and running speed:
I. Complexity: the Low-version experiment creates a smaller set of features and trains a smaller gradient boosted decision tree model than the High-version experiment (a maximum of 50 leaves per tree and 600 trees in the Low version, versus 100 leaves per tree and 1,200 trees in the High version).
II. Performance: the Low-version experiment scores lower on the public leaderboard (AUC=0.853) than the High-version experiment (AUC=0.873).
III. Running speed: because it is simpler, the Low-version experiment runs faster. It completes in 18 minutes, whereas the High-version experiment takes 2 hours.
After you copy the experiment to your Azure ML workspace, click the “Run” button at the bottom of the page.
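Both leaderboard figures are AUC scores (area under the ROC curve). If you hold out part of the labeled training data to compare variants locally, for example while changing the number of leaves per tree or the number of trees, a minimal Python sketch of the rank-based (Mann-Whitney) AUC follows. It is a generic implementation, not the Azure ML Evaluate Model module; the function name roc_auc is hypothetical, and labels are assumed to be 1 for a dropout and 0 otherwise.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a dropout (positive), 0 otherwise (assumed encoding).
    scores: predicted dropout probabilities, same length as labels.
    Tied scores receive the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class, normalised to [0, 1].
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

For example, held-out labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8] give 0.75, because three of the four dropout/non-dropout pairs are ranked correctly. Ranking once instead of comparing every pair keeps the cost at O(n log n), which matters for larger validation sets.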
After the experiment completes, right-click the output port of the “Convert to CSV” module in the bottom-right corner of the experiment and select “Download” to save the predictions on the test data to your local machine. Delete the header line of the downloaded CSV file before submitting it to the competition website for evaluation. The submission should score AUC=0.873 on the public leaderboard.
A detailed step-by-step tutorial for this High-version experiment is available at https://github.com/Azure/Azure-MachineLearning-DataScience/blob/master/Misc/KDDCup2015/kddcup2015-sample-experiment-step-by-step-tutorial.pdf
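Deleting the header line of the downloaded predictions file can be done in any text editor. If you script your submissions, a minimal Python sketch such as the following does the same. It assumes the first line of the downloaded CSV is the header and every following line is a prediction row; the function name strip_header and any file names you pass to it are hypothetical.

```python
def strip_header(src_path: str, dst_path: str) -> int:
    """Copy src_path to dst_path, dropping the first (header) line.

    Assumes the first line is the header written by the download and
    every remaining line is a prediction row. Returns the number of
    prediction lines written.
    """
    written = 0
    # newline="" preserves the original line endings byte for byte.
    with open(src_path, "r", newline="") as src, open(dst_path, "w", newline="") as dst:
        next(src, None)  # skip the header line (no-op on an empty file)
        for line in src:
            dst.write(line)
            written += 1
    return written
```

For example, strip_header("downloaded.csv", "submission.csv") writes every line except the first and returns the number of prediction rows, which you can compare against the number of rows you expect in the test set as a quick sanity check before uploading.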