Retail Customer Churn Template using Microsoft R Server/HDInsight/Spark
This template demonstrates how to develop and deploy end-to-end cloud solutions for Retail Customer Churn using Microsoft R Server, Azure HDInsight with R on Linux, Azure Machine Learning, Spark, Scala, Hive and Power BI.
Customer churn matters in retail, banking, telecommunications and many other customer-facing industries. Accurate churn predictions help businesses plan targeted promotions, adjust engagement strategies, and make better-informed business decisions. In the retail sector, churn prediction is critical because the loss of customers to competitors must be managed and prevented. The goal of churn prediction is to identify which customers are likely to churn.
In this template, we frame churn prediction as a **Binary Classification** problem. The sample retail data used in this template comes from two sources:
* **Customer demographics**: information about the users.
* **Customer transactions**: information about the users' activities with the business.
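To make the binary framing concrete, the sketch below shows one way the two sources could be combined into a labelled set, where each customer gets a label of 1 (churned) or 0 (retained). It is a minimal, hypothetical illustration in plain Python, not the template's Hive/Spark code: the record fields (`customer_id`, `age`, `gender`, `txn_date`, `amount`) and the churn rule (no purchase in the 30 days ending at a cutoff date) are assumptions chosen for the example, not definitions taken from the template.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Customer:
    # Hypothetical demographics record; the template's real schema may differ.
    customer_id: str
    age: int
    gender: str


@dataclass
class Transaction:
    # Hypothetical transaction record; the template's real schema may differ.
    customer_id: str
    txn_date: date
    amount: float


def label_churn(customers, transactions, cutoff, churn_window_days=30):
    """Return {customer_id: 1 if churned else 0}.

    Assumed churn rule (illustrative only): a customer has churned if they
    made no purchase in the churn_window_days days ending at cutoff.
    """
    window_start = cutoff - timedelta(days=churn_window_days)
    # Customers with at least one purchase inside (window_start, cutoff].
    active = {
        t.customer_id
        for t in transactions
        if window_start < t.txn_date <= cutoff
    }
    # Every customer from the demographics source gets a binary label.
    return {c.customer_id: 0 if c.customer_id in active else 1 for c in customers}


customers = [Customer("u1", 34, "F"), Customer("u2", 51, "M")]
transactions = [
    Transaction("u1", date(2016, 6, 20), 42.0),
    Transaction("u2", date(2016, 3, 1), 10.0),
]
print(label_churn(customers, transactions, cutoff=date(2016, 6, 30)))
# {'u1': 0, 'u2': 1}
```

Choosing the window length is a business decision; a shorter window flags more customers as churned.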
The components are built with [Microsoft R Server][1], [Azure HDInsight with R on Linux][2], Spark, Scala, Hive and [Power BI][3]. The solution template covers data processing, feature engineering, model training and scoring, web services, visualization, and a web application:
- **data processing**. Simulates real-world data placement by partitioning the raw data. Implemented using _Hive_.
- **feature engineering**. Implemented using _Hive_ and _Spark_/_Scala_.
- **model training and scoring**. Built using _Microsoft R Server_ with the _Spark_ compute context on _HDInsight_ clusters.
- **web services**. Deployed through an R interface with _Azure Machine Learning_ (R package: AzureML).
- **visualizations**. Developed using Power BI Desktop (connected to _HDInsight Spark_ or _Azure Blob Storage_) and Power BI Online.
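The feature engineering step typically condenses each customer's transaction history into a fixed set of numbers a classifier can use. The sketch below is a hypothetical, plain-Python illustration of that idea, not the template's Hive or Spark/Scala implementation: the input is assumed to be `(customer_id, txn_date, amount)` tuples, and the chosen aggregates (transaction count, total spend, average spend, days since last purchase) are illustrative assumptions rather than the template's actual feature list.

```python
from collections import defaultdict
from datetime import date


def customer_features(transactions, cutoff):
    """Aggregate (customer_id, txn_date, amount) tuples dated on or before cutoff.

    Returns {customer_id: {"txn_count", "total_spend", "avg_spend", "recency_days"}}.
    The feature set is a hypothetical example, not the template's own.
    """
    grouped = defaultdict(list)
    for customer_id, txn_date, amount in transactions:
        # Only use history available at prediction time to avoid label leakage.
        if txn_date <= cutoff:
            grouped[customer_id].append((txn_date, amount))

    features = {}
    for customer_id, rows in grouped.items():
        total = sum(amount for _, amount in rows)
        last_purchase = max(txn_date for txn_date, _ in rows)
        features[customer_id] = {
            "txn_count": len(rows),
            "total_spend": total,
            "avg_spend": total / len(rows),
            "recency_days": (cutoff - last_purchase).days,
        }
    return features


txns = [
    ("u1", date(2016, 6, 1), 10.0),
    ("u1", date(2016, 6, 21), 30.0),
    ("u2", date(2016, 3, 1), 5.0),
    ("u2", date(2016, 7, 5), 99.0),  # after the cutoff, so it is ignored
]
print(customer_features(txns, cutoff=date(2016, 6, 30)))
```

On a cluster, the same group-by-customer aggregation would run as a Hive query or Spark job over the partitioned data instead of an in-memory loop.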
We also build a web application, _Joseph Mart_, that consumes the prediction results through the _Azure Machine Learning_ web services and displays customized messages to individual customers. A **Shell** script is provided to run the steps end-to-end.
The code and documentation can be found [here][4]. The following is the directory structure for this template:
* **PowerBIDesktop**. This contains two pre-built Power BI Desktop files: one connects to Azure Blob Storage and the other connects to Azure HDInsight Spark.
* **Website**. This contains the source code for building the Joseph Mart web application.
* **data**. This contains the provided sample data.
* **hive**. This contains the Hive scripts for data processing.
* **mrs**. This contains the Microsoft R Server scripts for model training, scoring and building Azure Machine Learning web services.
* **notebook**. This contains a Jupyter Notebook, which walks through the feature engineering and tagging processes.
* **sparkapp**. This contains two pre-built Spark applications.
See the README in the GitHub repository for detailed instructions.
[1]: https://www.microsoft.com/en-us/cloud-platform/r-server
[2]: https://azure.microsoft.com/en-us/services/hdinsight/
[3]: https://powerbi.microsoft.com/en-us/
[4]: https://github.com/Azure/Customer-Churn-Demo-MRS-Spark-HDI