Sales Lead Ranking

February 23, 2018
Rank sales leads in a CRM system by predicting the likelihood that they turn into an actual sale.
The use case described in this experiment is predicting the likelihood that prospective customers buy a specific product under the conditions identified in the source dataset. The prediction is based on customer features, such as location, age, and gender, and on product features, such as program type, duration, and destination. The products considered in this experiment are the on-site language courses sold by an educational institution.

The **dataset is labeled**, which allows for **supervised learning** algorithms. The label column, "Count", contains numerical values: the total number of sales collected in the past for the indicated conditions.

The following classification (that is, prediction) algorithm is tested:

* **Two-Class Boosted Decision Tree**: This algorithm is an ensemble learning method in which the second tree corrects for the errors of the first tree, the third tree corrects for the errors of the first and second trees, and so forth. Predictions are based on the entire ensemble of trees together. Generally, boosted decision trees are the easiest methods with which to get top performance on a wide variety of classification tasks. However, they are also among the more memory-intensive algorithms, so they should not be used to process very large datasets. If you are dealing with datasets in the range of millions of rows, prefer a linear algorithm, such as Two-Class Support Vector Machine.
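The idea that each tree corrects the errors of the trees before it can be illustrated with a small sketch. This is not the implementation behind the Two-Class Boosted Decision Tree module: it assumes one-split trees (stumps), a single numeric feature, squared-error residuals, and made-up toy data, and all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the least-squares split."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not right:  # the largest x leaves nothing on the right side
            continue
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = sum((r - left_value) ** 2 for r in left)
        error += sum((r - right_value) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_value, right_value)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def fit_boosted(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by the previous ones."""
    base = sum(ys) / len(ys)  # start from the average label
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # Residuals are what the ensemble built so far still gets wrong.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def score(model, x):
    """Sum the contributions of the whole ensemble for one input."""
    base, learning_rate, stumps = model
    return base + sum(
        learning_rate * (left if x <= threshold else right)
        for threshold, left, right in stumps
    )


def training_error(model, xs, ys):
    """Sum of squared errors of the ensemble on the training data."""
    return sum((y - score(model, x)) ** 2 for x, y in zip(xs, ys))


# Made-up toy data: one numeric feature, 1 = the lead became a sale.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
for n_trees in (1, 5, 20):
    model = fit_boosted(xs, ys, n_trees=n_trees)
    print(n_trees, "trees -> training error", round(training_error(model, xs, ys), 4))
```

The training error shrinks as trees are added, because every new stump is fitted to whatever the earlier stumps left unexplained. The sketch keeps every stump in memory and rescans the data for each split, which hints at why the approach becomes costly on very large datasets.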
#Dataset

Data is sourced from a Dynamics 365 for Sales application (a sample copy is provided in this experiment), and contains the following columns:

- Country, City and District of students
- Their Age and Gender
- Date of request (expression of interest to attend a specific program)
- The Program itself, its Start Date and Duration
- The preferred Destination to attend the course
- A Count of total sales

![Dataset sample][1]

#Training

This experiment selects a subset of features for training:

- Country
- Gender
- Program
- Destination
- Count

The **Count** column is used for training the model, that is, for identifying patterns of product sales in the provided historical data. Feel free to modify the selected columns to try alternative prediction models based on different features.

![Training experiment][2]

#Scoring

The model is trained and scored using **Two-Class Boosted Decision Tree**, as described above. For classification models, the **Score Model** action generates the predicted numeric value, which is identified in the scored dataset as "Scored Probabilities". The picture below shows the frequency distribution of these scored probabilities and their density.

![Scored probabilities][3]

#Evaluation

The **Evaluate Model** action assesses the performance of the prediction in terms of true positive rate and false positive rate, giving an indication of the quality of the scored results. For more information on the generated metrics and how to interpret them, please refer to [Evaluate Model][4].

#Web Service

Once you run the **Predictive Experiment** and deploy it as a **Web Service**, it is possible to interact with the prediction service programmatically.
![Predictive experiment][5]

Based on the columns selected in the Training experiment, testing the service asks for the following input values:

- Country
- Gender
- Program
- Destination

It then generates an estimated likelihood that the sales lead with the indicated customer and product features will turn into an actual sale. Likelihood is expressed as a number between 0 and 1.

To consume the Web Service programmatically, this is the expected format of the **request message**:

    {
      "Inputs": {
        "input1": {
          "ColumnNames": ["Country", "Gender", "Program", "Destination"],
          "Values": [
            ["value", "value", "value", "value"],
            ["value", "value", "value", "value"]
          ]
        }
      },
      "GlobalParameters": {}
    }

And this is the format of the **response message**, containing the **Scored Probabilities** value:

    {
      "Results": {
        "output1": {
          "type": "DataTable",
          "value": {
            "ColumnNames": ["Scored Probabilities"],
            "ColumnTypes": ["Numeric"],
            "Values": [["0"], ["0"]]
          }
        }
      }
    }

[1]:
[2]:
[3]:
[4]:
[5]:
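As a rough illustration of how the request and response messages above might be handled from Python, the sketch below builds the request body, parses the scored probabilities, and ranks the leads. It uses only the standard library. The function names are hypothetical, and the bearer-token `Authorization` header is an assumption that should be checked against the API help page of your own deployed service; the endpoint URL and API key are not given here and must come from that service.

```python
import json
import urllib.request

# Input columns expected by the web service, in request order.
COLUMNS = ["Country", "Gender", "Program", "Destination"]


def build_request(leads):
    """Build the request message from a list of dicts keyed by column name."""
    rows = [[str(lead[column]) for column in COLUMNS] for lead in leads]
    return {
        "Inputs": {"input1": {"ColumnNames": COLUMNS, "Values": rows}},
        "GlobalParameters": {},
    }


def parse_scores(response_body):
    """Extract one scored probability per lead from the response message."""
    table = json.loads(response_body)["Results"]["output1"]["value"]
    index = table["ColumnNames"].index("Scored Probabilities")
    return [float(row[index]) for row in table["Values"]]


def score_leads(url, api_key, leads):
    """POST the leads to the web service and return their scored probabilities.

    Assumption: the service accepts the API key as a bearer token.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(build_request(leads)).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_scores(response.read().decode("utf-8"))


def rank_leads(leads, scores):
    """Pair each lead with its score, most likely sale first."""
    return sorted(zip(leads, scores), key=lambda pair: pair[1], reverse=True)
```

Keeping `build_request` and `parse_scores` separate from the network call makes the message handling easy to check without a deployed service; `rank_leads` then turns the returned probabilities into the lead ranking this experiment is aiming for.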