AzureML Hackathon Data Experiment

February 25, 2015
This experiment contains the training and testing datasets for the Azure ML hackathon. The problem statement and related information are given below.
Problem Statement:
This problem concerns the direct marketing campaigns of a financial company offering car loans. The marketing campaigns were based on phone calls. Often, more than one contact with the same customer was required in order to check whether the loan would be accepted. The goal of the task is to build a model that predicts whether a given customer will accept a car loan (variable Label). Such a model can be used to run more targeted marketing campaigns.

Dataset Information:
Number of Train Instances: 36169
Number of Test Instances: 9042
Number of Attributes: 16 + output attribute + ID label.

Attribute information:
0 - Unique ID to identify row
1 - age (numeric)
2 - job: type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
3 - marital: marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has the customer defaulted on any credit payment? (binary: "yes","no")
6 - balance: average yearly balance, in thousand INR (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - loan: has personal loan? (binary: "yes","no")
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)
13 - campaign: number of contacts performed during this campaign and for this customer (numeric, includes last contact)
14 - pdays: number of days that passed after the customer was last contacted in a previous campaign (numeric, -1 means the customer was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this customer (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
17 - Label: has the customer subscribed to a car loan? (binary: "yes","no")

Some attribute values may be missing for some records; missing values are replaced with the "?" symbol. This means those attributes were not known during the data collection process or were missing for other unknown reasons.

Evaluation Criterion:
Submissions will be evaluated using a standard classification measure, the F1 score (F1), which is defined as follows. A positive label means "Label" was "yes", i.e. the customer has subscribed to a car loan.
TP = No. of True Positives
FN = No. of False Negatives
FP = No. of False Positives
TN = No. of True Negatives
P (precision) = TP / (TP + FP)
R (recall) = TP / (TP + FN)
F1 = 2*P*R / (P + R)

Submission Output Format:
Mail to:;
File Name: registrationId_sub1.txt, registrationId_sub2.txt, registrationId_sub3.txt.
File Format:
Id Label
1 yes
2 yes
3 no
…
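
To make the evaluation criterion and the submission file format concrete, here is a minimal Python sketch. It is not the organizers' scoring code. The function names (confusion_counts, f1_score, format_submission) are hypothetical, returning 0.0 when a denominator is zero is an assumption (the description does not cover that case), and the single-space separator in the submission file is also an assumption based on the sample rows shown above.

```python
from typing import List, Sequence, Tuple

POSITIVE = "yes"  # the positive class is Label == "yes", per the problem statement


def confusion_counts(actual: Sequence[str], predicted: Sequence[str]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) with "yes" as the positive label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == POSITIVE and a == POSITIVE:
            tp += 1
        elif p == POSITIVE:
            fp += 1
        elif a == POSITIVE:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """F1 = 2*P*R / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    tp, fp, fn, _ = confusion_counts(actual, predicted)
    # Assumption: an undefined precision or recall (zero denominator) counts as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def format_submission(ids: Sequence[int], labels: Sequence[str], sep: str = " ") -> str:
    """Build the submission text: an "Id Label" header, then one row per prediction.

    Assumption: columns are separated by a single space.
    """
    if len(ids) != len(labels):
        raise ValueError("ids and labels must have the same length")
    lines: List[str] = [f"Id{sep}Label"]
    lines.extend(f"{row_id}{sep}{label}" for row_id, label in zip(ids, labels))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    actual = ["yes", "yes", "no", "no", "yes"]
    predicted = ["yes", "no", "yes", "no", "yes"]
    print(confusion_counts(actual, predicted))
    print(f1_score(actual, predicted))
    print(format_submission([1, 2, 3], ["yes", "yes", "no"]), end="")
```

Because F1 ignores TN, a model that predicts "no" for every customer scores 0 under this measure regardless of its accuracy, so it is worth checking F1 on a held-out slice of the training data before submitting.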