Building a Decision Tree Classifier Model
Ever wonder how a model arrives at its conclusions? A decision tree is one of the most transparent algorithms in terms of its internal mechanics.
# Data
This version of the Titanic dataset can be retrieved from the [Kaggle](https://www.kaggle.com/c/titanic-gettingStarted/data) website, specifically its “train” data (59.76 KB). The Titanic training data contains 891 rows, each one pertaining to an occupant of the RMS Titanic on the night of its sinking.
The dataset also has 12 columns, each recording an attribute of the occupant’s circumstances and demographics. For this particular experiment we will build a classification model that predicts whether or not someone would survive the Titanic disaster, given the same circumstances and demographics.
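To get a feel for the data before any modeling, here is a minimal R sketch (R is the language the experiment is later compared against). It assumes the Kaggle “train.csv” has been downloaded to the working directory.

```r
# Minimal sketch: load and inspect the Kaggle Titanic training data.
# Assumes "train.csv" has been downloaded from Kaggle to the working directory.
train <- read.csv("train.csv", stringsAsFactors = FALSE)

dim(train)                            # expect 891 rows and 12 columns
str(train)                            # column types and sample values
colSums(is.na(train) | train == "")   # missing values per column
```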
# Model
First, some preprocessing (an R sketch of these steps follows the list). It is highly recommended that you read the [detailed tutorial](http://datasciencedojo.com/dojo/building-and-deploying-a-classification-model-in-azure-ml/) to understand the rationale behind each step:
* Drop the columns that do not add immediate value for data mining or hold too many missing categorical values to be a reliable predictor attribute. The following columns were dropped using the **select columns in dataset** module:
* PassengerID, Name, Ticket, Cabin
* Identify categorical attributes and cast them into categorical features using the **edit metadata** module. The following attributes were cast into categorical values:
* Survived, Pclass, Sex, Embarked
* Scrub the missing values from the following columns using the **clean missing data** module:
* All missing values associated with numeric columns were replaced with the median value of the entire column
* All missing values associated with categorical columns were replaced with the mode value of the entire column
* Randomly partition the data into 70% training and 30% scoring using the **split data** module.
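Outside of Azure ML Studio, the same preprocessing can be sketched in R. This is a minimal, hedged example that assumes the `train` data frame loaded above; after dropping the four columns, Age (numeric) and Embarked (categorical) are the attributes in the Kaggle file that typically still contain missing values (the Kaggle file spells the ID column “PassengerId”).

```r
# Minimal R sketch of the preprocessing modules described above.
# Assumes "train" holds the Kaggle training data loaded earlier.

# Drop columns that add little value or hold too many missing values
train <- train[, !(names(train) %in% c("PassengerId", "Name", "Ticket", "Cabin"))]

# Clean missing data: median for numeric columns, mode for categorical columns
train$Age[is.na(train$Age)] <- median(train$Age, na.rm = TRUE)
mode_value <- names(which.max(table(train$Embarked[train$Embarked != ""])))
train$Embarked[train$Embarked == "" | is.na(train$Embarked)] <- mode_value

# Cast categorical attributes into factors (the counterpart of "edit metadata")
for (col in c("Survived", "Pclass", "Sex", "Embarked")) {
  train[[col]] <- as.factor(train[[col]])
}

# Randomly partition into 70% training and 30% scoring
set.seed(42)   # seed only makes the random split reproducible
idx      <- sample(seq_len(nrow(train)), size = floor(0.7 * nrow(train)))
training <- train[idx, ]
scoring  <- train[-idx, ]
```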
# Algorithm Selection
In this gallery experiment we show how to build a single decision tree in Azure ML, much like one built with the rpart package in R. We take the **two-class decision forest** as the learning algorithm and set the number of trees to one. We then train the model using the **train model** module, use the **score model** module to get predictions on the 30% test set from the **split data** module, and report evaluation metrics in the **evaluate model** module.
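For readers who prefer code, here is a hedged R sketch of the equivalent single-tree workflow using the rpart package mentioned above, reusing the `training` and `scoring` partitions from the preprocessing sketch.

```r
# Minimal sketch: a single classification tree with rpart,
# mirroring the train model / score model / evaluate model modules.
library(rpart)

# Train a classification tree on the 70% partition ("train model")
tree <- rpart(Survived ~ ., data = training, method = "class")

# Score the 30% partition ("score model")
pred <- predict(tree, newdata = scoring, type = "class")

# Evaluate with a confusion matrix and accuracy ("evaluate model")
confusion <- table(Predicted = pred, Actual = scoring$Survived)
print(confusion)
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

Printing the fitted object with `print(tree)` lists the split rules themselves, which is exactly the transparency the introduction alludes to.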
# Related
1. [Detailed Tutorial: Building and deploying a classification model in Azure Machine Learning Studio](http://datasciencedojo.com/dojo/building-and-deploying-a-classification-model-in-azure-ml/)
2. [Demo: Interact with the user interface of a model deployed as service](http://demos.datasciencedojo.com/demo/titanic/)
3. [Tutorial: Creating a random forest regression model in R and using it for scoring](https://gallery.azureml.net/Details/b729c21014a34955b20fa94dc13390e5)
4. [Tutorial: Obtaining feature importance using variable importance plots](https://gallery.azureml.net/Details/964dfc4151e24511aa5f78159cab0485)