Adult Income 1
Adult census income prediction experiment 1
This is an example ML model for students in the Microsoft Azure Machine Learning course.
It predicts whether a person's income exceeds $50K/yr based on census data, using the UCI "Census Income" dataset.
#In this session
• Sign up for a free Azure ML Studio subscription<br>
• Create Azure ML Studio workspace<br>
• Train, Test, Evaluate for Binary Classification<br>
• Import census income dataset<br>
• Create a new Azure Machine Learning experiment<br>
• Train and evaluate a prediction model<br>
• Type of datasets<br>
#First experiment model
#Sign up for a free Azure ML Studio subscription
https://studio.azureml.net/
Sign up for a free Azure ML Studio subscription:
Free Workspace -> Sign up here
#Create Azure ML Studio workspace
1. Go to the Azure portal https://portal.azure.com
2. Click +New
3. Select Internet of Things, click Machine Learning Workspace, then click Create
4. Workspace name = ws1
5. Subscription = default
6. Resource group = Create new: rs1
7. Location = Southeast Asia
8. Storage account = Create new: names1
9. Workspace pricing tier = Standard
10. Web service plan = Create new: ws1Plan
11. Click No pricing tier selected
12. Click DEVTEST
13. Click Pin to dashboard
14. Click Create
15. Click the Machine Learning workspace on the dashboard
16. Click Launch Machine Learning Studio
#Blank, new ML Studio workspace
**Predicting whether a person's income exceeds $50,000 per year based on demographic (census) data**
1. Download, prepare, and upload a census income dataset.
2. Create a new Azure Machine Learning experiment.
3. Train and evaluate a prediction model.
#The overall workflow of the experiment
• Create New blank experiment. Name = Adult Income 1
• Click Data Input and Output
• Drag & drop Import Data
#Configure Import data module:
• Data source = Web URL via HTTP<br>
• Data source URL = http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data
• Data format = CSV
• Run experiment
• Right-click the output port of Import Data
• Click Visualize
• Click on Col2
• Look at Statistics and Histogram
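To get a feel for what the Visualize view reports for a column such as Col2, the following is a minimal sketch that parses a few rows in the same 15-column, comma-separated layout as adult.data and counts the values in one column. The sample rows are illustrative stand-ins typed in for this example, and the helper names `load_rows` and `column_histogram` are hypothetical; this is not how Azure ML Studio computes its statistics.

```python
import csv
import io
from collections import Counter

# Illustrative rows in the 15-column layout of adult.data.
# Assumption: these rows are stand-ins for this sketch, not read from the real file.
SAMPLE = """\
39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K
38, Private, 215646, HS-grad, 9, Divorced, Handlers-cleaners, Not-in-family, White, Male, 0, 0, 40, United-States, <=50K
52, Private, 209642, HS-grad, 9, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K
"""


def load_rows(text):
    """Parse CSV text into a list of rows, dropping the space after each comma."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [row for row in reader if row]  # skip blank lines


def column_histogram(rows, col_number):
    """Count the distinct values in a 1-based column (Col2 is col_number=2)."""
    return Counter(row[col_number - 1] for row in rows)


rows = load_rows(SAMPLE)
col2_counts = column_histogram(rows, 2)
print(col2_counts.most_common())
```

The same two helpers could be pointed at the downloaded file contents instead of `SAMPLE`; Col15 (the last column) holds the income label.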
#Split up the dataset
• Training data: This group is used to create the new predictive model from the patterns found in the historical data, using the ML algorithm chosen for the solution.
• Validation data: This group is used to test the new predictive model against known outcomes to determine its accuracy and probabilities.
#Add Split Data:
• Click Data Transformation<br>
• Click Sample and Split
• Drag & drop Split Data module into canvas
• Connect Import Data to Split Data
• Set the property Fraction of rows in the first output dataset to 0.80
• Add Two-Class Boosted Decision Tree and Train Model
• Connect Two-Class Boosted Decision Tree to Train Model
• Connect Split Data to Train Model
• Click Train Model
• Click Launch column selector
• Include Col15 (the income label column)
• Click
• Save
• Run
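The Split Data step above, with its fraction of 0.80, can be approximated outside the Studio with a short sketch. It assumes a randomized split with a fixed seed; the function name `split_rows` and the seed value are hypothetical and are not taken from the Split Data module.

```python
import random


def split_rows(rows, fraction=0.80, seed=42):
    """Shuffle a copy of rows and split it into (first, second) parts.

    `fraction` is the share of rows that goes to the first part,
    mirroring the 0.80 setting used in this walkthrough.
    """
    shuffled = list(rows)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed: repeatable split
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))                     # stand-in for 100 dataset rows
train, held_out = split_rows(rows)
print(len(train), len(held_out))
```

The first part plays the role of the training data fed to Train Model, and the second part is the held-out data used for scoring.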
#Score the model:
• Add Score Model to canvas
• Connect the output of Train Model and the second output of Split Data to Score Model
• Run
#Visualize the model results
• Visualize output of Score Model
• Scored Labels: This column denotes the model's prediction for this row of the dataset.
• Scored Probabilities: This column denotes the numerical probability (or likelihood) that the income level for this row exceeds $50,000.
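The relationship between the two columns can be illustrated with a small sketch. It assumes a decision threshold of 0.5 for turning a scored probability into a scored label; the threshold, the probability values, and the helper names `scored_label` and `accuracy` are assumptions made for this example, not output from the experiment.

```python
def scored_label(probability, threshold=0.5):
    """Turn a probability of '>50K' into a class label (threshold is an assumption)."""
    return ">50K" if probability >= threshold else "<=50K"


def accuracy(actual, predicted):
    """Fraction of rows where the predicted label matches the known label."""
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Hypothetical values for four held-out rows.
scored_probabilities = [0.91, 0.12, 0.47, 0.66]
actual = [">50K", "<=50K", ">50K", ">50K"]

scored_labels = [scored_label(p) for p in scored_probabilities]
print(scored_labels)
print(accuracy(actual, scored_labels))
```

Comparing the scored labels with the known labels in Col15 is what gives an accuracy figure for the held-out rows.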
#Type of datasets
#Training set
• A set of examples used for learning
• The answer value (the label) is known
#Validation set
• A set of example data
• Used to tune the architecture of a classifier
• Used to estimate the error
#Test set
• Used to test the performance of a classifier
• Never used during the training process
• Gives an estimate of the error
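The three kinds of dataset described above can be produced from one dataset with a three-way split. The following is a minimal sketch; the 60/20/20 proportions, the seed, and the function name `three_way_split` are assumptions chosen for illustration and are not part of the experiment in this session, which uses a single 80/20 split.

```python
import random


def three_way_split(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of rows and cut it into training, validation, and test sets.

    Whatever remains after the training and validation shares becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # never seen during training or tuning
    return training, validation, test


rows = list(range(100))                     # stand-in for 100 dataset rows
training, validation, test = three_way_split(rows)
print(len(training), len(validation), len(test))
```

Keeping the test set out of both training and tuning is what makes its error estimate trustworthy.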
#More Information
#Two-Class Boosted Decision Tree
https://msdn.microsoft.com/en-us/library/azure/dn906025.aspx
#Score Model
https://msdn.microsoft.com/en-us/library/azure/dn905995.aspx
<br><br>
----------
> This ML experiment is for [Microsoft Azure Machine Learning Course][101].<br>
For the complete experiment list [Click here][102].<br>
Laploy | laploy@gmail.com | 084 007 5544 | [www.laploy.com][103]<br>
![enter image description here][104]
----------
[101]: https://notebooks.azure.com/laploy/libraries/loyml/html/00001%20Sessions%20summary.ipynb
[102]: https://gallery.cortanaintelligence.com/Home/Author?authorId=81E333F747E3429B55A3445E6714C36F60B397C13B4D0B07F34DEF1421F64D73
[103]: http://laploy.com
[104]: https://raw.githubusercontent.com/laploy/mli/master//loy-small.jpg