Classify US Incomes - TDSP Project

September 18, 2017

How to use the Team Data Science Process template to create a project in Azure Machine Learning that classifies US incomes.
The **detailed documentation** for this real-world scenario is here: [https://docs.microsoft.com/azure/machine-learning/preview/scenario-tdsp-classifying-us-incomes](https://docs.microsoft.com/azure/machine-learning/preview/scenario-tdsp-classifying-us-incomes). For code samples, you can also click the **View Project** icon on the right and visit the project GitHub repository.

## Introduction

Standardizing the structure and documentation of data science projects, anchored to an established [data science lifecycle](https://github.com/Azure/Microsoft-TDSP/blob/master/Docs/lifecycle-detail.md), is key to effective collaboration in data science teams. Creating Azure Machine Learning projects with the [Team Data Science Process (TDSP)](https://github.com/Azure/Microsoft-TDSP) template provides a framework for such standardization. Here we show how an actual machine learning project can be created with the TDSP structure, populated with project-specific code, artifacts, and documents, and executed within Azure Machine Learning.

## Overview

This sample shows how to instantiate and execute a machine learning project using the [Team Data Science Process (TDSP)](https://github.com/Azure/Microsoft-TDSP) structure and templates in Azure Machine Learning. For this purpose, we use the well-known [1994 US Census data from the UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/adult). The modeling task is to predict US annual income classes from US Census information (for example, age, race, education level, and country of origin). A project summary is available in the [GitHub repository](https://github.com/Azure/MachineLearningSamples-TDSPUCIAdultIncome/blob/master/project-summary.md).

## Prerequisites

* An Azure [subscription](https://azure.microsoft.com). You can also run this sample with a [free subscription](https://azure.microsoft.com/free/?v=17.16&WT.srch=1&WT.mc_id=AID559320_SEM_cZGgGOIg).
* An [Azure Data Science Virtual Machine (DSVM) Windows Server 2016](https://azuremarketplace.microsoft.com/marketplace/apps/microsoft-ads.windows-data-science-vm) (VM size: [DS3_V2](https://docs.microsoft.com/azure/virtual-machines/windows/sizes), with 4 virtual CPUs and 14 GB RAM). Although this sample was tested on an Azure DSVM, it is likely to work on any Windows 10 machine.
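The modeling task in the Overview is binary classification: each census record gets an income class of `>50K` or `<=50K`. The sketch below is a minimal illustration, not the project's own code (that lives in the GitHub repository). It parses rows laid out like the comma-separated UCI `adult.data` file and computes a majority-class baseline that any trained model should beat. Several parts are assumptions made for illustration: the helper names `parse_adult` and `majority_baseline`, the choice to drop rows containing `?` (missing) values, and the sample rows, which are hypothetical and are not real census records.

```python
import csv
import io
from collections import Counter

# Column order of the UCI Adult (1994 Census) files; "income" is the label.
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "income",
]
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"}

# Hypothetical rows in the adult.data layout (NOT real census records).
SAMPLE = """\
45, Private, 120000, Masters, 14, Married-civ-spouse, Exec-managerial, Husband, White, Male, 0, 0, 50, United-States, >50K
23, Private, 200000, HS-grad, 9, Never-married, Sales, Own-child, White, Female, 0, 0, 30, United-States, <=50K
31, ?, 150000, Some-college, 10, Divorced, ?, Unmarried, Black, Female, 0, 0, 40, United-States, <=50K

52, Self-emp-inc, 180000, Bachelors, 13, Married-civ-spouse, Sales, Husband, Asian-Pac-Islander, Male, 15024, 0, 45, India, >50K.
28, Private, 160000, HS-grad, 9, Never-married, Craft-repair, Not-in-family, White, Male, 0, 0, 40, Mexico, <=50K
37, Local-gov, 140000, Assoc-voc, 11, Married-civ-spouse, Protective-serv, Husband, White, Male, 0, 0, 40, United-States, <=50K
"""


def parse_adult(text):
    """Hypothetical helper: parse adult.data-style text into (features, label) pairs.

    label is 1 for '>50K' and 0 for '<=50K'; a trailing '.' on the label
    is stripped. Rows with '?' are skipped (one simple missing-value policy).
    """
    rows = []
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # blank lines or non-data lines
        if "?" in fields:
            continue  # drop rows with missing values
        record = dict(zip(COLUMNS, fields))
        label = record.pop("income").rstrip(".")
        for name in NUMERIC:
            record[name] = int(record[name])
        rows.append((record, 1 if label == ">50K" else 0))
    return rows


def majority_baseline(rows):
    """Hypothetical helper: return (majority label, accuracy of always predicting it)."""
    counts = Counter(label for _, label in rows)
    label, n = counts.most_common(1)[0]
    return label, n / len(rows)


if __name__ == "__main__":
    parsed = parse_adult(SAMPLE)
    print(len(parsed), majority_baseline(parsed))  # prints: 5 (0, 0.6)
```

Stripping a trailing `.` on the label lets the same parser handle both label spellings. Checking the field count skips blank or non-data lines. A real project would replace the baseline with a trained classifier and evaluate it on a held-out split.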