This example demonstrates how to use Azure Machine Learning (AML) Workbench to coordinate distributed training and operationalization of image classification models. We use the Microsoft Machine Learning for Apache Spark (MMLSpark) package to featurize images with pretrained CNTK models and to train classifiers on the derived features. We then apply the trained models in parallel to large image sets in the cloud. These steps are performed on an Azure HDInsight Spark cluster, so the speed of training and operationalization can be adjusted by adding or removing worker nodes.

The form of transfer learning we demonstrate has major advantages over retraining or fine-tuning a deep neural network: it does not require GPU compute, it is fast and scales with the size of the cluster, and it fits fewer parameters than a dense neural network layer. This method is therefore an excellent choice when few training samples are available, as is often the case for custom projects. Many users report that transfer learning produces models that perform well, allowing them to avoid the much greater cost of training deep neural networks from scratch.
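
The featurize-then-train pattern described above can be illustrated with a minimal, self-contained sketch. It is not the code used in this sample: the `featurize` function below is a hypothetical hand-written stand-in for a frozen pretrained network (in the sample, that role is played by a pretrained CNTK model applied through MMLSpark), and the toy "images" are just lists of random pixel values. The point is only that the feature extractor is never updated, and that the small classifier on top is the only part whose parameters are fit.

```python
import math
import random


def featurize(image):
    """Hypothetical stand-in for a frozen pretrained model.

    Maps an "image" (a list of pixel values) to a fixed-length feature
    vector: the mean and standard deviation of the pixels.
    """
    mean = sum(image) / len(image)
    var = sum((p - mean) ** 2 for p in image) / len(image)
    return [mean, math.sqrt(var)]


def train_logistic(features, labels, epochs=500, lr=0.5):
    """Fit a logistic regression with stochastic gradient descent.

    Only these weights and the bias are learned; featurize() stays fixed.
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the linear score is positive, else class 0."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z > 0 else 0


# Toy data (an assumption for illustration): dark images are class 0,
# bright images are class 1.
rng = random.Random(0)
dark = [[rng.uniform(0.0, 0.4) for _ in range(16)] for _ in range(20)]
bright = [[rng.uniform(0.6, 1.0) for _ in range(16)] for _ in range(20)]
images = dark + bright
labels = [0] * len(dark) + [1] * len(bright)

# Step 1: featurize once with the fixed extractor.
features = [featurize(img) for img in images]

# Step 2: train a small classifier on the derived features.
weights, bias = train_logistic(features, labels)

correct = sum(predict(weights, bias, f) == y for f, y in zip(features, labels))
accuracy = correct / len(labels)
print(f"Training accuracy: {accuracy:.2f}")
```

Because featurization and classifier training are separate steps, the featurization step can be applied to each image independently, which is what makes the approach easy to distribute across worker nodes.
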
The **detailed documentation** for this real-world scenario includes a [step-by-step walkthrough on Microsoft Docs][1].
For code samples, click the "**View Project**" icon on the right to visit the [project GitHub repository][2].
Key components needed to run this scenario:
- An [Azure account](https://azure.microsoft.com/en-us/free/) (free trials are available)
    - This sample creates an HDInsight Spark cluster with 40 worker nodes (168 cores total). Ensure that your account has enough available cores by reviewing the "Usage + quotas" tab for your subscription in the Azure portal.
    - If you have fewer cores available, you may modify the HDInsight cluster template to decrease the number of workers provisioned. Instructions for this appear under the "Create the HDInsight Spark cluster" section.
- [Azure Machine Learning Workbench][3]
    - Follow the [quick start installation guide][4] to install the program and create a workspace.
- [AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy), a free utility for coordinating file transfer between Azure storage accounts.
    - Ensure that the folder containing the AzCopy executable is on your system's PATH environment variable. (Microsoft provides [instructions on modifying environment variables](https://support.microsoft.com/en-us/help/310519/how-to-manage-environment-variables-in-windows-xp).)
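
One quick way to confirm the AzCopy PATH requirement above is a short Python check. This is a minimal sketch, not part of the sample: it assumes the executable is named `AzCopy` (on Windows, `shutil.which` also tries the extensions listed in `PATHEXT`, such as `.exe`), and the helper name `find_on_path` is hypothetical.

```python
import shutil


def find_on_path(executable):
    """Return the full path to `executable` if it is on PATH, else None."""
    return shutil.which(executable)


# "AzCopy" is the assumed executable name; adjust it if your install differs.
azcopy_path = find_on_path("AzCopy")
if azcopy_path is None:
    print("AzCopy was not found on PATH; add its install folder to PATH.")
else:
    print(f"AzCopy found at {azcopy_path}")
```

If the check reports that AzCopy was not found, add the folder containing the executable to PATH and open a new command prompt before retrying, since already-open prompts keep the old environment.
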
This example was tested on a Windows 10 PC; you should be able to run it from any Windows machine, including Azure Data Science Virtual Machines. Minor modifications (for example, changes to file paths) may be required when running this example on macOS.
[1]: https://docs.microsoft.com/azure/machine-learning/preview/scenario-aerial-image-classification
[2]: https://github.com/Azure/MachineLearningSamples-AerialImageClassification
[3]: https://review.docs.microsoft.com/en-us/azure/machine-learning/preview/overview-what-is-azure-ml
[4]: https://review.docs.microsoft.com/en-us/azure/machine-learning/preview/quick-start-installation