Reading data from Azure DocumentDB in Azure Machine Learning

November 28, 2016
This experiment demonstrates how to read data from Azure DocumentDB into Azure Machine Learning using the Import Data module.
This article shows how to build an experiment that reads data from DocumentDB into Azure Machine Learning using DocumentDB SQL.

Azure DocumentDB is a fully managed NoSQL database service that lets you store data using a flexible data model. Its advantages for machine learning include fast and predictable performance, automatic scaling, global distribution, and rich query capabilities that let you filter datasets.

The Import Data module in Azure Machine Learning lets you query documents in DocumentDB using the [DocumentDB SQL syntax][1]. In this experiment, the Import Data module connects to a sample collection in DocumentDB and queries sample information about volcanoes. The DocumentDB SQL query retrieves the data with a specific filter on "Elevation". The experiment then selects a subset of the relevant features and trains a model to predict the "Status" field in the dataset.

Configuration Parameters For Querying Data From DocumentDB:
-----------------------------------------------------------

To use this capability, you need the following information, which you can retrieve from your subscription by opening the [Microsoft Azure Portal][2] and navigating to the DocumentDB instance you plan to use:

• Endpoint URL for your DocumentDB instance in Azure, in fully-qualified-domain-name format
• ID or name of the database where your collection resides
• ID or name of the collection you want to query in your DocumentDB instance
• Key for accessing the DocumentDB instance
• SQL query you want to execute against DocumentDB

Example Configuration For Import Data Module:
---------------------------------------------

The sample experiment uses the following configuration as an example:

• Endpoint URL =
• Database ID = volcanodb
• Collection ID = volcano1
• DocumentDB Key = "" (you will need to paste in the following read-only key for this instance: MSr6kt7Gn0YRQbjd6RbTnTt7VHc5ohaAFu7osF0HdyQmfR+YhwCH2D2jcczVIR1LNK3nMPNBD31losN7lQ/fkw==)
• SQL Query = Select * from volcano1 where volcano1.Elevation < 10000

You can replace these values with the details of your own DocumentDB instance.

![Sample Experiment][3]

Technical Notes:
----------------

If you're new to DocumentDB, we recommend starting with the [DocumentDB documentation][4].

Refer to the [Import Data module documentation][5] for important technical information, such as the recommended query structure for querying DocumentDB from Azure Machine Learning.

Published by a Microsoft employee

[1]:
[2]:
[3]:
[4]:
[5]:
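
As a supplement to the configuration described above, it can help to collect the five Import Data settings in one place and check that none are blank before pasting them into the module. The following is a minimal Python sketch under stated assumptions: `ImportDataSettings`, `missing`, and `elevation_query` are hypothetical names used only for illustration. They are not part of Azure Machine Learning or of any DocumentDB SDK, and the code does not connect to DocumentDB.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ImportDataSettings:
    """Hypothetical container for the five Import Data module inputs."""
    endpoint_url: str
    database_id: str
    collection_id: str
    key: str
    sql_query: str

    def missing(self):
        """Return the names of settings that are empty or only whitespace."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


def elevation_query(collection_id: str, max_elevation: int) -> str:
    """Build the sample query: every document below an elevation threshold."""
    # int() keeps the threshold numeric so no stray text ends up in the query.
    return (
        f"SELECT * FROM {collection_id} "
        f"WHERE {collection_id}.Elevation < {int(max_elevation)}"
    )


settings = ImportDataSettings(
    endpoint_url="",  # intentionally blank here: supply your instance's endpoint
    database_id="volcanodb",
    collection_id="volcano1",
    key="",  # intentionally blank here: paste the access key for the instance
    sql_query=elevation_query("volcano1", 10000),
)

print(settings.sql_query)
print("Still to fill in:", settings.missing())
```

Running the sketch prints the query text and the names of the settings that are still blank, which in this example are the endpoint URL and the key.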