Setting up predictive analytics pipelines using Azure SQL Data Warehouse
By Robert Alexander for Microsoft April 4, 2016
To demonstrate the power of Azure SQL Data Warehouse we will examine a sample use case that integrates SQL Data Warehouse with Azure Machine Learning. Along the way you will also be exposed to a number of other Azure components, namely Event Hub, Stream Analytics, and Power BI - as well as an on-prem SQL Server via a Data Management Gateway. At the end of this tutorial we will include the steps to deploy this use case in your Azure subscription.
This tutorial will cover several useful design patterns that you can use. It consists of a real time and a predictive pipeline. For the real time pipeline, you will see how Stream Analytics can read from an EventHub and send the data to PowerBI for visualization. For the predictive pipeline, you will see how Stream Analytics can also send the data to Azure SQL Data Warehouse, where an Azure Data Factory will call Azure Machine Learning to read the data from the warehouse and send the aggregated results back to the warehouse for visualization in PowerBI. In addition, you will see how historical batch data can be ingested from an on-prem SQL Server via Data Management Gateway to Azure SQL Data Warehouse. The use case is a rating system that allows users to rate an event (such as a conference talk) and visualize the results in real time (5 second intervals). Ratings are also stored in a data warehouse and sent to machine learning for near real time predictions (15 minute intervals). Lastly, historical ratings are bulk-loaded from an on-prem database. This tutorial provides deployable components and will walk you through the automated and manual steps necessary to install everything. Clicking **Get Started** on the right will take you to a GitHub repository containing the following: - **README.md** - The steps to deploy everything - **azuredeploypart1.json** - The first ARM template - installs Service Bus, Event Hub, Stream Analytics Job, SQL Server, and SQL Data Warehouse - **azuredeploypart2.json** - The second ARM template - installs the Data Factories - **datagenerator.zip** - Deployed as a WebApp to simulate user ratings - **Ratings.csv** - A day's worth of historical user ratings - **AverageRatings.csv** - A day's worth of historical user average ratings - **media** - A folder containing images used by README.md When everything is successfully deployed and running, the final result will be a PowerBI dashboard showing the ratings of each individual device in real time and the average rating for all devices. Click **Get Started** on the right to begin.