Microsoft R Deep Dive for External - Munich, Germany
The Microsoft Data Science Team invites you to an in-depth 3-day workshop on using Microsoft R.
Location: Munich, Germany. December 19 - December 21, 2016.
# About the Course
The course is designed to help analysts add Microsoft R Server (MRS) to their data science toolbox and integrate it with other tools in Azure and the Cortana Intelligence Suite. After completing the course, participants will be able to:
* Explore and visualize data with R
* Manipulate data that is too large to fit into memory with MRS
* Train and test statistical models with high-performance, parallel, external-memory algorithms
* Access data stored in Azure Blob Storage using Microsoft R Server (MRS)
* Deploy models as AzureML web services
In these sessions, you’ll gain hands-on experience conducting scalable data analysis with Microsoft R Server. You will learn the fundamentals of R and understand how Microsoft R Server addresses the major scalability and operationalization challenges associated with open source R.
# Prerequisites
* There are a few things you will need in order to follow the course materials and take full advantage of the course:
    * An Azure subscription
    * A terminal emulator with OpenSSH or bash, e.g., PuTTY or Cygwin/MobaXterm
        * I use MobaXterm and the Ubuntu Bash Subsystem within Windows
    * An R IDE. Some reasonable choices:
        * RStudio
        * Visual Studio 2015 with RTVS (Community Edition is sufficient)
        * Jupyter/JupyterLab with IRkernel
    * Microsoft R Server 8.0.3 or later
        * Installation instructions
* I will assume you have already taken the following courses, or have the background provided by these courses:
    * Implementing Real-Time Analytics with Hadoop in Azure HDInsight (or general knowledge of the Apache Hadoop ecosystem and HDInsight)
    * Data Science and Machine Learning Essentials (or general understanding of machine learning and predictive modeling)
    * Introduction to R for Data Science
* Intermediate knowledge of R would be ideal, at the level of either of the following two courses:
    * Programming with R for Data Science
    * R for SAS Users Course
* I will not assume any background knowledge of Microsoft R Server, but for those who are eager, you can find an online video series about MRS here:
    * Course Website
    * Video Lectures on Channel 9
    * Lab Exercises
* A useful overview and comparison of MRS and MRO is available here.
# Modules
Each training module guides you through a logical progression of hands-on tasks phrased as actions ("do-verb" form). Each day is broken up into modules of one to four hours, in which you will learn the material and work through labs on your own. Some material that is out of scope for hands-on labs will instead be demonstrated in instructor-led labs. Participants will receive a copy of the lab material to try on their own, but are not required to run the analysis during the training. Specific modules may spill across sessions depending on audience engagement.

The course is divided into the following modules, arranged as a general agenda:

1. Part I - Functional-Object Based Computing with R
    * Day One - Morning Session
        * Overview of the R Project and CRAN
        * Exploring the Microsoft R Data Stack
        * Functional Programming for Data Manipulation with the dplyr package
    * Day One - Afternoon Session
        * Understanding dplyr's semantics and the magrittr pipe
        * Data Visualization and Exploratory Data Analysis
        * Using the broom package for Modeling and Summarization
2. Part II - Breaking the Memory Barrier with RevoScaleR
    * Day Two - Morning Session
        * Overview of the Microsoft R Data Ecosystem
        * Modeling and Scoring with High-Performance ScaleR Algorithms
        * Data Manipulation with the dplyrXdf Package
    * Day Two - Afternoon Session
        * Summarizing Data with RevoScaleR
        * Performance Considerations with RevoScaleR
        * Parallel Computing and Distributed Computing with Microsoft R Server
        * Deploying R and ScaleR algorithms to Azure with the AzureML package
3. Part III - Microsoft R Server with Spark
    * Day Three - Morning Session
        * Overview of the Apache Spark Project
        * Ingesting Data into Azure Blob Storage
        * Creating Spark DataFrames and Spark Contexts
        * Manipulating HDFS data with the sparklyr package
    * Day Three - Afternoon Session
        * Creating Distributed eXternal DataFrames in HDFS
        * Preparing Data for Modeling with Microsoft R Server
        * Training Statistical Models with Microsoft R Server and the Spark Compute Context
        * Scoring and Deploying Models
        * Performance Considerations on Hadoop
# Concepts Covered
* Functional-Object Based Computing with R
* Breaking the Memory Barrier with RevoScaleR
* Microsoft R Server with Spark
# Technologies Covered
* Microsoft R Server