Scalable Data Science with Microsoft R Server and Spark (3-Day) - Turkey
In this course, you’ll gain hands-on experience with Microsoft R and HDInsight Spark for scalable data science and machine learning. You will learn about the premium offering of the HDInsight platform and how to leverage Microsoft R Server as an application on top of HDInsight Spark to perform data analysis and machine learning at scale. This three-day course teaches Microsoft R and Spark from the ground up. Location: Turkey. March 13 - 15, 2017.
# About the Course
In this course, you’ll gain hands-on experience with Microsoft R and HDInsight Spark for scalable data science and machine learning. You will learn about the premium offering of the HDInsight platform and how to leverage Microsoft R Server as an application on top of HDInsight Spark to perform data analysis and machine learning at scale.
This is a three-day course that teaches Microsoft R and Spark from the ground up.
# Skills Taught
At the end of the course you will have acquired the following skills:
1. Understand what Spark is and why it is more effective than Hadoop MapReduce for iterative machine learning jobs.
2. Understand functional programming and lazy evaluation.
3. Provision and deploy HDInsight Spark Clusters and install R Server as an application.
4. Understand the basics of administering and managing packages and applications on premium HDInsight Spark clusters.
5. Develop functions that are robust to different data structures and execution environments.
6. Use Spark and its R APIs for exploratory data analysis.
7. Train and tune statistical machine learning models with Microsoft R Server's RxSpark compute context.
8. Deploy trained R models as an Azure ML web service.
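Lazy evaluation (skill 2) means that transformations are only recorded when you call them, and no data is processed until an action asks for a result. The sketch below illustrates the idea in plain Python, independent of Spark or R. It assumes a hypothetical `LazyDataset` class, which is not part of Spark, SparkR, sparklyr, or Microsoft R Server; it only loosely mirrors how Spark separates transformations (`map`, `filter`) from actions (`collect`, `count`).

```python
from typing import Any, Callable, Iterable, List


class LazyDataset:
    """Hypothetical toy dataset: records transformations and defers all work
    until an action (collect or count) is called."""

    def __init__(self, source: Iterable[Any], ops: tuple = ()):
        self._source = source
        self._ops = ops  # recorded (kind, function) pairs; nothing has run yet

    def map(self, fn: Callable[[Any], Any]) -> "LazyDataset":
        # Transformation: return a new dataset with the step appended.
        return LazyDataset(self._source, self._ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        # Transformation: also only recorded, not executed.
        return LazyDataset(self._source, self._ops + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: only now are the recorded steps applied, element by element.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out

    def count(self) -> int:
        # Action: re-runs the whole recorded pipeline.
        return len(self.collect())


calls = []


def square(x):
    calls.append(x)  # side effect shows when work actually happens
    return x * x


ds = LazyDataset(range(10)).map(square).filter(lambda x: x % 2 == 0)
print(len(calls))    # 0: defining the pipeline did no work
print(ds.collect())  # [0, 4, 16, 36, 64]
print(len(calls))    # 10: the action triggered the computation
```

Because nothing is cached, each action replays the recorded steps from the source, which is loosely analogous to Spark recomputing an uncached dataset from its lineage every time an action runs.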
# Prerequisites
There are a few things you will need in order to properly follow the course materials:
1. A Microsoft Azure subscription with at least $50 available to spend for the course. You *must* have this enabled before class, because you will use Azure for all labs, work, and exercises. You can use your MSDN subscription (https://azure.microsoft.com/en-us/pricing/member-offers/msdn-benefits/), your employer may provide Azure resources to you, or you may receive instructions in your class invitation.
2. An understanding of R: the ability to write functions, train models, etc.
3. PuTTY, Cygwin, or another bash emulator (some Linux experience to go with it would be useful).
4. It’s also a good idea to have general familiarity with predictive and classification modeling, and a basic understanding of statistics and machine learning concepts, e.g., cross-validation, ensemble models, and model metrics.
# Technologies Covered
* R Language
* Microsoft R
* HDInsight
* Apache Spark
* R APIs for Spark - choices and limitations
* Azure ML
# Materials
* Ask me for any updates
* My major repo: https://github.com/akzaidi/spark-mrs-demo
* Last delivery: https://github.com/akzaidi/R-cadence/tree/master/Spark