Submit Queries to Azure HDInsight Hive Cluster in Jupyter Notebooks

May 23, 2016

This Jupyter Notebook shows how to submit queries to Azure HDInsight Hive clusters in Python.
Python is one of the most popular languages among data scientists, and Hive is a popular big data solution built on HDFS that many data scientists use to tackle big data challenges. For data scientists who store big data in Hive clusters and build advanced analytical solutions in Python, it is very convenient to run Hive queries, retrieve the query results directly into Python, and continue with analysis and modeling without leaving the Python environment. This sample Jupyter Notebook shows how to submit queries to Azure HDInsight Hive clusters from Python and ingest the query results as a pandas data frame.

This Jupyter Notebook can run in Azure Machine Learning notebook services, on Jupyter Notebook servers running on Windows or Linux (Ubuntu), or in other environments with Python 2. Note that submitting Hive queries from Python and retrieving the results is slower than running the same queries directly on the head node of the Hive cluster, because of the overhead of submitting the query remotely and transferring the results back to the client.
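The workflow described above (run a query, pull the rows into Python, hand them to pandas) can be sketched roughly as follows. This is a minimal illustration and not the notebook's own code: it assumes the cluster is reached through some driver that follows the Python DB-API 2.0 interface (for example an ODBC data source configured for Hive), and the helper names run_query and to_data_frame are hypothetical. An in-memory sqlite3 database stands in for the Hive connection so that the sketch runs anywhere.

```python
import sqlite3  # stand-in DB-API driver, used only for the local demo below


def run_query(connection, query):
    """Run `query` on any DB-API 2.0 connection; return (column_names, rows).

    Hypothetical helper: in practice `connection` would come from a
    Hive-capable driver rather than sqlite3.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(query)
        # cursor.description has one 7-item entry per result column;
        # the first item of each entry is the column name.
        columns = [col[0] for col in cursor.description]
        rows = [tuple(row) for row in cursor.fetchall()]
    finally:
        cursor.close()
    return columns, rows


def to_data_frame(columns, rows):
    """Build a pandas DataFrame from a query result (pandas imported lazily)."""
    import pandas as pd
    return pd.DataFrame.from_records(rows, columns=columns)


# Local demo: sqlite3 replaces the Hive cluster (an assumption for illustration).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE trips (city TEXT, fare REAL)")
demo.executemany("INSERT INTO trips VALUES (?, ?)",
                 [("NYC", 12.5), ("SEA", 8.0)])
columns, rows = run_query(demo, "SELECT city, fare FROM trips ORDER BY city")
demo.close()
```

With pandas installed, to_data_frame(columns, rows) turns the result into a data frame for further analysis. Fetching every row with fetchall() keeps the sketch short, but for large Hive result sets it is usually better to aggregate or filter in the query first, since all returned rows have to travel from the cluster to the Python client.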