# Time Series Forecasting in Azure ML using R
In this article, we'll use Microsoft Azure Machine Learning Studio to build an experiment for doing time series forecasting using several classical time series forecasting algorithms available in R.
## Overview of Experiment
The main steps of the experiment are:
- [Step 1: Get data]
- [Step 2: Split the data into train and test]
- [Step 3: Run time series forecasting using R]
- [Step 4: Generate accuracy metrics]
- [Step 5: Results]
[Step 1: Get data]:#step-1-get-data
[Step 2: Split the data into train and test]:#step-2
[Step 3: Run time series forecasting using R]:#step-3
[Step 4: Generate accuracy metrics]:#step-4
[Step 5: Results]:#step-5
### Step 1: Get data
We obtained the N1725 time series data from the publicly available [M3 competition dataset](http://forecasters.org/resources/time-series-data/), and uploaded the data to Azure ML Studio. This dataset has 126 rows and two columns, **`time`** and **`value`**.
### Step 2: Split the data into train and test
We used the **Split** module in Azure ML Studio to divide the data into training and testing sets, using the _Relational split_ option and specifying a time value as the split condition. We used the first 108 points for training and the remaining 18 points for testing the accuracy of various forecasting modules.
![][image1]
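For readers who want to reproduce the split outside the **Split** module, the same idea can be sketched in plain R. This is only an illustration: the data frame `series` is a synthetic stand-in (not the N1725 data), and it assumes that `time` is a consecutive integer index, which may not match the format of the uploaded dataset.

```r
# Hypothetical stand-in for the uploaded dataset: 126 rows, columns `time` and `value`.
# The values are synthetic; they are NOT the N1725 series.
set.seed(1)
idx <- 1:126
series <- data.frame(
  time  = idx,
  value = 100 + 0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(126)
)

# Split on a time threshold: rows up to the threshold train the models,
# later rows are held out for testing.
split_time <- 108
train <- series[series$time <= split_time, ]
test  <- series[series$time >  split_time, ]

nrow(train)  # 108
nrow(test)   # 18
```

Splitting on a time value rather than sampling rows at random keeps every test point strictly later than every training point, which is what a forecasting evaluation needs.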
### Step 3: Run time series forecasting using R
To compute forecasts, we used the following classical time series methods from the [forecast package](http://cran.r-project.org/web/packages/forecast/index.html) in R:
1. Seasonal [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average)
2. Non-seasonal [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average)
3. Seasonal [ETS](http://en.wikipedia.org/wiki/Exponential_smoothing)
4. Non-seasonal [ETS](http://en.wikipedia.org/wiki/Exponential_smoothing)
5. Average of Seasonal [ETS](http://en.wikipedia.org/wiki/Exponential_smoothing) and Seasonal [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average)
For all seasonal methods, we used a seasonality value of 12.
We added the following R script to the **Execute R Script** module to build the seasonal [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) model. The script performs these steps:
1. Read the training data into **`dataset1`** and the test data (used only for its timestamps) into **`dataset2`**.
2. Create a **`ts`** object in R with the training data and specified seasonality.
3. Learn an [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) model using the **`auto.arima()`** function from the **`forecast`** package in R.
4. Compute the forecasting horizon by comparing the maximum timestamps in training and test datasets.
5. Forecast using the learned [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) model for the computed horizon.
![][image2]
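The five steps listed above can be sketched as a small R function. This is a minimal sketch and not necessarily the script shown in the screenshot: the function name `forecast_seasonal_arima` is hypothetical, the input data is synthetic, and `time` is assumed to be a consecutive integer index. Inside Azure ML Studio the two data frames would come from the module's input ports (for example through `maml.mapInputPort()`); that part is left out so the sketch can run in any R session with the `forecast` package installed.

```r
library(forecast)  # provides auto.arima() and forecast()

# Hypothetical helper: `dataset1` is the training data, `dataset2` the test data,
# each with columns `time` and `value`. Assumes `time` is a consecutive integer index.
forecast_seasonal_arima <- function(dataset1, dataset2, seasonality = 12) {
  # Step 2: build a ts object with the given seasonal period.
  train_ts <- ts(dataset1$value, frequency = seasonality)

  # Step 3: let auto.arima() select and fit an ARIMA model, seasonal terms allowed.
  fit <- auto.arima(train_ts, seasonal = TRUE)

  # Step 4: the horizon is the gap between the last training and last test timestamp.
  horizon <- max(dataset2$time) - max(dataset1$time)

  # Step 5: forecast over that horizon and return one row per forecast step.
  fc <- forecast(fit, h = horizon)
  data.frame(
    time     = max(dataset1$time) + seq_len(horizon),
    forecast = as.numeric(fc$mean)
  )
}

# Synthetic stand-in data (NOT the N1725 series), split at time 108.
set.seed(1)
idx <- 1:126
series <- data.frame(
  time  = idx,
  value = 100 + 0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(126)
)
result <- forecast_seasonal_arima(series[series$time <= 108, ],
                                  series[series$time > 108, ])
nrow(result)  # 18
```

Only the timestamps of the test data are used here; the test values stay untouched until the accuracy metrics are computed.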
For each of the other model types, we added a new **Execute R Script** module and added similar code to call the corresponding functions from the R package with appropriate parameters.
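As a rough illustration of how the remaining variants might differ, the sketch below fits all four individual models with the `forecast` package and averages the seasonal ETS and seasonal ARIMA forecasts. The parameter choices are assumptions rather than the settings used in the experiment: `seasonal = FALSE` restricts `auto.arima()` to non-seasonal models, and the `ets()` model string `"ZZN"` disables the seasonal component while `"ZZZ"` lets `ets()` choose one automatically. The training series is synthetic.

```r
library(forecast)  # provides auto.arima(), ets() and forecast()

# Synthetic training series (NOT the N1725 data) with a seasonal period of 12.
set.seed(1)
idx <- 1:108
train_ts <- ts(100 + 0.5 * idx + 10 * sin(2 * pi * idx / 12) + rnorm(108),
               frequency = 12)
h <- 18  # forecast horizon, matching the 18 held-out points

# One fitted model per method; the settings below are illustrative assumptions.
fits <- list(
  seasonal_arima     = auto.arima(train_ts, seasonal = TRUE),
  non_seasonal_arima = auto.arima(train_ts, seasonal = FALSE),
  seasonal_ets       = ets(train_ts, model = "ZZZ"),  # seasonal component allowed
  non_seasonal_ets   = ets(train_ts, model = "ZZN")   # seasonal component disabled
)

# Point forecasts for each method, one column per model.
forecasts <- as.data.frame(
  lapply(fits, function(fit) as.numeric(forecast(fit, h = h)$mean))
)

# Fifth method: simple average of the seasonal ETS and seasonal ARIMA forecasts.
forecasts$ets_arima_average <-
  (forecasts$seasonal_ets + forecasts$seasonal_arima) / 2
```

Note that `"ZZZ"` only permits a seasonal component; `ets()` may still select a non-seasonal model if that fits the data better.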
Note: To save space, not all the scripts are shown here, but you can open the experiment in Azure ML Studio and click each module to see the R script details.
### Step 4: Generate accuracy metrics
We joined the forecasting results from each of the methods with the test data, to compute the accuracy metrics. We used another instance of the **Execute R Script** module to compute the following metrics:
- [**Mean Error** (ME)](http://en.wikipedia.org/wiki/Mean_signed_difference) - The average forecasting error on the test dataset (an *error* is the difference between the predicted value and the actual value).
- [**Root Mean Squared Error** (RMSE)](http://en.wikipedia.org/wiki/Root-mean-square_deviation) - The square root of the average of squared errors of predictions made on the test dataset.
- [**Mean Absolute Error** (MAE)](http://en.wikipedia.org/wiki/Mean_absolute_error) - The average of absolute errors.
- [**Mean Percentage Error** (MPE)](http://en.wikipedia.org/wiki/Mean_percentage_error) - The average of percentage errors.
- [**Mean Absolute Percentage Error** (MAPE)](http://en.wikipedia.org/wiki/Mean_absolute_percentage_error) - The average of absolute percentage errors.
- [**Mean Absolute Scaled Error** (MASE)](http://en.wikipedia.org/wiki/Mean_absolute_scaled_error) - The mean absolute error scaled by the in-sample mean absolute error of a naive benchmark forecast.
- [**Symmetric Mean Absolute Percentage Error** (sMAPE)](http://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error) - The average of absolute errors expressed as a percentage of the mean of the absolute actual and forecast values.
### Step 5: Results
We found that the average of the seasonal [ETS](http://en.wikipedia.org/wiki/Exponential_smoothing) and seasonal [ARIMA](http://en.wikipedia.org/wiki/Autoregressive_integrated_moving_average) forecasts performs better than either model individually, as measured by MASE, sMAPE, and MAPE.
![][image4]
The final experiment looks like this:
![][image3]
<!-- Images -->
[image1]:http://az712634.vo.msecnd.net/samplesimg/v1/12/split.png
[image2]:http://az712634.vo.msecnd.net/samplesimg/v1/12/seasonal_arima.png
[image3]:http://az712634.vo.msecnd.net/samplesimg/v1/12/full_experiment.png
[image4]:http://az712634.vo.msecnd.net/samplesimg/v1/12/table.png