# Data Analysis and Interpolation Using R

The purpose of this experiment is to demonstrate how interpolation can be implemented in Azure Machine Learning Studio (Azure MLS) using R.

Introduction
-------------
Interpolation is a method of constructing new data points within the range of a discrete set of known data points obtained by observation, sampling or experimentation (Interpolation, n.d.). Interpolation is widely used in many fields, such as economics, finance, mathematics and computer science.

Structure of the Experiment
----------------------------
The overall experiment consists of two data input modules, three Execute R Script modules and two data output modules.

Input Data
-----------
The experiment requires two (2) input data sets:

1. The sample (main) data set is taken from the U.S. Bureau of Economic Analysis (2012) publication on Gross Domestic Product (GDP) and its major components. It includes the following columns: index, GDP, Consumption (personal consumption expenditures), Investment (gross private domestic investment), Government (government consumption expenditures and gross investment), Exports (exports of goods and services) and Imports (imports of goods and services). The time series cover quarterly U.S. GDP and its major components from 1947Q1 to 2012Q2, for a total of 262 quarterly observations.
2. The second data set provides placeholders for the interpolated data points. It has the same seven (7) columns as the main data set and 99 rows.

Execute R Script Modules
-------------------------
The experiment includes three (3) Execute R Script modules:

1. Module 1 (the top one) prepares the data set for interpolation by inserting placeholder rows after each row of the main data set.
2. Module 2 performs linear interpolation using the `na.approx()` function of the R zoo package.
3. Module 3 performs cubic spline interpolation using the `na.spline()` function of the R zoo package.
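The three steps above can be sketched locally in R as shown below. This is a minimal illustration and not the experiment's actual scripts. It makes several assumptions. The zoo package must be installed. A small made-up series stands in for the BEA data, which the real modules read from their input ports. The NA placeholder rows come from a hypothetical helper, `insert_placeholders()`, instead of the second input data set. Finally, it uses 3 placeholder rows per gap instead of 99. `na.approx()` and `na.spline()` are applied column by column. Both functions only fill `NA` values, and both interpolants pass through the original observations, so the linear and cubic spline outputs differ only in the inserted rows.

```r
# Sketch of the three Execute R Script steps, run locally on a toy series.
# Hypothetical: the toy data frame `gdp`, the helper `insert_placeholders()`
# and k = 3 placeholder rows per gap (the experiment itself uses 99 rows).
library(zoo)  # provides na.approx() and na.spline()

# Toy quarterly series (made-up values, not BEA data).
gdp <- data.frame(
  index = 1:4,
  GDP   = c(100, 104, 110, 112)
)

# Module 1 (sketch): place k all-NA rows between each pair of
# consecutive observations.
insert_placeholders <- function(df, k) {
  n <- nrow(df)
  out <- as.data.frame(matrix(NA_real_, nrow = (n - 1) * (k + 1) + 1,
                              ncol = ncol(df)))
  names(out) <- names(df)
  # Original rows land at positions 1, k + 2, 2k + 3, ...
  out[seq(1, by = k + 1, length.out = n), ] <- df
  out
}

expanded <- insert_placeholders(gdp, k = 3)  # 13 rows, 9 of them all-NA

# Module 2: linear interpolation, applied to each column separately.
linear <- as.data.frame(lapply(expanded, na.approx))

# Module 3: cubic spline interpolation, applied to each column separately.
cubic <- as.data.frame(lapply(expanded, na.spline))

print(linear$GDP)
print(cubic$GDP)
```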

Output Data
------------
The results of the experiment are available for download as .csv data sets. Each output data set (one for linear and one for cubic spline interpolation) has 26,101 rows.
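The reported row count can be checked against the two input data sets. This check assumes that the 99 placeholder rows end up in each of the 261 gaps between consecutive quarters. The text does not say how placeholder rows after the last quarter are handled.

$$
\begin{aligned}
n_{\text{obs}} &= 65 \times 4 + 2 = 262 \quad \text{(1947Q1 to 2011Q4, plus 2012Q1 and 2012Q2)} \\
n_{\text{out}} &= n_{\text{obs}} + (n_{\text{obs}} - 1) \times 99 = 262 + 261 \times 99 = 26{,}101
\end{aligned}
$$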

**Note** that, although the sample data set is based on the real GDP time series, interpolation (a mathematical term) should not be interpreted as disaggregation (an economic term).

**Note** that cubic spline interpolation may take up to five (5) minutes to complete.

**Note** that the views, opinions and conclusions expressed in this document are those of the author alone and do not necessarily represent the views of the author’s current or former employers.

References
----------
Interpolation. (n.d.). In Wikipedia. Retrieved March 5, 2018, from [https://en.wikipedia.org/wiki/interpolation][1]

Package ‘zoo’: S3 Infrastructure for Regular and Irregular Time Series (Z's Ordered Observations). (n.d.). Retrieved from [http://zoo.R-Forge.R-project.org][2]

U.S. Bureau of Economic Analysis. (2012). GDP and Other Major NIPA Series, 1929–2012: II. Survey of Current Business, 92(8), 183-212. Retrieved January 19, 2018, from [https://www.bea.gov/scb/pdf/2012/08%20August/0812%20gdp-other%20nipa_series.pdf][3]

Zeileis, A., & Grothendieck, G. (2005). zoo: S3 Infrastructure for Regular and Irregular Time Series. Journal of Statistical Software, 14(6). [http://www.jstatsoft.org/v14/i06][4]

[1]: https://en.wikipedia.org/wiki/interpolation
[2]: http://zoo.R-Forge.R-project.org
[3]: https://www.bea.gov/scb/pdf/2012/08%20August/0812%20gdp-other%20nipa_series.pdf
[4]: http://www.jstatsoft.org/v14/i06