Poisson Regression: Parking in Birmingham
This experiment uses Poisson Regression in Azure Machine Learning to score parking availability in Birmingham, UK.
**Detailed Description**
The processes involved in creating this Azure Machine Learning experiment are detailed below.
- Data
The parking dataset is publicly available from the UCI Machine Learning Repository. It contains features such as SystemCodeNumber, Capacity, Occupancy and LastUpdated (a date and time stamp) for various parking locations in Birmingham. In addition, datasets on local weather and school term days were collected to evaluate whether occupancy depends on them. The goal of this experiment is to predict the occupancy for a given time period.
Parking_Birmingham
Parking lot information, including capacity and occupancy, over the period from October to 18 December 2016. See [Parking in Birmingham][1] for the dataset.
Daily_Weather_Birmingham
Daily weather information for the same period, recorded near Birmingham Airport. See [Birmingham Airport Weather][2] for the dataset.
Birmingham_School_Holidays
A list of dates from October to December 2016, with school holidays flagged. See [Birmingham School Holidays][3] for the dataset.
- Retrieve Data
All three datasets were imported into the Azure Machine Learning workspace using “Upload a new dataset from a local file”, with no changes to the raw data.
- Prepare Data
Because the datasets came from different sources and were uploaded as-is, they were joined with the “Apply SQL Transformation” module using the SQL statement below (SQLite syntax; see https://sqlite.org/lang.html):
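```sql
-- Inputs of "Apply SQL Transformation": t1 = Parking_Birmingham,
-- t2 = Daily_Weather_Birmingham, t3 = Birmingham_School_Holidays
SELECT
    t1.[SystemCodeNumber],
    t1.[Capacity],
    t1.[Occupancy],
    t1.[LastUpdated],
    -- Split the timestamp into separate date and time columns
    date(t1.[LastUpdated]) AS LastUpdatedDate,
    time(t1.[LastUpdated]) AS LastUpdatedTime,
    t2.TempAvg, t2.HumidityMax, t2.WindSpeedMax,
    t3.CalendarDay, t3.HolidayFlag
FROM t1
INNER JOIN t2  -- daily weather, matched on calendar date
    ON date(t1.[LastUpdated]) = date(t2.Date)
INNER JOIN t3  -- school holiday calendar, matched on calendar date
    ON date(t1.[LastUpdated]) = date(t3.CalendarDate)
-- Drop rows with invalid (negative) occupancy readings
WHERE t1.[Occupancy] >= 0;
```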
An initial run of the “Summarize Data” module revealed negative Occupancy values; the WHERE clause of the SQL statement above removes those rows in the “Apply SQL Transformation” step.
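To check the join and the negative-value filter outside Azure ML, the same query can be run locally with Python's built-in `sqlite3` module, since the SQL above uses SQLite syntax. This is a minimal sketch under stated assumptions: the table layouts and the sample rows (`LOT_A` and its readings) are hypothetical stand-ins for the three uploaded datasets, and `run_join` is an illustrative helper, not part of Azure ML.

```python
import sqlite3

# The query from the "Apply SQL Transformation" step (t1, t2, t3 are its inputs).
QUERY = """
SELECT
    t1.[SystemCodeNumber], t1.[Capacity], t1.[Occupancy], t1.[LastUpdated],
    date(t1.[LastUpdated]) AS LastUpdatedDate,
    time(t1.[LastUpdated]) AS LastUpdatedTime,
    t2.TempAvg, t2.HumidityMax, t2.WindSpeedMax,
    t3.CalendarDay, t3.HolidayFlag
FROM t1 INNER JOIN t2
    ON date(t1.[LastUpdated]) = date(t2.Date)
INNER JOIN t3
    ON date(t1.[LastUpdated]) = date(t3.CalendarDate)
WHERE t1.[Occupancy] >= 0
"""

# Hypothetical sample rows, not taken from the real datasets.
PARKING = [
    ("LOT_A", 500, 120, "2016-10-04 08:00:00"),
    ("LOT_A", 500, -5, "2016-10-04 08:30:00"),  # invalid negative reading
    ("LOT_A", 500, 310, "2016-10-05 12:00:00"),
]
WEATHER = [("2016-10-04", 12.0, 94.0, 19.0), ("2016-10-05", 11.0, 88.0, 24.0)]
HOLIDAYS = [("2016-10-04", "Tuesday", 0), ("2016-10-05", "Wednesday", 0)]


def run_join(parking, weather, holidays):
    """Load the inputs as tables t1, t2, t3 in memory and run QUERY on them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t1 (SystemCodeNumber TEXT, Capacity INTEGER, "
                     "Occupancy INTEGER, LastUpdated TEXT)")
        conn.execute("CREATE TABLE t2 (Date TEXT, TempAvg REAL, "
                     "HumidityMax REAL, WindSpeedMax REAL)")
        conn.execute("CREATE TABLE t3 (CalendarDate TEXT, CalendarDay TEXT, "
                     "HolidayFlag INTEGER)")
        conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", parking)
        conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?)", weather)
        conn.executemany("INSERT INTO t3 VALUES (?, ?, ?)", holidays)
        # ORDER BY is appended only to make the output order deterministic.
        return conn.execute(QUERY + " ORDER BY t1.[LastUpdated]").fetchall()
    finally:
        conn.close()


rows = run_join(PARKING, WEATHER, HOLIDAYS)
for row in rows:
    print(row)
```

The negative reading is dropped by the WHERE clause, and each remaining row gains the weather and holiday columns for its calendar date.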
- Preprocess Data
Across several training runs, the school holiday flag showed no correlation with parking occupancy, so it was removed using the “Select Columns in Dataset” module.
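The decision to drop a feature that shows no correlation with the label can be sketched as a simple filter. This is a minimal illustration, assuming Pearson correlation as the measure and a hypothetical cut-off of 0.1; `pearson`, `drop_weak_columns` and the sample rows are illustrative, not the author's data or Azure ML's implementation. Note that Pearson correlation only captures linear association.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def drop_weak_columns(rows, label, threshold=0.1):
    """Drop feature columns whose |Pearson r| with `label` is below `threshold`
    (threshold is a hypothetical choice)."""
    features = [c for c in rows[0] if c != label]
    ys = [row[label] for row in rows]
    dropped = [c for c in features
               if abs(pearson([row[c] for row in rows], ys)) < threshold]
    kept_rows = [{c: v for c, v in row.items() if c not in dropped} for row in rows]
    return kept_rows, dropped


# Hypothetical rows: TempAvg tracks Occupancy, HolidayFlag does not.
ROWS = [
    {"Occupancy": 100, "TempAvg": 10, "HolidayFlag": 0},
    {"Occupancy": 200, "TempAvg": 14, "HolidayFlag": 1},
    {"Occupancy": 150, "TempAvg": 12, "HolidayFlag": 1},
    {"Occupancy": 250, "TempAvg": 16, "HolidayFlag": 0},
]
kept, dropped = drop_weak_columns(ROWS, "Occupancy")
print(dropped)  # ['HolidayFlag']
```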
- Algorithm
Several feature selection modules were used to understand how each feature influences the occupancy rate, and hyperparameter tuning was applied to the training model. I have purposely left some of these modules in the experiment to help show how they work; they can be removed if you plan to use a larger dataset or deploy to production.
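Poisson regression suits a count label such as Occupancy: it models the logarithm of the expected count as a linear function of the features, log E[y] = b + w·x. The sketch below fits that model by plain batch gradient ascent on the mean Poisson log-likelihood. It is only an illustration of the technique, not how the Azure ML “Poisson Regression” module is implemented (that module uses its own optimizer and regularization settings); `fit_poisson`, `predict`, the learning rate and the toy data are all hypothetical.

```python
import math


def fit_poisson(X, y, lr=0.02, iters=20000):
    """Fit log(E[y]) = b + w.x by batch gradient ascent on the mean
    Poisson log-likelihood (hypothetical defaults for lr and iters)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            mu = math.exp(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = yi - mu  # d(log-likelihood) / d(linear predictor)
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b += lr * grad_b / n
        w = [wj + lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Expected count for feature vector x under the fitted model."""
    return math.exp(b + sum(wj * xj for wj, xj in zip(w, x)))


# Toy data: targets are the exact expected counts exp(0.5 + 0.3 * x), so the
# maximum-likelihood fit should recover w close to 0.3 and b close to 0.5.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [math.exp(0.5 + 0.3 * x[0]) for x in X]
w, b = fit_poisson(X, y)
print(w, b, predict(w, b, [4.0]))
```

Real counts would be sampled integers rather than exact means; exact means are used here only so the recovered coefficients can be checked.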
- Results
The scored labels were downloaded using the “Convert to CSV” module.
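Once downloaded, the CSV can be checked locally by comparing actual and predicted occupancy. A minimal sketch, assuming the file has an `Occupancy` column and a `Scored Labels` column holding the predictions; adjust the names if your export differs. `mean_absolute_error` and the sample rows are hypothetical.

```python
import csv
import io


def mean_absolute_error(csv_text, actual="Occupancy", predicted="Scored Labels"):
    """Mean absolute difference between the actual and predicted columns
    (column names are assumptions about the exported file)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    errors = [abs(float(row[actual]) - float(row[predicted])) for row in reader]
    return sum(errors) / len(errors)


# Hypothetical excerpt of a downloaded file.
SAMPLE_CSV = """SystemCodeNumber,Occupancy,Scored Labels
LOT_A,120,110.5
LOT_A,310,322.5
"""
mae = mean_absolute_error(SAMPLE_CSV)
print(mae)  # 11.0
```

For a real file, read its contents with `open(path, newline="").read()` and pass the text in place of `SAMPLE_CSV`.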
**Acknowledgment**
Thanks to Daniel H. Stolfi (University of Malaga, Spain) for providing the parking dataset and making it publicly available, to WeatherUnderground for the weather information, and to FamiliesOnline for the school holiday information.
[1]: https://archive.ics.uci.edu/ml/datasets/Parking+Birmingham
[2]: https://www.wunderground.com/history/monthly/gb/birmingham/EGBB/date/2016-12
[3]: https://www.familiesonline.co.uk/local/central-birmingham/in-the-know/birmingham-central-school-term-dates-2016-2017