NY Auto Collisions

May 13, 2019
A high-level analysis of auto collisions in New York City in 2017 is presented using different modules of Azure ML.
The following describes the process involved in creating the AML model. The purpose of this analysis is to run a basic data science analysis that identifies the hour of the day when collisions peak, detects anomalies in the collisions, and creates clusters using K-Means.

- Data: The dataset is [publicly available][1] from the NYC site and contains information on motor vehicle collisions. For this analysis, I used the 2017 data.
- Retrieve data: Though the above website provides multiple years of data with 29 features, I focused on only 4 features for this analysis, as the other features required a substantial amount of data wrangling that is beyond the scope of this experiment.
- Prepare Data: Given the amount of information and the number of features available, the EDA was performed in R-Studio to better understand the dataset and to select features. Collisions recorded with no persons injured or killed were excluded from this experiment.
- Preprocess Data: Several R scripts were created to understand the correlation between persons injured and persons killed, using different ggplot geoms. For clustering, Sweep was used with different parameters to evaluate the number of centroids against the variables persons injured and persons killed.
- Algorithm: The analysis attempts to identify the worst hour of the day for collisions. It also creates clusters for persons injured or killed, which can eventually be used for further investigation. Each Execute R-Script module produces various plots for both variables. A time series anomaly detection was also created to evaluate any anomalies over the full year.
- Results: From the analysis, most injuries happen at 8-9 AM or 5-6 PM. Most fatalities occur at 6-7 AM or 3-4 PM. Also, collisions on May 18 at 11:54 PM and Oct 10 at 5:10 PM caused the most injuries. Each of the Execute R-Script modules creates various plots.
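
The filtering and hour-of-day aggregation described above can be sketched in a few lines. This is a minimal illustration in Python, not the R scripts used in the experiment. The column names (`time`, `persons_injured`, `persons_killed`), the helper function names, and the sample rows are all assumptions made up for this example; they are not the dataset's actual headers or values.

```python
from collections import Counter
from datetime import datetime

# Made-up sample rows for illustration only (not real NYPD records).
# Column names are assumptions, not the dataset's actual headers.
SAMPLE_ROWS = [
    {"date": "01/02/2017", "time": "08:15", "persons_injured": 2, "persons_killed": 0},
    {"date": "01/02/2017", "time": "08:40", "persons_injured": 1, "persons_killed": 0},
    {"date": "01/03/2017", "time": "17:05", "persons_injured": 2, "persons_killed": 0},
    {"date": "01/03/2017", "time": "06:30", "persons_injured": 0, "persons_killed": 1},
    {"date": "01/04/2017", "time": "12:00", "persons_injured": 0, "persons_killed": 0},
    {"date": "01/05/2017", "time": "08:55", "persons_injured": 1, "persons_killed": 0},
]


def drop_no_casualty_rows(rows):
    """Keep only collisions with at least one person injured or killed."""
    return [
        r for r in rows
        if r["persons_injured"] > 0 or r["persons_killed"] > 0
    ]


def totals_by_hour(rows, field):
    """Sum the given casualty field per hour of the day (0-23)."""
    totals = Counter()
    for r in rows:
        hour = datetime.strptime(r["time"], "%H:%M").hour
        totals[hour] += r[field]
    return totals


def peak_hour(totals):
    """Return the hour with the largest total; ties go to the earliest hour."""
    return min(totals, key=lambda h: (-totals[h], h))


if __name__ == "__main__":
    kept = drop_no_casualty_rows(SAMPLE_ROWS)
    for field in ("persons_injured", "persons_killed"):
        hour = peak_hour(totals_by_hour(kept, field))
        print(f"{field}: peak hour is {hour}:00-{hour + 1}:00")
```

Summing casualties per hour, rather than counting collisions, matches the question the analysis asks: when do the most injuries and fatalities occur. Applied to the full 2017 data, the same grouping would be computed on all rows that survive the filter.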

Acknowledgement: Thanks to the NYPD for providing the dataset and making it available to the public.

[1]: https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95