City of Chicago Ride Share Tip Analysis - 10% DataSet

May 23, 2019
Anomaly analysis of tips for City of Chicago rideshare data.
We completed an analytics study of rideshare data from the City of Chicago, using anomaly detection algorithms to predict whether a ride is tipped. We chose anomaly detection over the customary binary classification methods because the data set is highly skewed toward untipped rides: only 17% of rides are tipped. Because of the large amount of data, we used a build count transform methodology. The original data set is 4 GB, and we reduced this version to 10% of that in order to meet the Gallery restriction of less than 100 MB. Using neighborhood, start and end hour, and fare amount as features, the machine learning algorithm yields a distribution in which approximately 24% of rides are tipped, compared with 17% in the actual tipping data.
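
As a rough illustration of the count-based idea mentioned above, the sketch below builds a small count table for one categorical feature and converts the class counts into a smoothed log-odds value. It assumes that "build count transform" refers to count-based featurization in general. It is not the study's actual pipeline: the ride records, the helper names `build_count_table` and `log_odds`, and the smoothing prior are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy rides: (pickup_area, start_hour, tipped).
# Illustrative values only, not real City of Chicago data.
rides = [
    ("Loop", 8, 1),
    ("Loop", 8, 0),
    ("Loop", 17, 0),
    ("Lake View", 17, 1),
    ("Lake View", 22, 0),
    ("Austin", 8, 0),
    ("Austin", 22, 0),
    ("Austin", 22, 0),
]


def build_count_table(rows, key_index, label_index=-1):
    """Count untipped and tipped rides for each value of one categorical column."""
    table = defaultdict(lambda: [0, 0])  # value -> [untipped, tipped]
    for row in rows:
        table[row[key_index]][row[label_index]] += 1
    return dict(table)


def log_odds(counts, prior=1.0):
    """Smoothed log-odds of a tip, given class counts [untipped, tipped]."""
    untipped, tipped = counts
    return math.log((tipped + prior) / (untipped + prior))


# Replace the high-cardinality area name with a compact numeric feature.
area_counts = build_count_table(rides, key_index=0)
area_features = {area: log_odds(c) for area, c in area_counts.items()}

# Share of tipped rides in the toy sample, for comparison with model output.
overall_tip_rate = sum(r[-1] for r in rides) / len(rides)
```

Because the table stores only per-value counts, it can be built in a single pass over the data, which is one reason count-based features are often used when the raw data set is large.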