Anomaly detection analysis of tipping in City of Chicago rideshare data.
We analyzed rideshare data from the City of Chicago, using anomaly detection algorithms to predict whether a ride is tipped. Instead of the customary binary classification methods, we used anomaly detection because the data set is highly skewed toward untipped rides: only 17% of rides are tipped. Because of the large volume of data, we used a count-based featurization (Build Counting Transform) methodology. The original data set is 4 GB, and we reduced this version to 10% of that to meet the Gallery limit of 100 MB. Using neighborhood, start and end hour, and fare amount as features, the model predicts that approximately 24% of rides are tipped, compared with 17% in the actual data.
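Count-based featurization replaces each categorical value (for example, a pickup neighborhood or a start hour) with statistics of the label observed for that value. The following is a minimal sketch of the idea in plain Python, not the Azure ML module or the authors' actual pipeline. The column names `pickup_area` and `tipped`, the smoothing `prior`, and the toy rows are hypothetical and for illustration only. In practice, count tables are usually built on a separate slice of the data from the one used to train the model, to avoid leaking the label into the features.

```python
import math
from collections import defaultdict


def build_count_table(rows, feature, label="tipped"):
    """Count untipped (label 0) and tipped (label 1) rides per value of `feature`.

    Column names are hypothetical; adapt them to the actual data set.
    """
    table = defaultdict(lambda: [0, 0])  # value -> [untipped count, tipped count]
    for row in rows:
        table[row[feature]][int(row[label])] += 1
    return dict(table)


def count_features(row, feature, table, prior=1.0):
    """Replace a categorical value with its class counts and smoothed log-odds of a tip.

    Values never seen while building the table get zero counts and log-odds 0.0.
    """
    untipped, tipped = table.get(row[feature], [0, 0])
    log_odds = math.log((tipped + prior) / (untipped + prior))
    return {
        f"{feature}_untipped": untipped,
        f"{feature}_tipped": tipped,
        f"{feature}_log_odds": log_odds,
    }


# Toy, made-up rows (not from the Chicago data set).
rides = [
    {"pickup_area": "Loop", "tipped": 1},
    {"pickup_area": "Loop", "tipped": 0},
    {"pickup_area": "Loop", "tipped": 0},
    {"pickup_area": "O'Hare", "tipped": 0},
]
table = build_count_table(rides, "pickup_area")
loop_features = count_features({"pickup_area": "Loop"}, "pickup_area", table)
unseen_features = count_features({"pickup_area": "Hyde Park"}, "pickup_area", table)
```

Because each category collapses to a few numeric columns regardless of how many rows it covers, the resulting feature set stays compact, which is one reason count-based featurization suits large, high-cardinality data such as trip records. These numeric features can then be passed to an anomaly detection model.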