A demonstration of the impact that SMOTE has on the validation of results. Upsample with caution.
I trained several fraud detectors on a heavily skewed dataset, using SMOTE to upsample the minority class at different rates, then cross-validated those models on data that had itself been upsampled at different rates. As the results show, mean recall ranges from around 50% to over 99%, depending on how much simulated data (bias) reaches the validation data. Upsample with caution. *I put this together after coming across a number of cases where upsampling was being used to artificially inflate the validation recall.*
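The effect can be reproduced on a toy problem. The sketch below is not the code behind the results above, and it makes several simplifying assumptions: the data is synthetic Gaussian noise rather than a real fraud dataset, `smote_like` is a simplified standard-library re-implementation of SMOTE-style interpolation (not the imbalanced-learn implementation), the classifier is a plain 1-nearest-neighbour rule, and a single hold-out split stands in for cross-validation. All names and sizes are hypothetical.

```python
import random

DIM = 8  # hypothetical number of features


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n_new, rng, k=5):
    """Simplified SMOTE-style oversampling (an assumption, not a library call):
    each new point lies on the segment between a random minority point and one
    of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not a),
                            key=lambda p: dist2(a, p))[:k]
        b = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


def split(rows, frac, rng):
    """Shuffle a copy of rows and return (train, valid), with frac in valid."""
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * frac)
    return rows[cut:], rows[:cut]


def label(points, y):
    """Attach the class label y to every point."""
    return [(p, y) for p in points]


def recall_1nn(train, valid):
    """Recall on the positive class for a 1-nearest-neighbour classifier."""
    positives = [x for x, y in valid if y == 1]
    hits = 0
    for x in positives:
        _, predicted = min(train, key=lambda row: dist2(x, row[0]))
        hits += predicted
    return hits / len(positives)


rng = random.Random(0)
# Heavily skewed toy data: 1000 "legitimate" points, 40 "fraud" points.
majority = [tuple(rng.gauss(0.0, 1.0) for _ in range(DIM)) for _ in range(1000)]
minority = [tuple(rng.gauss(0.5, 1.0) for _ in range(DIM)) for _ in range(40)]

# Leaky protocol: upsample first, then split, so synthetic points
# interpolated from training points end up in the validation set.
upsampled = minority + smote_like(minority, len(majority) - len(minority), rng)
train, valid = split(label(majority, 0) + label(upsampled, 1), 0.25, rng)
leaky_recall = recall_1nn(train, valid)

# Honest protocol: split first, upsample the training part only,
# and validate on untouched real data.
maj_train, maj_valid = split(majority, 0.25, rng)
min_train, min_valid = split(minority, 0.25, rng)
min_train_up = min_train + smote_like(
    min_train, len(maj_train) - len(min_train), rng)
honest_recall = recall_1nn(label(maj_train, 0) + label(min_train_up, 1),
                           label(maj_valid, 0) + label(min_valid, 1))

print(f"recall with upsampled validation data: {leaky_recall:.2f}")
print(f"recall with untouched validation data: {honest_recall:.2f}")
```

The only difference between the two protocols is whether upsampling happens before or after the split. With imbalanced-learn, the usual way to get the second behaviour under cross-validation is to place the sampler inside an `imblearn.pipeline.Pipeline`, so that it is applied only when fitting on the training folds.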