Kaggle: Fraudulent or Genuine

April 9, 2017
Worked on the Kaggle credit card fraud detection dataset, which is highly imbalanced, and used oversampling to balance it.
Because the data was highly imbalanced, I balanced the classes by oversampling with a Python script, and then trained a Two-Class Decision Forest model. Since the class imbalance ratio is high, I recommend measuring performance with the Area Under the Precision-Recall Curve (AUPRC). Accuracy read off the confusion matrix is not meaningful for imbalanced classification: a model that labels every transaction as genuine would still score a very high accuracy while catching no fraud at all.
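To make the two ideas above concrete, here is a minimal sketch in plain Python. It is not the script I used on the Kaggle data: the toy labels, the helper names (`random_oversample`, `average_precision`, `accuracy`) and the 98-to-2 class split are assumptions made up for illustration. It shows random oversampling of the minority class, and why accuracy looks good on imbalanced data while a step-wise estimate of the area under the precision-recall curve (average precision) does not.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Simplification: tied scores are ranked in input order.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank examples from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each positive
    return ap / total_pos


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (assumed): 98 genuine (0) and 2 fraudulent (1) transactions.
labels = [0] * 98 + [1] * 2
rows = [[float(i)] for i in range(100)]

# After oversampling, both classes have the same number of rows.
balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)

# A useless "model" that calls every transaction genuine.
always_genuine = [0] * 100
acc = accuracy(labels, always_genuine)            # 0.98, looks great
auprc = average_precision(labels, [0.0] * 100)    # close to 0, looks bad

print(class_counts, acc, round(auprc, 4))
```

One design point worth keeping in mind with a sketch like this: oversampling should normally be applied to the training split only, so that the test set keeps the original class ratio and the AUPRC reflects real conditions.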