CIS5560 Midterm Spr 2018

March 9, 2018
Improve the AUC Performance Beyond This
First, I changed the machine learning model to Two-Class Boosted Decision Tree to improve accuracy, which brought the AUC to 0.954. The boosted decision tree algorithm worked best for the WiDS dataset. The following are the properties that were set for the algorithm.

Next, I changed the split type to Stratified Split on the is_female label and set the split fraction to 0.6. I used Edit Metadata to convert all numeric features to categorical features. I used Clip Values to clip both peaks and sub-peaks, with a percentile threshold, for the features AA14, AA3, AA4, AA7, and AA15. I used Permutation Feature Importance to score the features and removed those that had no effect on the label.

Finally, I used the SMOTE module to oversample the minority cases. SMOTE stands for Synthetic Minority Oversampling Technique. It is a statistical technique for increasing the number of cases in a dataset in a balanced way. The module works by generating new instances from the existing minority cases supplied as input. This implementation of SMOTE does not change the number of majority cases.
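
The interpolation idea behind SMOTE can be illustrated with a short Python sketch. This is not the module used in the experiment, only a minimal standard-library approximation of the general technique: the function name `smote`, the neighbor count `k`, and the toy data are all hypothetical and chosen for illustration. Each synthetic point is placed at a random position on the line segment between a minority case and one of its nearest minority neighbors, and the majority cases are never touched.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbors.

    Illustrative sketch only; `minority` is a list of numeric feature lists.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbors than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # random position in [0, 1) along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data (made up): 6 majority rows, 3 minority rows, 2 features each.
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.4], [0.4, 0.2]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Oversample the minority class up to the majority size; majority is unchanged.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

Because every synthetic row lies between two existing minority rows, the new cases stay inside the region the minority class already occupies, which is why the technique adds cases without simply duplicating rows.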