Give_Me_Some_Credit
Predict whether a borrower will default on credit.
The dataset is downloaded from the Kaggle competition Give Me Some Credit (https://www.kaggle.com/c/GiveMeSomeCredit/).
The dataset has two main problems. First, it contains missing values: around 20% of the observations
are incomplete (stored as NA). Second, most of the input features contain outliers that need to be handled.
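The README does not say how the missing values and outliers were handled. A minimal sketch, assuming median imputation for NA entries and capping (winsorizing) each feature at chosen percentiles; the helper names `parse_na`, `impute_median`, `percentile`, `cap_outliers` and the default 1st/99th percentile bounds are hypothetical choices, not the authors' method.

```python
import math
import statistics
from typing import List, Optional


def parse_na(raw: List[str]) -> List[Optional[float]]:
    """Convert raw CSV strings to floats, mapping the literal "NA" to None."""
    return [None if s.strip() == "NA" else float(s) for s in raw]


def impute_median(column: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in column]


def percentile(sorted_vals: List[float], q: float) -> float:
    """Percentile of an already-sorted list, q in [0, 1], with linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def cap_outliers(column: List[float], lower_q: float = 0.01, upper_q: float = 0.99) -> List[float]:
    """Clip every value into the [lower_q, upper_q] percentile range (winsorizing)."""
    s = sorted(column)
    low, high = percentile(s, lower_q), percentile(s, upper_q)
    return [min(max(v, low), high) for v in column]


# Toy column with one NA and one extreme value (illustrative numbers only).
income = impute_median(parse_na(["3000", "NA", "5000", "4000", "1000000"]))
print(income)                            # [3000.0, 4500.0, 5000.0, 4000.0, 1000000.0]
print(cap_outliers(income, 0.25, 0.75))  # [4000.0, 4500.0, 5000.0, 4000.0, 5000.0]
```

In a real pipeline the median and the percentile bounds would be computed on the training split only and then reused on the test split, so that test data does not leak into preprocessing.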
In total, the dataset has 250,000 observations: 150,000 in the training set and 100,000 in the test set.
After data cleaning and transformation, we split the labeled training data into two parts, using 70% for
training and the remaining 30% for testing. Because the class labels of the training data are given, we
use supervised learning. Since each label is either 0 or 1, indicating whether or not the borrower
defaults, we treat the task as a binary classification problem. We use two algorithms: a two-class
boosted decision tree and logistic regression.
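The boosted decision tree is not reproduced here. For the logistic-regression half, a minimal stdlib-only sketch, assuming plain batch gradient descent on the mean log-loss; the learning rate, epoch count, 0.5 threshold, and the helper names `sigmoid`, `fit_logistic_regression`, `predict` are hypothetical, and real features would normally be standardized first so that one step size suits all of them.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    epochs: int = 1000,
) -> Tuple[List[float], float]:
    """Learn weights w and bias b by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X: Sequence[Sequence[float]], w: Sequence[float], b: float, threshold: float = 0.5) -> List[int]:
    """Label 1 when the predicted probability reaches the threshold, else 0."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


# Toy one-feature data (illustrative only): larger values -> label 1.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = fit_logistic_regression(X, y)
print(predict(X, w, b))  # [0, 0, 1, 1]
```

A library implementation with regularization and a proper solver would be the usual choice on 150,000 rows; the loop above only shows what the model computes.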
After training, we evaluate both models on the test data: one reaches an overall accuracy of 93% and
the other 92%.
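The evaluation described above (a 70/30 hold-out split scored by overall accuracy) can be sketched as follows. This is a minimal sketch under assumptions: `random_split` is a hypothetical helper with an arbitrary fixed seed, and the majority-class baseline is an extra check the README does not report. Because defaults are the minority class in this dataset, a model that always predicts "no default" already scores a high accuracy, so comparing the 93% and 92% figures against that baseline shows how much the models actually add.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(rows: Sequence[T], train_frac: float = 0.7, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle row indices with a fixed seed, then cut them into train/test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_frac * len(rows))
    return [rows[i] for i in indices[:cut]], [rows[i] for i in indices[cut:]]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true: Sequence[int]) -> float:
    """Accuracy of a trivial model that always predicts the most frequent class."""
    return Counter(y_true).most_common(1)[0][1] / len(y_true)


# A 70/30 split of 150,000 row ids gives 105,000 training and 45,000 test rows.
train_ids, test_ids = random_split(range(150_000))
print(len(train_ids), len(test_ids))  # 105000 45000

# Toy labels (illustrative only): 8 of 10 are class 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))            # 0.8
print(majority_baseline_accuracy(y_true))  # 0.8
```

In the toy labels the model's accuracy equals the baseline, i.e. it adds nothing over always predicting 0. With an imbalanced label, a stratified split (shuffling each class separately) would also keep the default rate similar in both parts, which the plain shuffle above does not guarantee.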