Student Loan Repayment Rate Prediction

August 21, 2017
This is a machine learning model that predicts student loan repayment rates for students at higher education institutions in the USA.
This project was built for the July 2017 Data Science Capstone of the Microsoft Professional Program for Data Science. The goal was to train a machine learning model that predicts the actual repayment rate for students who take out a student loan to pay tuition for higher education in the United States of America.

Data Cleaning: Missing values were filled with the mode for the numeric data and with zero for the binary data. Text columns in the original data were converted to binary indicator columns using the Convert to Indicator module.

Feature Selection: 90 of the 443 features were selected using Filter Based Feature Selection, keeping only the features with the highest correlation to the repayment rate.

Data Split: The data was randomly split into 80 percent for training and 20 percent for testing.

Regression Model: A Boosted Decision Tree algorithm was used to create and train the model that predicts the repayment rate.

For career opportunities, reach me on email: or mobile: +447903242113
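The data preparation steps described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual implementation, which used prebuilt modules rather than hand-written code. All function names here are hypothetical, missing values are assumed to be represented as None, and Pearson correlation is assumed as the filter metric (the write-up only says "highest correlation"). The Boosted Decision Tree model itself is not reproduced.

```python
import math
import random
from statistics import mode


def impute_mode(column):
    """Fill None in a numeric column with its most frequent observed value."""
    observed = [v for v in column if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in column]


def impute_zero(column):
    """Fill None in a binary column with 0."""
    return [0 if v is None else v for v in column]


def to_indicators(column):
    """Turn a text column into one 0/1 indicator column per category."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_top_k(features, target, k):
    """Names of the k features with the largest |correlation| to the target.

    `features` maps a feature name to its column of values.
    """
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Randomly split row indices into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]
```

With the project's numbers, select_top_k would be called with k=90 on the 443 prepared columns, and train_test_split_indices with the default test_fraction of 0.2. A gradient-boosted tree regressor from any library could then be fit on the training rows, though that would be a stand-in for the Boosted Decision Tree module the project used.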