Middle School ranking prediction (by JOSEPH BIDIAS) on 9/26/2019 [Predictive Exp.]
Prediction of middle school rankings using and comparing Azure Machine Learning models (Boosted Decision Tree Regression).
The data was obtained from a work project. It is ~500 KB in XLSX format and was converted to CSV for easier exploration with the available tools.
We visualized the data in Tableau to get a sense of its behavior, spot statistically significant relationships, and guide our choice of models.
We noticed a correlation between race (the Hispanic percentage), the city, and the ranking.
The dataset imported into ML Studio is clean; there are no missing values (a quick check is sketched below).
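As a minimal sketch of the same checks outside ML Studio (assuming pandas, and assuming the converted CSV is named middle_school_rankings.csv with headers matching the column list further below), one could verify the missing-value count and the noted correlation like this:

```python
# Minimal sanity-check sketch; the CSV file name is an assumption.
import pandas as pd

df = pd.read_csv("middle_school_rankings.csv")  # converted from the original XLSX

# Confirm the dataset is clean: total count of missing values across all columns
print("missing values:", df.isna().sum().sum())

# Correlation between the Hispanic percentage and the latest ranking
print(df["Percent Hispanic"].corr(df["Rank (2016-17)"]))

# City is categorical, so compare the average rank per city instead
print(df.groupby("City")["Rank (2016-17)"].mean().sort_values().head())
```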
For our machine learning experiment we compared three algorithms: Linear Regression, Boosted Decision Tree Regression, and Two-Class Boosted Decision Tree.
For the column selection we kept 18 of the 32 columns. The columns used were:
[Rank (2016-17)],
[School],
[District],
[City],
[Number Students],
[Number Fulltime Teachers],
[Percent African American],
[Percent American Indian],
[Percent Asian],
[Percent Hispanic],
[Percent Pacific Islander],
[Percent Two or More Races],
[Percent White],
[Average Standard Score (2016-17)],
[Average Standard Score (2015-16)],
[Rank (2015-16)],
[Rank Change from (2015-16)],
[SchoolDigger Star Rating].
For the filter-based feature selection we set [Rank (2016-17)] as the target for training (the latest ranking available).
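A rough local analogue of the Filter Based Feature Selection step, assuming Pearson correlation as the scoring method (an assumption about the experiment's settings) and the column names listed above, could look like this:

```python
# Score each numeric feature against the target by absolute Pearson correlation.
import pandas as pd

df = pd.read_csv("middle_school_rankings.csv")  # assumed file name

target = "Rank (2016-17)"  # latest ranking available, used as the label
features = df.select_dtypes("number").drop(columns=[target])

scores = features.apply(lambda col: col.corr(df[target])).abs().sort_values(ascending=False)
print(scores)
```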
~Metrics for the Linear Regression
Mean Absolute Error 0.076354;
Root Mean Squared Error 0.102441;
Relative Absolute Error 0.090895;
Relative Squared Error 0.01103;
Coefficient of Determination 0.98897;
~Metrics for the Boosted Decision Tree
Mean Absolute Error 0.006623;
Root Mean Squared Error 0.008177;
Relative Absolute Error 0.007884;
Relative Squared Error 0.00007;
Coefficient of Determination 0.99993;
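The same comparison can be sketched locally with scikit-learn, using LinearRegression and GradientBoostingRegressor as stand-ins for the ML Studio modules (an assumption, not the exact implementation), and computing the metric set reported above:

```python
# Sketch of the regression comparison; model choices and file name are assumptions.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

def report(y_true, y_pred):
    err = y_true - y_pred
    mae = np.mean(np.abs(err))                                          # Mean Absolute Error
    rmse = np.sqrt(np.mean(err ** 2))                                   # Root Mean Squared Error
    rae = np.sum(np.abs(err)) / np.sum(np.abs(y_true - y_true.mean()))  # Relative Absolute Error
    rse = np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)      # Relative Squared Error
    return dict(MAE=mae, RMSE=rmse, RAE=rae, RSE=rse, R2=1 - rse)       # R2 = Coefficient of Determination

df = pd.read_csv("middle_school_rankings.csv")
target = "Rank (2016-17)"
X = df.select_dtypes("number").drop(columns=[target])
y = df[target]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (LinearRegression(), GradientBoostingRegressor()):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__, report(y_te, pred))
```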
~Metrics for the Two-Class Boosted Decision Tree
True Positive 478;
False Negative 0;
False Positive 1;
True Negative 0;
Accuracy 0.998;
Precision 0.998;
Recall 1.000;
F1 Score 0.999;
Threshold 0.5;
AUC 0.000;
Positive Label 9;
Negative Label 3;
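As a quick arithmetic check, the reported two-class metrics follow directly from the confusion matrix above:

```python
# Recompute accuracy, precision, recall and F1 from the reported confusion matrix.
TP, FN, FP, TN = 478, 0, 1, 0

accuracy  = (TP + TN) / (TP + TN + FP + FN)                  # 478/479 ≈ 0.998
precision = TP / (TP + FP)                                   # 478/479 ≈ 0.998
recall    = TP / (TP + FN)                                   # 478/478 = 1.000
f1        = 2 * precision * recall / (precision + recall)    # ≈ 0.999

print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```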
The Boosted Decision Tree provides the most precise predictions, with the lowest variation in prediction error. The model needs further validation with new data whenever possible to guard against overfitting.