Data for all high schools in the state of FL. Objective is to do performance score prediction.
This experiment is part of data processing, preparation, cleaning and validations for predictive analysis. Steps: Load data set> Summarize data > convert to CSV to find missing values> Do qualitative analysis to remove noise . Select columns based on influence on prediction> Clean missing data/column> If >20% missing 'remove' If < 20% missing values, impute with mean or median values ( make sure they are close) Business objective: Based on FL high school data set , find which features / characteristics have the highest impact on school ranking? Idea is to use Azure ML modules like Compute Linear Correlation selection and, the Filter Based Feature Selection module to list the impacting features on the predictor label (Score/Rank).