Predicting Wine Quality through Binary Classification Model.
The goal of this experiment is to build a binary classification model to predict the quality of a given wine into high or low-quality wine given its various properties. In the data set, the quality of the wine is rated on a scale of 1 to 10. For binary classification into high and low-quality wines, using the “Group Data into Bins” module to convert the ratings into a binary categorical variable, the quality rating of 7 or more have been grouped into one bin (the High Quality or the HQ) and the quality rating of 6 or less into another bin (the low quality or the LQ).The data has been split into two sets – 70% for training and 30% for testing. The dataset was trained, scored and evaluated for three data mining models which are :Two Class Boosted Decision Tree Classification Model,a Two Class Logistics Regression Model and a Two Class Neural Network Model. ***Business Decision :*** Here the business decison was to use data mining to price the wine by predicting its quality. The threshold probability level in the models were adjusted to minimize the impact of missclassifications. The best performing data mining model's (based on the maximum area under curve, AUC metric) confusion matrix was used to calculate the profitability by pricing the HQ wine $65 a bottle and LQ wine $30 a bottle and the cost of producing the wine is assumed to be $20 a bottle.