Devrup Banerjee P181A13

December 14, 2019
Objective: To find the accuracy of various models and compare it with that of neural networks.
1. The dataset contained Amazon reviews and 5 classes into which those reviews were classified. 14 missing values were present in the text column out of 13,273 data points. Three classifiers were used: two-class logistic regression, two-class neural networks, and two-class boosted decision trees. A two-class neural network with 2 hidden layers gave an accuracy of 61.4%. A two-class neural network with hyper-tuned parameters and a random sweep gave an accuracy of 65.8%, a hyper-tuned two-class neural network with the default hidden-layer size of 100 gave 59.4%, and a two-class neural network hyper-tuned with a random sweep and the default hidden-layer size of 100 gave 65.8%. This suggests that increasing the complexity of the model does not result in increased accuracy for this problem. We also tried an entire-grid sweep for hyper-tuning the two-class neural network, which gave an accuracy of 66.2%. Two-class boosted decision trees with hyper-tuned parameters and a random sweep gave an accuracy of 64.5%, and two-class logistic regression hyper-tuned with a random grid gave an accuracy of 66%. (An illustrative scikit-learn equivalent of this set-up is sketched at the end of this report.)

2. For building the recommender system, we have to take reviewerID, asin, and overall, in that order. We do not have to normalize the overall ratings, as they already fall within the 0-100 range that the Matchbox recommender can handle. (A short data-preparation sketch is also given at the end of this report.)

Future of the model: We can consider feeding the outputs of the above three models into another meta-model, say a random forest classifier, whose training features are the class probabilities produced by those three models; the target column remains the same. This kind of ensembling (stacking) technique can significantly increase prediction accuracy and give better results than the individual models provided. (A sketch of this stacked set-up follows the other examples below.)
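The experiments in point 1 were run with Azure ML Studio's two-class modules. As an illustration only, the sketch below approximates the same set-up in scikit-learn: the three classifier families, a TF-IDF text featurizer, and RandomizedSearchCV standing in for the "random sweep" hyper-tuning mode (GridSearchCV would correspond to the "entire grid" option). The file and column names ("amazon_reviews.csv", "reviewText", "label") are assumptions, and this code is not the source of the accuracies reported above.

```python
# Sketch only: a scikit-learn approximation of the Azure ML Studio experiments.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

df = pd.read_csv("amazon_reviews.csv")           # assumed file name
df = df.dropna(subset=["reviewText"])            # drop the 14 rows with missing text
X, y = df["reviewText"], df["label"]             # binary label assumed (two-class set-up)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "logistic_regression": (LogisticRegression(max_iter=1000),
                            {"clf__C": [0.01, 0.1, 1, 10]}),
    "neural_network": (MLPClassifier(max_iter=300),
                       {"clf__hidden_layer_sizes": [(100,), (100, 100)],
                        "clf__alpha": [1e-4, 1e-3]}),
    "boosted_trees": (GradientBoostingClassifier(),
                      {"clf__n_estimators": [100, 200],
                       "clf__learning_rate": [0.05, 0.1]}),
}

for name, (clf, grid) in models.items():
    pipe = Pipeline([("tfidf", TfidfVectorizer()), ("clf", clf)])
    # RandomizedSearchCV plays the role of the "random sweep";
    # GridSearchCV would correspond to the "entire grid" option.
    search = RandomizedSearchCV(pipe, grid, n_iter=4, cv=3, random_state=0)
    search.fit(X_tr, y_tr)
    print(name, "test accuracy:", search.score(X_te, y_te))
```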
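For the recommender input of point 2, the following minimal sketch covers the data-preparation step only: selecting reviewerID, asin, and overall in exactly that order and checking that the ratings already sit inside the 0-100 range, so no normalization step is applied. The source file name is an assumption, and the actual training was done with the Matchbox recommender module, not with this code.

```python
# Sketch only: preparing (reviewerID, asin, overall) triples for the recommender.
import pandas as pd

reviews = pd.read_json("reviews.json", lines=True)    # assumed source file
triples = reviews[["reviewerID", "asin", "overall"]]  # exactly this column order

# "overall" is already on a 1-5 scale, well inside the 0-100 range the
# Matchbox recommender accepts, so no normalization is needed.
assert triples["overall"].between(0, 100).all()
triples.to_csv("recommender_input.csv", index=False)
```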
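One concrete way to realize the proposed meta-model is scikit-learn's StackingClassifier, shown below as a sketch: the three base models mirror the classifiers used above, and a random forest meta-model is trained on their class probabilities while the target column stays the same. All parameter values are illustrative assumptions.

```python
# Sketch only: stacking the three classifiers under a random forest meta-model.
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

base_models = [
    ("logreg", make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))),
    ("mlp",    make_pipeline(TfidfVectorizer(), MLPClassifier(max_iter=300))),
    ("gbt",    make_pipeline(TfidfVectorizer(), GradientBoostingClassifier())),
]

stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    stack_method="predict_proba",  # meta-model sees each base model's class probabilities
    cv=3,                          # probabilities are generated out-of-fold
)
# stack.fit(X_tr, y_tr); stack.score(X_te, y_te)   # reusing the split from the earlier sketch
```

Here stack_method="predict_proba" is what makes the meta-model train on per-class probabilities rather than hard labels, and cv=3 produces those probabilities out-of-fold so the random forest is not fitted on leaked predictions.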