Devrup Banerjee P181A13
Objective: to find the accuracy of various models and compare them with neural networks.
1. The dataset contained Amazon reviews and 5 classes classifying those reviews. 14 missing values were present in the text column out of 13,273 data points.
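A minimal pandas sketch of this missing-value check, assuming the dataset is a CSV named amazon_reviews.csv with the review body in a reviewText column (both names are placeholders):

import pandas as pd

# Load the reviews; the file name and column name "reviewText" are assumptions.
df = pd.read_csv("amazon_reviews.csv")

print(len(df))                        # expect 13273 data points
print(df["reviewText"].isna().sum())  # expect 14 missing values in the text column

# Drop the rows whose review text is missing before training.
df = df.dropna(subset=["reviewText"]).reset_index(drop=True)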
Three classifiers were used: two-class logistic regression, two-class neural networks, and two-class boosted decision trees.
A two-class neural network with 2 hidden layers gave an accuracy of 61.4%. With hyperparameters tuned by a random sweep it gave an accuracy of 65.8%; tuned with the default of 100 hidden nodes it gave 59.4%, while tuned with a random sweep and the default 100 hidden nodes it gave 65.8%. This shows that increasing the complexity of the model does not result in increased accuracy for this problem. We also tried tuning the two-class neural network with an entire-grid sweep, which gave an accuracy of 66.2%.
A two-class boosted decision tree tuned with a random sweep gave an accuracy of 64.5%, and a two-class logistic regression tuned with a random grid gave an accuracy of 66%.
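As a rough sketch outside Azure ML Studio, the random-sweep and entire-grid tuning strategies can be approximated with scikit-learn; the column names reviewText and label below are assumptions, with label a binary target derived from the ratings, and the hyperparameter grid is illustrative only.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Assumed columns: "reviewText" (review body) and "label" (binary target).
df = pd.read_csv("amazon_reviews.csv").dropna(subset=["reviewText"])
X_train, X_test, y_train, y_test = train_test_split(
    df["reviewText"], df["label"], test_size=0.2, random_state=42)

# TF-IDF features feeding a small neural network classifier.
pipe = make_pipeline(TfidfVectorizer(max_features=20000),
                     MLPClassifier(max_iter=300))

params = {
    "mlpclassifier__hidden_layer_sizes": [(100,), (50, 50), (100, 100)],
    "mlpclassifier__alpha": [1e-4, 1e-3, 1e-2],
    "mlpclassifier__learning_rate_init": [1e-3, 1e-2],
}

# Random sweep: sample a fixed number of hyperparameter combinations.
random_sweep = RandomizedSearchCV(pipe, params, n_iter=5, cv=3, random_state=42)
random_sweep.fit(X_train, y_train)
print("random sweep accuracy:", random_sweep.score(X_test, y_test))

# Entire grid: evaluate every combination (slower, sometimes slightly better).
grid_sweep = GridSearchCV(pipe, params, cv=3)
grid_sweep.fit(X_train, y_train)
print("entire grid accuracy:", grid_sweep.score(X_test, y_test))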
2. For building the recommender system, we have to take the reviewerID, asin, and overall columns, in that order. We do not need to normalize the overall ratings, as they already fall within the 0-100 range that the Matchbox recommender can handle.
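A minimal sketch of preparing those three columns in the required user-item-rating order, assuming the same CSV file and column names as above:

import pandas as pd

# Build the user-item-rating triples the recommender expects:
# reviewerID (user), asin (item), overall (rating), in that order.
df = pd.read_csv("amazon_reviews.csv")
ratings = df[["reviewerID", "asin", "overall"]]

# No normalization is applied: the overall ratings already lie inside the
# 0-100 range the recommender can handle.
ratings.to_csv("recommender_input.csv", index=False)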
Future of the model:
We can consider feeding the outputs of the above three models into another meta-model, say a random forest classifier, where the features it is trained on are the class probabilities produced by those three models; the target column remains the same.
This kind of ensembling (stacking) can significantly increase prediction accuracy and give better results than the individual models did.
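A rough sketch of this stacking idea using scikit-learn analogues of the three models; the file and column names are assumptions, and each base model gets its own TF-IDF features over the raw review text.

import pandas as pd
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Assumed columns: "reviewText" (review body) and "label" (binary target).
df = pd.read_csv("amazon_reviews.csv").dropna(subset=["reviewText"])
X_train, X_test, y_train, y_test = train_test_split(
    df["reviewText"], df["label"], test_size=0.2, random_state=42)

def text_model(clf):
    # Wrap each base classifier with its own TF-IDF feature extraction.
    return make_pipeline(TfidfVectorizer(max_features=20000), clf)

base_models = [
    ("logreg", text_model(LogisticRegression(max_iter=1000))),
    ("nnet", text_model(MLPClassifier(hidden_layer_sizes=(100,), max_iter=300))),
    ("boosted_trees", text_model(GradientBoostingClassifier())),
]

# The random forest meta-model is trained on the class probabilities produced
# by the three base models; the target column stays the same.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=200),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_train, y_train)
print("stacked accuracy:", stack.score(X_test, y_test))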