Amazon garden and lawn

December 14, 2019
Notes:
1) Ratings 1, 2, 3 and 4 have been grouped as ‘Others’, and a rating of 5 is defined as ‘5star’.
2) 1000 features are selected in the model.
3) 14 data points with missing values in the ‘reviewText’ column have been removed.
4) The random seed is set to ‘12345’.
5) Train and test data are split in the ratio 3:1.
6) The N-gram size is set to 2 and the weighting function is ‘TF*IDF’.
7) Feature reduction is done through Chi Square.

Ans 1. The two-class neural network with default parameters shows an accuracy of 60.2%. The area under the curve (AUC) is 61.8%.

Ans 2.
A) After tuning the hyperparameters with the default tuning settings, the model accuracy rises to 64.4% and the AUC to 69.6%.
B) With changed settings (maximum number of runs on random sweep changed from 5 to 10), the accuracy drops marginally to 64.2% and the AUC also drops marginally to 69%. I also tried setting the parameter sweeping mode to entire grid, but no results could be obtained: I had to kill the task because processing was taking too long (more than 8 minutes).

Ans 3.
A) When the hyperparameters are tuned with two hidden layers (added through the R script) and the remaining parameters left at their defaults, the model accuracy is 64.1%.
B) With changed settings (random grid instead of random sweep, max runs = 4), the accuracy increases to 64.7%, the highest of all the scenarios tested. The AUC in this case is also the highest, at 69.7%.

Ans 4. To build a recommender system, the dataset needs to be structured as three columns in this order: reviewer ID (customer ID), ASIN (product ID), and the overall rating/likelihood (the categorical, dependent variable). With the data structured this way, we train, score and evaluate the model, and select the kind of prediction required (rating based or item based).

Ans 5. The word cloud has been generated with the R script and is available in the model. Key phrases were extracted first, and the word cloud is built from those.
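
For readers who want to reproduce the preprocessing outside the modelling tool, Notes 1, 3, 4, 5 and 6 can be sketched in plain Python. This is a minimal illustration, not the pipeline actually used for the results above. The sample reviews are made up, and the function names are hypothetical. It assumes that an N-gram size of 2 means unigrams plus bigrams, and it uses the plain tf * log(N / df) weighting, which may differ from the tool's exact TF*IDF formula. Chi Square feature selection (Note 7) is not shown.

```python
import math
import random
from collections import Counter


def binarize_rating(overall):
    """Note 1: a rating of 5 becomes '5star', ratings 1-4 become 'Others'."""
    return "5star" if overall == 5 else "Others"


def split_train_test(rows, train_fraction=0.75, seed=12345):
    """Notes 4 and 5: shuffle with a fixed seed, then split 3:1."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def ngrams(text, max_n=2):
    """Note 6: unigrams and bigrams from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def tfidf(documents, max_n=2):
    """Note 6: weight each n-gram by tf * log(N / df), one dict per document."""
    counts = [Counter(ngrams(doc, max_n)) for doc in documents]
    df = Counter()
    for c in counts:
        df.update(c.keys())  # document frequency: count each n-gram once per document
    n_docs = len(documents)
    return [{g: tf * math.log(n_docs / df[g]) for g, tf in c.items()} for c in counts]


# Made-up sample rows using the column names mentioned in the report.
reviews = [
    {"reviewText": "great hose works well", "overall": 5},
    {"reviewText": "hose leaks badly", "overall": 2},
    {"reviewText": None, "overall": 4},
    {"reviewText": "great value", "overall": 5},
    {"reviewText": "stopped working", "overall": 1},
]

# Note 3: drop rows with a missing reviewText.
clean = [r for r in reviews if r["reviewText"]]
labels = [binarize_rating(r["overall"]) for r in clean]
train, test = split_train_test(clean)
weights = tfidf([r["reviewText"] for r in train])
```

N-grams that occur in every training document get a weight of zero under this formula, which is why very common words contribute little to the model.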