December 14, 2019
METHODOLOGY:
A two-class neural network with default parameters was built through the following steps:
1. The data was cleaned in Excel (CLEAN and TRIM).
2. Only the text review and rating columns are required, so all other columns were removed.
3. Rows with missing values were removed; 14 missing values were found.
4. The rating column was converted to a categorical variable: ratings 1, 2, 3 and 4 were grouped as "NOT 5", while rating 5 was kept as "5".
5. The review text was preprocessed to remove stop words, punctuation, numbers, etc.
6. A document-term matrix (DTM) was created from the preprocessed text, with TF-IDF as the weighting scheme.
7. The model was trained and scored using a two-class neural network.

RESULTS:
• The model reached an overall accuracy of 79.2%.
• Since the Amazon client wants to be able to predict reviews with "5" stars, we also look at precision, which is the ratio of true positives to the sum of true positives and false positives. Precision was found to be 79.5%.
• 362 reviews rated 1, 2, 3 or 4 were wrongly classified as "5", i.e. there are 362 false positives.
• 329 reviews actually rated 5 were wrongly classified as "NOT 5", i.e. there are 329 false negatives. Recall (true positives divided by the sum of true positives and false negatives) is therefore around 81%, which is quite good.
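The precision and recall figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the report does not state the number of true positives, so it is back-calculated here from the reported precision and the false-positive count, and the result should be read as an estimate implied by those figures rather than a number from the original run. The function names are hypothetical helpers, not part of any tool used in the analysis.

```python
def precision(tp, fp):
    """Share of reviews predicted as '5' that really are 5-star."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Share of actual 5-star reviews that the model predicted as '5'."""
    return tp / (tp + fn)


def implied_true_positives(precision_value, fp):
    """Solve precision = TP / (TP + FP) for TP, rounded to a whole count."""
    return round(precision_value * fp / (1 - precision_value))


# Figures stated in the results section.
FALSE_POSITIVES = 362
FALSE_NEGATIVES = 329
REPORTED_PRECISION = 0.795

# Assumption: the true-positive count is not reported, so it is estimated
# from the reported precision and the false-positive count.
tp_estimate = implied_true_positives(REPORTED_PRECISION, FALSE_POSITIVES)
recall_estimate = recall(tp_estimate, FALSE_NEGATIVES)

print(f"Implied true positives: {tp_estimate}")
print(f"Precision check: {precision(tp_estimate, FALSE_POSITIVES):.3f}")
print(f"Recall estimate: {recall_estimate:.3f}")
```

Under that assumption, the recall estimate agrees with the reported value of around 81%, which suggests the stated precision, false-positive and false-negative figures are mutually consistent.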