Divya Khurana_P181A14 (QUESTION 3: TWO-CLASS NEURAL NETWORK - TUNING THE HYPERPARAMETERS WITH 2 HIDDEN LAYERS) - FINAL
3. METHODOLOGY
1. Data was cleaned in Excel using the CLEAN and TRIM functions.
2. Only the text review and rating columns are required, so all other columns were removed.
3. Rows with missing values were removed; 14 missing values were found.
4. The rating column was converted to a categorical variable: ratings 1, 2, 3 and 4 were grouped as NOT 5, while rating 5 was kept as 5.
5. The reviews column was preprocessed to remove stop words, punctuation, numbers, etc.
6. A document-term matrix (DTM) was created from the preprocessed text. TF-IDF was chosen as the weighting method.
7. A two-class neural network model was trained, with two hidden layers added and 300 input neurons, to try to get better results.
8. Initially the parameter sweeping mode was changed to Entire Grid.
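Steps 4 to 6 above can be sketched in plain Python. This is only a minimal illustration and not the tool actually used for the assignment: the stop-word list, the three toy reviews and the function names are all hypothetical, and the TF-IDF formula shown (term frequency times the log of inverse document frequency) is just one common variant.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "was", "i"}


def label_rating(rating):
    """Step 4: map ratings 1-4 to 'NOT 5' and rating 5 to '5'."""
    return "5" if int(rating) == 5 else "NOT 5"


def preprocess(text):
    """Step 5: lowercase, drop punctuation and numbers, remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(docs):
    """Step 6: build a TF-IDF weighted document-term matrix (one dict per review)."""
    n_docs = len(docs)
    # Document frequency: in how many reviews each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return rows


# Toy data (made up): (review text, star rating)
reviews = [
    ("Great product, works well!", 5),
    ("Stopped working after 2 days", 2),
    ("Great value", 5),
]

labels = [label_rating(rating) for _, rating in reviews]
docs = [preprocess(text) for text, _ in reviews]
dtm = tf_idf(docs)

print(labels)
print(dtm[0])
```

Words that occur in many reviews (here "great") receive a lower weight than words that occur in only one review, which is the reason for preferring TF-IDF over raw counts.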
RESULTS:
• Overall accuracy decreased to 72.6%.
• Since the Amazon client wants to be able to predict reviews with “5” stars, we shall also look at PRECISION, which is the ratio of true positives to (true positives + false positives). Precision was found to be 73.3%.
• A total of 471 reviews that were rated 1, 2, 3 or 4 were falsely classified as ‘5’, i.e. there are 471 false positives.
• About 437 reviews that were actually rated 5 were falsely classified as ‘NOT 5’, i.e. there are 437 false negatives. Thus, recall (true positives divided by true positives + false negatives) is around 74.8%, which is not very good.
• Thus, adding 2 hidden layers did not help us get better results.
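The precision and recall figures above can be cross-checked with a short calculation. The sketch below uses only the reported precision and the false positive and false negative counts. The true positive count is not reported, so it is estimated here by rearranging the precision formula; it is an assumption derived from rounded figures, not a number taken from the confusion matrix.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Recall = TP / (TP + FN)."""
    return tp / (tp + fn)


def implied_true_positives(precision_value, fp):
    """Solve precision = TP / (TP + FP) for TP."""
    return precision_value * fp / (1.0 - precision_value)


# Figures reported in the results above.
FALSE_POSITIVES = 471
FALSE_NEGATIVES = 437
REPORTED_PRECISION = 0.733

# Estimated, not reported: true positives implied by the precision figure.
tp_est = implied_true_positives(REPORTED_PRECISION, FALSE_POSITIVES)
recall_est = recall(tp_est, FALSE_NEGATIVES)

print(f"Estimated true positives: {tp_est:.0f}")
print(f"Recall implied by these figures: {recall_est:.1%}")
```

The implied recall comes out within a fraction of a percentage point of the reported value, so the precision, recall, false positive and false negative figures are consistent with each other.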