Ramya Krishnan_P181B34_NLP End-term
A single model was created & all the questions have been answered using that same model.
1. A 2-class neural network with default parameters was run. The module “train model” was used to train it. On scoring & evaluating the model, the accuracy was found to be around 71.3% & the precision around 74.9%. The other metrics noted were as follows:
• True Positive = 1719
• False Positive = 576
• Recall = 84%
• F1 score = 79.2%
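As a quick consistency check on the figures above, precision and the F1 score can be recomputed from the reported counts. This is only a minimal sketch in Python: the variable names are illustrative, the recall is taken as reported (the false-negative count is not given), and it is not the code the evaluation module itself runs.

```python
# Counts and recall as reported for the default model.
true_positive = 1719
false_positive = 576
recall = 0.84  # reported value; the false-negative count is not listed

# Precision: share of predicted positives that are actually positive.
precision = true_positive / (true_positive + false_positive)

# F1 score: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.1%}")  # prints "precision = 74.9%"
print(f"F1 score  = {f1:.1%}")         # prints "F1 score  = 79.2%"
```

Both values agree with the reported precision of about 74.9% & F1 score of 79.2%.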
2. The hyperparameters were tuned & the ‘2 class neural network’ technique was implemented. On scoring & evaluating the model, the accuracy was found to be around 71.4% & the precision around 74.7%. The accuracy is more or less the same as that obtained without using the ‘tune model hyperparameters’ module. The other metrics obtained are given below:
• Recall = 84.8%
• F1 score = 79.4%
The ‘tune model hyperparameters’ module usually takes the place of the ‘train model’ module & helps improve the accuracy. But here, given that all other values of the “2-class neural network” have been left at their defaults, neither the accuracy nor the precision has improved meaningfully.
3. Two ‘2-class neural network’ models have been built by tuning the hyperparameters. This time, the number of hidden layers has been set to 2.
One of the models has the default number of nodes per hidden layer, i.e. 100.
The other model has 200 nodes per hidden layer.
The 2-class neural network has the following parameter, set in an attempt to improve the accuracy:
The initial learning weights diameter = 0.01
However, neither the accuracy nor the precision seems to have improved.
4. If a recommender were to be built, the matchbox recommender would be used. The matchbox recommender reads a dataset of user-item-rating triples. In the given dataset, a content-based approach may not be possible, as we don’t have detailed user data. But collaborative filtering should be possible, as we have some identifying information about the user, i.e. user ID, user’s name, etc. We shall pick the fields user ID, asin, & overall for the recommender.
The reviewer ID is a unique identifier for each reviewer & serves as the user ID. ‘asin’ is assumed to be the product ID of the lawn mower. The field ‘overall’ is the rating given by each reviewer. Hence we can select the columns “by rule” in the order userID, asin & overall.
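The column selection described above can be sketched in plain Python as follows. This is only an illustration of building user-item-rating triples, not the recommender module itself. The field name `reviewerID`, the helper `to_triples` & the sample records are assumptions made up for the example; only ‘asin’ & ‘overall’ are named in the dataset description above.

```python
# Hypothetical review records; all values are made up for illustration.
reviews = [
    {"reviewerID": "U1", "reviewerName": "A", "asin": "P1",
     "overall": 5.0, "reviewText": "works well"},
    {"reviewerID": "U2", "reviewerName": "B", "asin": "P1",
     "overall": 3.0, "reviewText": "average"},
    {"reviewerID": "U1", "reviewerName": "A", "asin": "P2",
     "overall": 4.0, "reviewText": "good value"},
]


def to_triples(records):
    """Keep only (user, item, rating), in that order, dropping other fields."""
    return [(r["reviewerID"], r["asin"], r["overall"]) for r in records]


triples = to_triples(reviews)
for user, item, rating in triples:
    print(user, item, rating)
```

The order matters because a recommender that reads triples treats the first column as the user, the second as the item & the third as the rating.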
5. The word-cloud has been built from the pre-processed review text field. A maximum of 100 words has been considered to create this word-cloud. The word ‘use’ was found to occur most frequently in the reviews.
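The counting behind such a word-cloud can be sketched with Python’s standard library. This is a minimal sketch, assuming the text is already pre-processed & whitespace-separated. The function `top_words` & the sample strings are hypothetical & are not taken from the dataset.

```python
from collections import Counter


def top_words(texts, max_words=100):
    """Return up to max_words (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        # Assumes pre-processed text: lower-case it and split on whitespace.
        counts.update(text.lower().split())
    return counts.most_common(max_words)


# Made-up sample of pre-processed review text.
sample = ["easy use mower", "use daily", "blade sharp easy use"]
top = top_words(sample, max_words=100)
print(top[0])  # prints ('use', 3)
```

A word-cloud tool would then size each word in proportion to its count, so the most frequent word appears largest.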
Other observations:
On running the module “detect languages”, it was found that the majority of the reviews (over 89%) were in English & the rest were classified as ‘unknown language’.