I initially thought this would be a somewhat difficult classification task, as the difference between positive and extremely positive (or negative and extremely negative) language is small in terms of the words used; the distinction is more likely to show up syntactically or in the overall context. As a result, I expected a TF-IDF model to be less successful than a neural network model with embeddings, and the results confirmed this. The best baseline TF-IDF model achieved 0.5824 accuracy, while the neural network models improved on it: the best ANN model reached 0.6717 accuracy and the best LSTM model reached 0.7120.
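The TF-IDF baseline described above can be sketched roughly as follows. This is a minimal illustration, not the report's actual pipeline: the `texts` and `labels` placeholders stand in for the real five-class sentiment dataset, and the choice of logistic regression as the classifier is an assumption.

```python
# Sketch of a TF-IDF baseline classifier. `texts` and `labels` are toy
# placeholders; the real dataset has five sentiment classes
# (extremely negative .. extremely positive).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts = [
    "great product love it", "absolutely amazing best thing ever",
    "terrible waste of money", "worst experience truly awful",
    "it was okay nothing special", "fine but not great",
] * 10
labels = ["pos", "ext_pos", "neg", "ext_neg", "neu", "neu"] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

# Bag-of-words TF-IDF features (unigrams + bigrams), then a linear model.
vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vec.fit_transform(X_train), y_train)

acc = accuracy_score(y_test, clf.predict(vec.transform(X_test)))
print(f"baseline accuracy: {acc:.4f}")
```

Because TF-IDF features are purely lexical, a model like this struggles exactly where the extreme and regular classes share vocabulary, which is consistent with the gap between the 0.5824 baseline and the LSTM's 0.7120.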
Future Scope
To further improve model accuracy, a few methods could be attempted:
- Leave in ‘!’ during pre-processing
    - Could help differentiate the extreme classes from the regular ones
- Remove the most common words that the frequency plots show are shared among all classes
    - Would reduce noise
- Keep trying different neural network architectures
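The first two ideas could be folded into the cleaning step along these lines. This is a hedged sketch: the `clean` function and the `SHARED_COMMON_WORDS` set are hypothetical (the real set would be read off the word-frequency plots), and isolating each ‘!’ as its own token is one possible way to preserve it.

```python
import re

# Hypothetical stand-in for the high-frequency words the plots show
# are shared across all five classes.
SHARED_COMMON_WORDS = {"the", "a", "to", "and", "is"}

def clean(text, keep_exclamation=True, drop_shared=True):
    """Lowercase, strip punctuation, optionally keep '!' and drop
    words common to every class."""
    text = text.lower()
    if keep_exclamation:
        text = re.sub(r"!", " ! ", text)        # isolate each '!' as a token
        text = re.sub(r"[^a-z!\s]", " ", text)  # drop other punctuation
    else:
        text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    if drop_shared:
        tokens = [t for t in tokens if t not in SHARED_COMMON_WORDS]
    return " ".join(tokens)

print(clean("This is AMAZING!!! The best ever."))
# -> "this amazing ! ! ! best ever"
```

Keeping each ‘!’ as a separate token means repeated exclamation marks contribute repeated features, giving a lexical model at least one signal that correlates with the extreme classes.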