Naive Bayes
The confusion matrices and other evaluation metrics for the Naive Bayes models can be found below.
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.903
Evaluation Metrics (count vectorizer):
- Accuracy: (4245 + 3917) / (4245 + 3917 + 427 + 391) = 0.909
  - The percentage of articles predicted correctly.
- Precision (Fake News): 4245 / (4245 + 427) = 0.909
  - The percentage of articles predicted as fake that actually belong to the fake class.
- Precision (True News): 3917 / (3917 + 391) = 0.909
  - The percentage of articles predicted as true that actually belong to the true class.
- Recall (Fake News): 4245 / (4245 + 391) = 0.916
  - The percentage of articles in the fake class that were correctly predicted as fake.
- Recall (True News): 3917 / (3917 + 427) = 0.902
  - The percentage of articles in the true class that were correctly predicted as true.
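The metrics above can be reproduced directly from the confusion-matrix counts. A minimal sketch in Python, using the counts reported above (variable names are illustrative):

```python
# Confusion-matrix counts for the count-vectorized Naive Bayes model,
# taken from the evaluation metrics above.
tp_fake = 4245  # fake articles correctly predicted fake
tn_true = 3917  # true articles correctly predicted true
fp_fake = 427   # true articles incorrectly predicted fake
fn_fake = 391   # fake articles incorrectly predicted true

total = tp_fake + tn_true + fp_fake + fn_fake

accuracy = (tp_fake + tn_true) / total
precision_fake = tp_fake / (tp_fake + fp_fake)  # of predicted-fake, how many are fake
precision_true = tn_true / (tn_true + fn_fake)  # of predicted-true, how many are true
recall_fake = tp_fake / (tp_fake + fn_fake)     # of actual fake, how many were caught
recall_true = tn_true / (tn_true + fp_fake)     # of actual true, how many were caught

print(round(accuracy, 3))        # 0.909
print(round(precision_fake, 3))  # 0.909
print(round(recall_fake, 3))     # 0.916
```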
The term-frequency and count vectorizers performed very similarly when modeled with Naive Bayes. However, the count-vectorized text data did perform slightly better, with an accuracy of 90.9% compared to 90.3%.
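The Naive Bayes setup described above can be sketched as follows, assuming the models were fit with scikit-learn; the toy corpus and variable names here are illustrative placeholders for the actual news-article data:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Illustrative toy data; the real experiment used the labeled news-article corpus.
texts = ["breaking shocking claim", "official report released",
         "shocking hoax exposed", "government report confirms"]
labels = ["fake", "true", "fake", "true"]

for name, vec in [("count", CountVectorizer()), ("term-frequency", TfidfVectorizer())]:
    # Pipeline: vectorize raw text, then fit a multinomial Naive Bayes classifier.
    model = make_pipeline(vec, MultinomialNB())
    model.fit(texts, labels)
    preds = model.predict(texts)
    print(name, accuracy_score(labels, preds))
```

In a real run the data would be split into train and test sets, and the accuracies printed would correspond to the two metric blocks above.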
Support Vector Machines
The confusion matrices and other evaluation metrics for the Support Vector Machine models with different kernels can be found below.
Linear Kernel
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.944
Evaluation Metrics (count vectorizer):
- Accuracy: 0.914
The term-frequency and count vectorizers both performed well when modeled with Support Vector Machines using the linear kernel. However, the term-frequency-vectorized text data performed better, with an accuracy of 94.4% compared to 91.4%.
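The kernel comparison in this and the following subsections can be sketched as a loop over kernels and vectorizers, again assuming scikit-learn; the toy corpus is an illustrative stand-in for the actual data:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative toy data; the real experiment used the labeled news-article corpus.
texts = ["breaking shocking claim", "official report released",
         "shocking hoax exposed", "government report confirms"]
labels = ["fake", "true", "fake", "true"]

for kernel in ["linear", "sigmoid", "rbf", "poly"]:
    for vec_name, vec in [("term-frequency", TfidfVectorizer()),
                          ("count", CountVectorizer())]:
        # Pipeline: vectorize raw text, then fit an SVM with the given kernel.
        model = make_pipeline(vec, SVC(kernel=kernel))
        model.fit(texts, labels)
        print(kernel, vec_name, model.score(texts, labels))
```

Each (kernel, vectorizer) pair corresponds to one of the metric blocks in the subsections below.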
Sigmoid Kernel
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.889
Evaluation Metrics (count vectorizer):
- Accuracy: 0.613
The term-frequency vectorizer performed well with the sigmoid kernel, though still worse than with the linear kernel (88.9% compared to 94.4%). The count vectorizer did not perform well at all with the sigmoid kernel, producing an accuracy of only 61.3%.
RBF (Gaussian) Kernel
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.968
Evaluation Metrics (count vectorizer):
- Accuracy: 0.521
The term-frequency vectorizer performed very well with the RBF kernel, producing the best result so far with an accuracy of 96.8%. The count vectorizer did not perform well at all with the RBF kernel, producing an accuracy of only 52.1%.
Polynomial Kernel (degree = 2)
Note: results with a degree of 3 were not significantly different from a degree of 2.
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.963
Evaluation Metrics (count vectorizer):
- Accuracy: 0.928
The term-frequency vectorizer and count vectorizer both performed well with the polynomial kernel and a degree of 2. The term-frequency vectorizer came in slightly below the RBF kernel's result (96.3% compared to 96.8%).
Polynomial Kernel (degree = 4)
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.963
Evaluation Metrics (count vectorizer):
- Accuracy: 0.980
The term-frequency vectorizer and count vectorizer both performed well with the polynomial kernel and a degree of 4. Interestingly, we see a large jump in the accuracy of the count vectorizer: at 98.0%, it is the best-performing model of all the models evaluated.
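The degree sweep described across the polynomial-kernel subsections can be sketched with SVC's degree parameter, assuming scikit-learn; the toy corpus is an illustrative stand-in for the actual data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative toy data; the real experiment used the labeled news-article corpus.
texts = ["breaking shocking claim", "official report released",
         "shocking hoax exposed", "government report confirms"]
labels = ["fake", "true", "fake", "true"]

for degree in [2, 3, 4]:
    # degree only applies when kernel="poly"; other kernels ignore it.
    model = make_pipeline(CountVectorizer(), SVC(kernel="poly", degree=degree))
    model.fit(texts, labels)
    print(degree, model.score(texts, labels))
```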