Naive Bayes
The confusion matrices and other evaluation metrics for the Naive Bayes models can be found below.
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.903
Evaluation Metrics (count vectorizer):
- Accuracy: (4245 + 3917) / (4245 + 3917 + 427 + 391) = 0.909
  - The percentage of articles predicted correctly.
- Precision (Fake News): 4245 / (4245 + 427) = 0.909
  - The percentage of articles predicted as fake that actually belong to the fake class.
- Precision (True News): 3917 / (3917 + 391) = 0.909
  - The percentage of articles predicted as true that actually belong to the true class.
- Recall (Fake News): 4245 / (4245 + 391) = 0.916
  - The percentage of articles in the fake class that were correctly predicted as fake.
- Recall (True News): 3917 / (3917 + 427) = 0.902
  - The percentage of articles in the true class that were correctly predicted as true.
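The metrics above can be reproduced directly from the confusion-matrix counts. A minimal sketch in Python, using the counts reported above (variable names are illustrative):

```python
# Confusion-matrix counts for the count-vectorized Naive Bayes model,
# taken from the evaluation metrics above.
tp_fake = 4245  # fake articles correctly predicted fake
tn_true = 3917  # true articles correctly predicted true
fp_fake = 427   # true articles incorrectly predicted fake
fn_fake = 391   # fake articles incorrectly predicted true

total = tp_fake + tn_true + fp_fake + fn_fake

accuracy = (tp_fake + tn_true) / total
precision_fake = tp_fake / (tp_fake + fp_fake)  # of predicted-fake, how many are fake
precision_true = tn_true / (tn_true + fn_fake)  # of predicted-true, how many are true
recall_fake = tp_fake / (tp_fake + fn_fake)     # of actual fake, how many were caught
recall_true = tn_true / (tn_true + fp_fake)     # of actual true, how many were caught

print(round(accuracy, 3))        # 0.909
print(round(precision_fake, 3))  # 0.909
print(round(recall_fake, 3))     # 0.916
```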
The term-frequency and count vectorizers performed very similarly when modeled with Naive Bayes. However, the count-vectorized text data did perform slightly better, with an accuracy of 90.9% compared to 90.3%.
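The Naive Bayes setup described above can be sketched as follows, assuming the models were fit with scikit-learn; the toy corpus and variable names here are illustrative placeholders for the actual news-article data:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Illustrative toy data; the real experiment used the labeled news-article corpus.
texts = ["breaking shocking claim", "official report released",
         "shocking hoax exposed", "government report confirms"]
labels = ["fake", "true", "fake", "true"]

for name, vec in [("count", CountVectorizer()), ("term-frequency", TfidfVectorizer())]:
    # Pipeline: vectorize raw text, then fit a multinomial Naive Bayes classifier.
    model = make_pipeline(vec, MultinomialNB())
    model.fit(texts, labels)
    preds = model.predict(texts)
    print(name, accuracy_score(labels, preds))
```

In a real run the data would be split into train and test sets, and the accuracies printed would correspond to the two metric blocks above.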
Support Vector Machines
The confusion matrices and other evaluation metrics for the Support Vector Machine models with different kernels can be found below.
Linear Kernel
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.944
Evaluation Metrics (count vectorizer):
- Accuracy: 0.914
The term-frequency and count vectorizers both performed well when modeled with Support Vector Machines using the linear kernel. However, the term-frequency-vectorized text data performed better, with an accuracy of 94.4% compared to 91.4%.
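The kernel comparison in this and the following subsections can be sketched as a loop over kernels and vectorizers, again assuming scikit-learn; the toy corpus is an illustrative stand-in for the actual data:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative toy data; the real experiment used the labeled news-article corpus.
texts = ["breaking shocking claim", "official report released",
         "shocking hoax exposed", "government report confirms"]
labels = ["fake", "true", "fake", "true"]

for kernel in ["linear", "sigmoid", "rbf", "poly"]:
    for vec_name, vec in [("term-frequency", TfidfVectorizer()),
                          ("count", CountVectorizer())]:
        # Pipeline: vectorize raw text, then fit an SVM with the given kernel.
        model = make_pipeline(vec, SVC(kernel=kernel))
        model.fit(texts, labels)
        print(kernel, vec_name, model.score(texts, labels))
```

Each (kernel, vectorizer) pair corresponds to one of the metric blocks in the subsections below.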
Sigmoid Kernel
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.889
Evaluation Metrics (count vectorizer):
- Accuracy: 0.613
The term-frequency vectorizer performed well with the sigmoid kernel, though still worse than with the linear kernel (88.9% compared to 94.4%). The count vectorizer did not perform well at all with the sigmoid kernel, producing an accuracy of only 61.3%.
RBF (Gaussian) Kernel
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.968
Evaluation Metrics (count vectorizer):
- Accuracy: 0.521
The term-frequency vectorizer performed very well with the RBF kernel, producing the best result so far with an accuracy of 96.8%. The count vectorizer did not perform well at all with the RBF kernel, producing an accuracy of only 52.1%.
Polynomial Kernel (degree = 2)
Note: results with a degree of 3 were not significantly different from a degree of 2.
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.963
Evaluation Metrics (count vectorizer):
- Accuracy: 0.928
The term-frequency vectorizer and count vectorizer both performed well with the polynomial kernel and a degree of 2. The term-frequency vectorizer came in slightly below the RBF kernel's result (96.3% compared to 96.8%).
Polynomial Kernel (degree = 4)
Evaluation Metrics (term-frequency vectorizer):
- Accuracy: 0.963
Evaluation Metrics (count vectorizer):
- Accuracy: 0.980
The term-frequency vectorizer and count vectorizer both performed well with the polynomial kernel and a degree of 4. Interestingly, we see a large jump in the accuracy of the count vectorizer: at 98.0%, it is the best-performing model of all the models evaluated.
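The degree sweep described across the polynomial-kernel subsections can be sketched with SVC's degree parameter, assuming scikit-learn; the toy corpus is an illustrative stand-in for the actual data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Illustrative toy data; the real experiment used the labeled news-article corpus.
texts = ["breaking shocking claim", "official report released",
         "shocking hoax exposed", "government report confirms"]
labels = ["fake", "true", "fake", "true"]

for degree in [2, 3, 4]:
    # degree only applies when kernel="poly"; other kernels ignore it.
    model = make_pipeline(CountVectorizer(), SVC(kernel="poly", degree=degree))
    model.fit(texts, labels)
    print(degree, model.score(texts, labels))
```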