Articles Dataset
Based on the results, the news articles can be clustered by topic, but only with hierarchical clustering; K-Means performed very poorly. This was somewhat expected: the articles dataset is high-dimensional, and K-Means used Euclidean distance, whereas the hierarchical clustering used cosine similarity as its distance metric. To give K-Means a chance of succeeding, the dimensionality of the articles dataset would first need to be reduced with PCA. This might be worth trying in the future, but since hierarchical clustering was successful, it is sufficient for now. Its success means that unknown news articles could potentially be labeled according to the cluster they are assigned to. This is great news, as it means unknown articles could be filtered to find only the ones related to sports betting.
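To illustrate why the choice of metric matters here, the following is a minimal sketch, not the project's actual pipeline. It assumes the articles are represented as term-count vectors (an assumption; the vectors and names below are hypothetical). It shows that Euclidean distance is dominated by article length while cosine distance compares topic mix, and that after L2 normalization the squared Euclidean distance equals twice the cosine distance. Note that this is a different remedy from the PCA step mentioned above; it only demonstrates the metric mismatch.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Return the straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical term counts: two articles with the same topic mix but
# different lengths, and a third short article on another topic.
short_article = [2.0, 1.0, 0.0]
long_article = [20.0, 10.0, 0.0]
other_topic = [0.0, 1.0, 2.0]

# Euclidean distance places the short article closer to the other topic
# than to the long article on the same topic.
euclid_same_topic = euclidean_distance(short_article, long_article)
euclid_other_topic = euclidean_distance(short_article, other_topic)

# Cosine distance ignores length and groups the same-topic pair together.
cosine_same_topic = cosine_distance(short_article, long_article)
cosine_other_topic = cosine_distance(short_article, other_topic)

# On unit-length vectors, squared Euclidean distance is 2 * cosine distance.
normalized_sq = euclidean_distance(
    l2_normalize(short_article), l2_normalize(other_topic)
) ** 2
```

Under these assumptions, Euclidean distance on raw counts would split articles by length rather than by topic, which is consistent with K-Means struggling where cosine-based hierarchical clustering succeeded.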
Games Dataset
Based on the results, the games can be clustered into two groups. This is ideal, as the goal is to accurately predict whether an NFL game will go over or under the total. The success of the clustering means it may be possible to predict whether a game goes over or under the total by assigning it to one of the two clusters. This would require some further analysis, but it is definitely worth considering when determining the best model at the culmination of this project. K-Means performed better than hierarchical clustering, so any predictions would be made using K-Means.
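One way such a prediction could work is sketched below, under stated assumptions. Given two centroids from a fitted K-Means model, a new game is assigned to the nearest centroid, and each cluster is labeled with the majority over/under outcome of the past games assigned to it. The feature values, centroids, and function names are hypothetical placeholders, not results from this project.

```python
import math
from collections import Counter


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def label_clusters(assignments, outcomes):
    """Map each cluster index to the majority outcome among its games."""
    votes = {}
    for cluster, outcome in zip(assignments, outcomes):
        votes.setdefault(cluster, Counter())[outcome] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Hypothetical centroids, standing in for the output of a K-Means fit
# with two clusters on two made-up game features.
centroids = [[0.0, 0.0], [10.0, 10.0]]

# Hypothetical past games and their known over/under outcomes.
past_games = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0], [11.0, 9.0]]
past_outcomes = ["under", "under", "over", "over"]

# Assign each past game to a cluster, then name each cluster by majority.
assignments = [nearest_centroid(g, centroids) for g in past_games]
cluster_labels = label_clusters(assignments, past_outcomes)

# Predict a new game by looking up the label of its nearest cluster.
new_game = [8.0, 9.5]
prediction = cluster_labels[nearest_centroid(new_game, centroids)]
```

Whether the two clusters actually line up with over and under outcomes is exactly the further analysis the paragraph above calls for; the majority-vote labeling is only one possible way to check it.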