Model Evaluation Key Ideas
- Simpler is better
- Begin by fitting a Random Forest with all variables and default parameters
- Then reduce the depth and/or number of variables until accuracy starts to drop noticeably
- Leaf nodes with very few samples indicate overfitting
- Reduce the depth of the tree until no leaf nodes contain very few samples
- Can also set the min_samples_leaf parameter to enforce a minimum number of samples in every leaf
- Random Forest can be used to determine variable importance
- Attempt to classify NFL games as going over or under the projected number of total points scored by both teams
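The point about small leaves can be checked directly on a fitted forest. The following is a minimal sketch, assuming scikit-learn and synthetic stand-in data (the NFL features are not reproduced here); the helper smallest_leaf and the value min_samples_leaf=5 are illustrative choices, not taken from the original analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption); replace with the real feature matrix.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)


def smallest_leaf(forest):
    """Return the size of the smallest leaf across all trees in the forest."""
    sizes = []
    for est in forest.estimators_:
        tree = est.tree_
        is_leaf = tree.children_left == -1  # leaf nodes have no children
        sizes.append(tree.n_node_samples[is_leaf].min())
    return int(min(sizes))


# Default settings let trees grow until leaves are pure.
default_rf = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Same forest, but every leaf must hold at least 5 training samples.
constrained_rf = RandomForestClassifier(
    n_estimators=25, min_samples_leaf=5, random_state=0
).fit(X, y)

print("smallest leaf (default):", smallest_leaf(default_rf))
print("smallest leaf (min_samples_leaf=5):", smallest_leaf(constrained_rf))
```

Comparing the two numbers shows whether the default forest is relying on very small leaves and whether the constraint removes them.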
Modeling (Default)
To begin, a Random Forest model was created using the default parameters and all of the variables. This will act as the baseline model to further improve with hyperparameter tuning.
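A baseline of this kind could look roughly like the sketch below. It assumes scikit-learn's RandomForestClassifier, synthetic stand-in data, and a 75/25 train and test split; the split ratio and random_state values are assumptions, since the original settings are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the NFL feature matrix and Over/Under labels (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# One random train and test split; the 25% test size is an assumed value.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# All hyperparameters left at their defaults; random_state only fixes the seed.
baseline = RandomForestClassifier(random_state=0)
baseline.fit(X_train, y_train)

baseline_accuracy = baseline.score(X_test, y_test)
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```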
The confusion matrix and evaluation metrics can be viewed below.
- Accuracy: 0.50
- ROC AUC: 0.50
- Precision: 0.49
- Precision (Over): 0.48
- Precision (Under): 0.51
- Recall: 0.49
- Recall (Over): 0.44
- Recall (Under): 0.54
- F1: 0.49
- F1 (Over): 0.46
- F1 (Under): 0.53
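For reference, metrics like the ones listed above can be computed with scikit-learn as sketched here. The labels and scores are made up for illustration, and treating the unsuffixed Precision, Recall, and F1 rows as unweighted (macro) averages of the two classes is an assumption; it is consistent with the reported numbers but is not confirmed by the text.

```python
from sklearn.metrics import (
    accuracy_score,
    precision_recall_fscore_support,
    roc_auc_score,
)

# Hypothetical labels and scores: 1 = Over, 0 = Under.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1, 0.45]  # predicted P(Over)
y_pred = [int(s >= 0.5) for s in y_score]  # threshold the scores at 0.5

accuracy = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # uses the scores, not the hard labels

# Per-class values, ordered as [Over, Under] because labels=[1, 0].
prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[1, 0])

# Unweighted mean over both classes (assumed for the unsuffixed rows).
macro_prec, macro_rec, macro_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro"
)

print(f"Accuracy: {accuracy:.2f}  ROC AUC: {auc:.2f}")
print(f"Precision (Over): {prec[0]:.2f}  Precision (Under): {prec[1]:.2f}")
print(f"Recall (Over): {rec[0]:.2f}  Recall (Under): {rec[1]:.2f}")
print(f"F1 (Over): {f1[0]:.2f}  F1 (Under): {f1[1]:.2f}")
```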
Overall, this baseline Random Forest model does a poor job of classifying whether the number of points went over or under the total. With an accuracy of 50% and a ROC AUC of 0.50, it performs no better than random guessing. Next, hyperparameter tuning will be used to try to improve the accuracy of the model.
Hyperparameter Tuning of Baseline Model
Now that a baseline model has been created, hyperparameter tuning will be used to find the parameters that maximize the accuracy of the model. GridSearchCV, RandomizedSearchCV, and Bayesian Optimization will be used.
GridSearchCV
Performing GridSearchCV to find the optimal parameters returned the following:
- criterion = ‘gini’
- max_depth = 9
- max_features = 2
- min_samples_leaf = 5
- min_samples_split = 9
- n_estimators = 100
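A grid search over these six parameters might be set up as in the sketch below. The grid itself is hypothetical: it contains most of the reported values, but the candidate lists, the 3-fold setting, and the reduced n_estimators are assumptions chosen so the example runs quickly on synthetic data. The original search space is not given.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical search space; every combination of these values is evaluated.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 9],
    "max_features": [2, "sqrt"],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [2, 9],
    "n_estimators": [20],  # kept small for speed; the reported value was 100
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",  # matches the stated goal of optimizing accuracy
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.2f}")
```

After fitting, search.best_estimator_ holds a forest refit on all of the supplied data with the winning parameters.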
RandomizedSearchCV
Coming Soon…
Bayesian Optimization
Coming Soon…
Modeling (GridSearchCV Tuned Model)
The confusion matrix and evaluation metrics for the GridSearchCV tuned model can be viewed below.
- Accuracy: 0.51
- ROC AUC: 0.50
- Precision: 0.50
- Precision (Over): 0.49
- Precision (Under): 0.52
- Recall: 0.50
- Recall (Over): 0.32
- Recall (Under): 0.68
- F1: 0.49
- F1 (Over): 0.39
- F1 (Under): 0.59
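There is very little change in accuracy from the baseline (0.50) to the GridSearchCV tuned model (0.51), and the ROC AUC is unchanged at 0.50. The main difference is that the tuned model predicts Under more often (recall of 0.68 for Under versus 0.32 for Over). This indicates that altering the parameters has little effect on accuracy and suggests that a Random Forest is not effective for this task with these features.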
Cross Validation (GridSearchCV Tuned Model)
To verify that the accuracy obtained from one random train and test split is representative, cross validation will be performed. Plain KFold cross validation (rather than a stratified variant) can be used since the label distribution is approximately equal (48.5% Over, 51.5% Under). The following cross validation used 10 folds.
Fold 1 : 0.53
Fold 2 : 0.54
Fold 3 : 0.53
Fold 4 : 0.51
Fold 5 : 0.54
Fold 6 : 0.48
Fold 7 : 0.50
Fold 8 : 0.53
Fold 9 : 0.55
Fold 10 : 0.53
Mean Accuracy: 0.52
Cross validation revealed that the single random train and test split used for the tuned model slightly underperformed the cross-validated mean (0.51 versus 0.52). The accuracies are similar across folds, ranging from 0.48 to 0.55, which is ideal. This means that the model performs similarly regardless of how the data is split, so the result is unlikely to be an artifact of one particular split. The mean cross validation accuracy of 52% can therefore be used as a reasonable estimate of how the model will perform on unseen data.
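The 10-fold procedure described above could be reproduced roughly as follows. This is a sketch on synthetic stand-in data; the forest uses the parameters reported for the GridSearchCV tuned model, while shuffling the folds and the random_state values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Parameters as reported for the GridSearchCV tuned model.
tuned = RandomForestClassifier(
    criterion="gini",
    max_depth=9,
    max_features=2,
    min_samples_leaf=5,
    min_samples_split=9,
    n_estimators=100,
    random_state=0,
)

# Plain (unstratified) 10-fold split; shuffling is an assumed choice.
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(tuned, X, y, cv=cv, scoring="accuracy")

for i, score in enumerate(scores, start=1):
    print(f"Fold {i} : {score:.2f}")
print(f"Mean Accuracy: {scores.mean():.2f}")
```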
Feature Importance
Using the GridSearchCV tuned model, the feature importances can be obtained from the fitted forest’s feature_importances_ attribute (a Random Forest has no coefficients) and plotted.
Looking at the plot, the most impactful variables are ‘avg_away_total_yards_against’, ‘team_elo_diff’, ‘avg_home_total_yards’, ‘qb_elo_diff’, ‘avg_away_total_yards’, ‘avg_home_total_yards_against’, ‘total_qb_elo’, ‘total_line’, ‘wind’, and ‘temp’. Again, since the model didn’t perform well, this reinforces that a Random Forest isn’t the best model for these features. To verify that these really are the important features, a reduced model using only these variables can be created. A similar accuracy should be seen with the reduced model.
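The importance ranking and the reduced-model check can be sketched as below. The data is synthetic and the feature names are hypothetical placeholders, not the NFL variables. Keeping the top 10 features mirrors the ten variables named above, and the 5-fold comparison is an assumed way of measuring "similar accuracy". Because the features are selected on the full data set before cross validation, the reduced score here is slightly optimistic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data with placeholder feature names (assumptions).
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=4, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Parameters as reported for the GridSearchCV tuned model.
params = dict(
    max_depth=9,
    max_features=2,
    min_samples_leaf=5,
    min_samples_split=9,
    n_estimators=100,
    random_state=0,
)
full_rf = RandomForestClassifier(**params).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importances = full_rf.feature_importances_
top_k = 10
top_idx = np.argsort(importances)[::-1][:top_k]  # indices, most important first
for i in top_idx:
    print(f"{feature_names[i]}: {importances[i]:.3f}")

# Compare cross-validated accuracy with all features vs. only the top_k.
full_acc = cross_val_score(RandomForestClassifier(**params), X, y, cv=5).mean()
reduced_acc = cross_val_score(
    RandomForestClassifier(**params), X[:, top_idx], y, cv=5
).mean()
print(f"all features: {full_acc:.2f}  top {top_k}: {reduced_acc:.2f}")
```

If the reduced accuracy stays close to the full one, the dropped variables were contributing little.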