Model Evaluation Key Ideas
- Begin by fitting a basic model with reasonable parameters
- Then optimize parameters for best performance/speed
- Can do this by hand
- Can use optimization algorithms (Grid Search, Random Search, Bayesian Optimization)
- Attempt to classify NFL games as going over or under the projected number of total points scored by both teams
Modeling (Default Parameters)
To begin, an XGBoost model was created using the default parameters and all of the variables. This will act as the baseline model, to be improved with hyperparameter tuning.
The confusion matrix and evaluation metrics can be viewed below.
- Accuracy: 0.53
- ROC AUC: 0.53
- Precision: 0.52
- Precision (Over): 0.47
- Precision (Under): 0.58
- Recall: 0.52
- Recall (Over): 0.51
- Recall (Under): 0.54
- F1: 0.52
- F1 (Over): 0.49
- F1 (Under): 0.56
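The per-class metrics above can be computed with scikit-learn by setting `pos_label` to the class of interest. A minimal sketch, using hypothetical predictions (1 = Over, 0 = Under):

```python
# Computing overall and per-class metrics for a binary Over/Under problem.
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels and predictions (1 = Over, 0 = Under).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)

# pos_label selects which class the score is computed for.
prec_over = precision_score(y_true, y_pred, pos_label=1)
prec_under = precision_score(y_true, y_pred, pos_label=0)
rec_over = recall_score(y_true, y_pred, pos_label=1)
f1_over = f1_score(y_true, y_pred, pos_label=1)
```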
Overall, this baseline XGBoost model does a poor job of classifying whether the number of points went over or under the total. With an accuracy of 53%, it barely does better than random guessing. Next, hyperparameter tuning will be used to hopefully improve the accuracy of the model.
Hyperparameter Tuning of Baseline Model
Now that a baseline model has been created, hyperparameter tuning will be implemented to determine the optimal parameters with respect to the accuracy of the model. GridSearchCV, RandomizedSearchCV, and Bayesian Optimization will be used.
GridSearchCV
Performing GridSearchCV to find the optimal parameters returned the following:
- colsample_bytree = 0.8
- gamma = 0
- reg_lambda = 0.5
- learning_rate = 0.1
- max_depth = 6
- min_child_weight = 1
- n_estimators = 100
- subsample = 0.8
RandomizedSearchCV
Coming Soon…
Bayesian Optimization
Coming Soon…
Modeling (GridSearchCV Tuned Model)
The confusion matrix and evaluation metrics for the GridSearchCV tuned model can be viewed below.
- Accuracy: 0.51
- ROC AUC: 0.51
- Precision: 0.50
- Precision (Over): 0.44
- Precision (Under): 0.56
- Recall: 0.50
- Recall (Over): 0.49
- Recall (Under): 0.52
- F1: 0.50
- F1 (Over): 0.47
- F1 (Under): 0.54
There is very little change in accuracy from the baseline to the GridSearchCV tuned model (in fact, it dropped slightly, from 53% to 51%). This suggests that altering the parameters has little effect on the accuracy, and that an XGBoost model may simply not be well suited to this task.
Cross Validation (GridSearchCV Tuned Model)
To verify that the accuracy of the model is similar to what was obtained by using a single random train and test split, cross validation will be performed. KFold cross validation can be used since the label distribution is approximately balanced (48.5% Over, 51.5% Under). The following cross validation used 10 folds.
Fold 1 : 0.55
Fold 2 : 0.50
Fold 3 : 0.47
Fold 4 : 0.49
Fold 5 : 0.50
Fold 6 : 0.49
Fold 7 : 0.51
Fold 8 : 0.55
Fold 9 : 0.53
Fold 10 : 0.48
Mean Accuracy: 0.51
Cross validation revealed that the random train and test split the tuned model was trained on performed about the same as the cross-validated mean. The accuracies are also similar across the folds, which is ideal: the model performs consistently regardless of how the training data is split, so there is little risk of overfitting to one particular split. The mean cross-validation accuracy, 51%, can therefore be used as a good estimate of how the model will perform on unseen data.