Sports Betting Logistic Regression Results

Logistic Regression in Python

Model Evaluation Key Ideas

  • Logistic Regression can be used to determine feature importance
    • When the data is normalized, the magnitude of each coefficient correlates with that feature's importance
  • Attempt to classify NFL games as going over or under the projected number of total points scored by both teams

Modeling (Default)

To begin, a logistic regression model was created using all of the variables and default parameters. This will act as the baseline model to further improve with hyperparameter tuning.
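
While the exact code is not reproduced here, a minimal sketch of how this baseline might be set up with scikit-learn looks like the following (variable names such as X and y, the split ratio, and the random seed are placeholders rather than the project's exact values):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# X: feature matrix, y: binary label (1 = Over, 0 = Under) -- placeholder names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Normalize the features so the coefficient magnitudes can be compared later
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Default-parameter logistic regression as the baseline
baseline = LogisticRegression()
baseline.fit(X_train_scaled, y_train)
```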

The confusion matrix and evaluation metrics can be viewed below.

  • Accuracy: 0.54
  • ROC AUC: 0.54
  • Precision: 0.53
    • Precision (Over): 0.52
    • Precision (Under): 0.55
  • Recall: 0.53
    • Recall (Over): 0.37
    • Recall (Under): 0.69
  • F1: 0.52
    • F1 (Over): 0.43
    • F1 (Under): 0.61

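For reference, metrics like these are typically produced along the following lines (continuing the placeholder names from the sketch above; the label ordering passed to classification_report is an assumption):

```python
from sklearn.metrics import (
    accuracy_score, classification_report, confusion_matrix, roc_auc_score
)

y_pred = baseline.predict(X_test_scaled)
y_prob = baseline.predict_proba(X_test_scaled)[:, 1]  # probability of 'Over'

print(confusion_matrix(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))
print("ROC AUC:", roc_auc_score(y_test, y_prob))
# Per-class precision, recall, and F1 (assumes labels 0 = Under, 1 = Over)
print(classification_report(y_test, y_pred, target_names=["Under", "Over"]))
```
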
Overall, this baseline logistic regression model does a poor job of classifying whether the number of points went over or under the total. With an accuracy of 54%, it barely does better than random guessing. It is interesting to note that it struggles most with games that actually went 'Over': with an Over recall of only 0.37, the model frequently predicts 'Under' when it should predict 'Over', and this is where the majority of the accuracy is being lost. Next, hyperparameter tuning will be used to hopefully improve the accuracy of the model.

Hyperparameter Tuning of Baseline Model

Now that a baseline model has been created, hyperparameter tuning will be implemented to determine the optimal parameters with regard to the accuracy of the model. GridSearchCV, RandomizedSearchCV, and Bayesian Optimization will be used.

GridSearchCV

Performing GridSearchCV to find the optimal parameters returned the following (a sketch of the search setup is shown after the list):

  • C=0.5
  • penalty='l2'
  • solver='liblinear'
  • max_iter=500
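
The sketch below shows what such a search might look like. The parameter grid is illustrative only; the exact grid searched is not listed in this post.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Illustrative grid -- not necessarily the exact one searched
param_grid = {
    "C": [0.1, 0.5, 1, 5, 10],
    "penalty": ["l1", "l2"],
    "solver": ["liblinear"],
    "max_iter": [100, 500, 1000],
}

grid = GridSearchCV(
    LogisticRegression(),
    param_grid,
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
)
grid.fit(X_train_scaled, y_train)

print(grid.best_params_)            # e.g. {'C': 0.5, 'max_iter': 500, ...}
tuned_model = grid.best_estimator_  # refit on the full training set
```

The best_estimator_ attribute holds the model refit with the best parameters, which is what the tuned-model evaluations below would use.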

RandomizedSearchCV

Coming Soon…

Bayesian Optimization

Coming Soon…

Modeling (GridSearchCV Tuned Model)

The confusion matrix and evaluation metrics for the GridSearchCV tuned model can be viewed below.

  • Accuracy: 0.54
  • ROC AUC: 0.54
  • Precision: 0.54
    • Precision (Over): 0.52
    • Precision (Under): 0.55
  • Recall: 0.53
    • Recall (Over): 0.36
    • Recall (Under): 0.70
  • F1: 0.52
    • F1 (Over): 0.43
    • F1 (Under): 0.61

There is very little change in accuracy from the baseline to the GridSearchCV tuned model. This indicates one of two things: either altering the parameters does not have much effect on the accuracy of the model, or the baseline model outperformed its average accuracy on the single train/test split it was fit on.

Cross Validation (GridSearchCV Tuned Model)

To verify that the accuracy of the model is similar to what was obtained by using a single random train/test split, cross validation will be performed. KFold cross validation can be used since the label distribution is approximately balanced (48.5% Over, 51.5% Under). The following cross validation used 10 folds.
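
A sketch of how this can be done with scikit-learn's KFold and cross_val_score is shown below. Wrapping the scaler and the tuned model in a pipeline so the scaling is refit within each fold is one reasonable way to set this up; the shuffle and random seed are assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Tuned parameters from GridSearchCV; the scaler is refit inside each fold
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=0.5, penalty="l2", solver="liblinear", max_iter=500),
)

kf = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")

for i, score in enumerate(scores, start=1):
    print(f"Fold {i} : {score:.2f}")
print(f"Mean Accuracy: {scores.mean():.2f}")
```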

Fold 1 : 0.54
Fold 2 : 0.54
Fold 3 : 0.53
Fold 4 : 0.52
Fold 5 : 0.52
Fold 6 : 0.53
Fold 7 : 0.52
Fold 8 : 0.50
Fold 9 : 0.53
Fold 10 : 0.50

Mean Accuracy: 0.52

Cross validation revealed that the random train/test split the tuned model was trained on slightly overperformed compared to the mean. The accuracies are similar across the folds, which is ideal: the model performs about the same regardless of how the training data is split, so there is little risk of overfitting. The mean cross validation accuracy of 52% can therefore be used as a good estimate of how the model will perform on real data.

Feature Importance

Using the GridSearchCV tuned model, the feature importance can be found from the coefficients and plotted.

The closer a coefficient is to zero, the less important the variable is. Looking at the plot, the most important features appear to be 'wind', 'total_line', 'avg_home_total_yards', 'qb_elo_diff', 'avg_away_total_yards', and 'surface dessograss'. To verify that these are indeed the important features, a reduced logistic regression model can be created using only them; it should achieve a similar accuracy to the full model.
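
A sketch of how the coefficients can be pulled out of the tuned model and plotted follows (assuming the features live in a pandas DataFrame X so the column names are available):

```python
import numpy as np
import matplotlib.pyplot as plt

feature_names = list(X.columns)    # assumes X is a pandas DataFrame
coefs = tuned_model.coef_[0]       # coefficients of the tuned model
order = np.argsort(np.abs(coefs))  # sort by absolute magnitude

plt.figure(figsize=(8, 10))
plt.barh(np.array(feature_names)[order], coefs[order])
plt.xlabel("Coefficient (standardized features)")
plt.title("Logistic Regression Feature Importance")
plt.tight_layout()
plt.show()
```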

Modeling (Reduced Model: Only Important Features)

Next, a reduced model using only the important features was created. The parameters remain the same as the ones found through GridSearchCV, and the model was evaluated with the same 10-fold cross validation.
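
A sketch of what this reduced-model cross validation might look like, reusing the pipeline and folds from the previous sketch (the column names are taken from the feature importance discussion and may not match the dataset's exact names):

```python
from sklearn.model_selection import cross_val_score

# Column names are assumptions based on the feature importance plot
important_features = [
    "wind", "total_line", "avg_home_total_yards",
    "qb_elo_diff", "avg_away_total_yards", "surface_dessograss",
]

X_reduced = X[important_features]  # assumes X is a pandas DataFrame

reduced_scores = cross_val_score(model, X_reduced, y, cv=kf, scoring="accuracy")
for i, score in enumerate(reduced_scores, start=1):
    print(f"Fold {i} : {score:.2f}")
print(f"Mean Accuracy: {reduced_scores.mean():.2f}")
```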

Fold 1 : 0.53
Fold 2 : 0.54
Fold 3 : 0.52
Fold 4 : 0.54
Fold 5 : 0.53
Fold 6 : 0.54
Fold 7 : 0.49
Fold 8 : 0.50
Fold 9 : 0.56
Fold 10 : 0.50

Mean Accuracy: 0.52

The mean accuracy after cross validation is the same as the full model's, which supports the conclusion that only those features were contributing significantly.