Initial ANN Model
Architecture:
The initial model consists of two hidden layers and an output layer. It is intentionally simple so that it can serve as a baseline.
Layer (type)            Output Shape    Activation    Param #
=================================================================
dense_33 (Dense)        (None, 8)       sigmoid       184
dense_34 (Dense)        (None, 16)      relu          144
dense_35 (Dense)        (None, 1)       sigmoid       17
=================================================================
Total params: 345 (1.35 KB)
Trainable params: 345 (1.35 KB)
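The parameter counts in the summary can be checked by hand: a Dense layer with n_in inputs and n_out units has (n_in + 1) * n_out parameters (the weights plus one bias per unit). The sketch below is not the original training code. The input width of 22 features is an inference from the first layer's 184 parameters (184 = 8 * 23), and the function and variable names are illustrative.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return (n_in + 1) * n_out

# Assumption: 22 input features, inferred from 184 = 8 * (22 + 1).
N_FEATURES = 22
layer_units = [8, 16, 1]  # two hidden layers and the output layer

counts = []
n_in = N_FEATURES
for units in layer_units:
    counts.append(dense_params(n_in, units))
    n_in = units  # the next layer's input width is this layer's output width

print(counts)       # [184, 144, 17]
print(sum(counts))  # 345
```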
Epochs: 200 Total
Epoch 1/200
154/154 [==============================] - 2s 6ms/step - loss: 0.6940 - accuracy: 0.5022 - val_loss: 0.6936 - val_accuracy: 0.5292
Epoch 2/200
154/154 [==============================] - 0s 3ms/step - loss: 0.6934 - accuracy: 0.5020 - val_loss: 0.6927 - val_accuracy: 0.5292
Epoch 199/200
154/154 [==============================] - 0s 2ms/step - loss: 0.6851 - accuracy: 0.5473 - val_loss: 0.6905 - val_accuracy: 0.5179
Epoch 200/200
154/154 [==============================] - 0s 2ms/step - loss: 0.6841 - accuracy: 0.5524 - val_loss: 0.6934 - val_accuracy: 0.5097
Loss Plot:
Accuracy Plot:
Test Loss: 0.7002
Test Accuracy: 0.5081
Summary:
The loss and accuracy plots show essentially no change in either metric over training. This indicates that the model isn’t learning and that underfitting is occurring. It’s also possible that there is a vanishing gradient issue, but since I started with a simple architecture, underfitting is the more likely cause. For the final model, the main focus will be on increasing the complexity of the architecture enough to allow learning to occur, but not so much that it causes overfitting.
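One way to see why the flat curves point to a model that is not learning: a classifier that outputs a probability of 0.5 for every sample has a binary cross-entropy of ln 2 ≈ 0.6931, which is almost exactly where the reported training and validation losses sit. The sketch below assumes the loss is binary cross-entropy (consistent with the single sigmoid output, though not stated above). The labels are illustrative only, not data from this project.

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Illustrative labels only; any mix of 0s and 1s gives the same loss when p = 0.5.
labels = [0, 1, 1, 0, 1]
chance_loss = binary_cross_entropy(labels, [0.5] * len(labels))

print(round(chance_loss, 4))  # 0.6931

# Gap between the chance-level loss and the final training loss reported above.
print(chance_loss - 0.6841)
```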
Intermediate ANN Model
Architecture:
Layer (type)            Output Shape    Activation    Param #
=================================================================
dense_136 (Dense)       (None, 100)     relu          2300
dense_137 (Dense)       (None, 100)     relu          10100
dropout_44 (Dropout)    (None, 100)                   0
dense_138 (Dense)       (None, 1)       sigmoid       101
=================================================================
Total params: 12501 (48.83 KB)
Trainable params: 12501 (48.83 KB)
Epochs:
Epoch 1/200
154/154 [==============================] - 2s 5ms/step - loss: 0.6963 - accuracy: 0.5071 - val_loss: 0.6918 - val_accuracy: 0.5211
Epoch 2/200
154/154 [==============================] - 1s 4ms/step - loss: 0.6928 - accuracy: 0.5162 - val_loss: 0.6940 - val_accuracy: 0.5211
Epoch 199/200
154/154 [==============================] - 1s 5ms/step - loss: 0.5612 - accuracy: 0.6865 - val_loss: 0.9260 - val_accuracy: 0.5211
Epoch 200/200
154/154 [==============================] - 1s 6ms/step - loss: 0.5579 - accuracy: 0.6871 - val_loss: 0.9127 - val_accuracy: 0.5227
Loss Plot:
Accuracy Plot:
Test Loss: 0.8502
Test Accuracy: 0.5617
Summary:
Using a more complex architecture resulted in much higher accuracy on the training set (0.6871 at epoch 200). However, the loss and accuracy plots clearly show that the model is now overfitting: the validation loss climbs to 0.9127, and the validation accuracy never really improves after approximately the 20th epoch. The model clearly needs its complexity reduced, as a dropout layer alone was not enough. This will be addressed in the final model.
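The overfitting can be quantified as the gap between validation loss and training loss at the final epoch. The sketch below uses only the epoch-200 figures reported in the logs for the initial and intermediate models. The dictionary layout and function name are illustrative.

```python
# Final-epoch (200/200) losses copied from the training logs above.
final_epoch = {
    "initial": {"loss": 0.6841, "val_loss": 0.6934},
    "intermediate": {"loss": 0.5579, "val_loss": 0.9127},
}

def generalization_gap(metrics: dict) -> float:
    """Validation loss minus training loss; a large positive value suggests overfitting."""
    return metrics["val_loss"] - metrics["loss"]

gaps = {name: round(generalization_gap(m), 4) for name, m in final_epoch.items()}
print(gaps)  # {'initial': 0.0093, 'intermediate': 0.3548}
```

The intermediate model's gap is far larger than the initial model's, which matches the diverging curves described in the summary.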
Final ANN Model
Architecture:
Layer (type)            Output Shape    Activation    Param #
=================================================================
dense_151 (Dense)       (None, 8)       relu          184
dense_152 (Dense)       (None, 16)      relu          144
dropout_49 (Dropout)    (None, 16)                    0
dense_153 (Dense)       (None, 1)       sigmoid       17
=================================================================
Total params: 345 (1.35 KB)
Trainable params: 345 (1.35 KB)
Epochs:
Epoch 1/50
154/154 [==============================] - 2s 4ms/step - loss: 0.6954 - accuracy: 0.5085 - val_loss: 0.6941 - val_accuracy: 0.5049
Epoch 2/50
154/154 [==============================] - 1s 4ms/step - loss: 0.6940 - accuracy: 0.5067 - val_loss: 0.6929 - val_accuracy: 0.5146
Epoch 49/50
154/154 [==============================] - 1s 4ms/step - loss: 0.6870 - accuracy: 0.5503 - val_loss: 0.6898 - val_accuracy: 0.5519
Epoch 50/50
154/154 [==============================] - 1s 5ms/step - loss: 0.6868 - accuracy: 0.5444 - val_loss: 0.6901 - val_accuracy: 0.5503
Loss Plot:
Accuracy Plot:
Test Loss: 0.6906
Test Accuracy: 0.5373
Confusion Matrix:
Summary:
The final model architecture is very similar to the first one. The only differences are a relu activation in the first layer and a dropout layer added after the second hidden layer. Increasing the model complexity further or training for additional epochs past 75 results in overfitting. So, while the final results aren’t great (a validation accuracy of 0.5503 and a test accuracy of 0.5373), there isn’t much more that can be done to improve the model. However, the goal is to obtain a model with accuracy greater than 52.4%, as that is the mark to be profitable. In that regard, the final model is actually successful.
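The 52.4% threshold is consistent with the break-even win rate for wagers placed at -110 odds (risk 110 units to win 100). The odds are not stated above, so this is an assumption rather than the author's stated derivation. A minimal check under that assumption, with an illustrative function name:

```python
def break_even_win_rate(risk: float, win: float) -> float:
    """Win rate p at which expected profit is zero: p * win = (1 - p) * risk."""
    return risk / (risk + win)

# Assumption: -110 odds, i.e. risk 110 units to win 100.
threshold = break_even_win_rate(risk=110, win=100)
print(round(threshold, 4))  # 0.5238

# Accuracies reported above for the final model.
final_model = {"validation": 0.5503, "test": 0.5373}
clears_threshold = {name: acc > threshold for name, acc in final_model.items()}
print(clears_threshold)  # {'validation': True, 'test': True}
```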