Loss functions reaching global minima
In deep learning, can we have a train accuracy far less than 100% at the global minimum of the loss function?
I have coded a neural network in Python to classify cats and non-cats. I chose a 2-layer network. It gave a train accuracy of 100% and a test accuracy of 70%.
When I increased the number of layers to 4, the loss got stuck at 0.6440, giving a train accuracy of 65% and a test accuracy of 34% across many random initializations.
We expected the train accuracy of the 4-layer model to also reach 100%, but it is stuck at 65%. Since many random initializations stagnate at the same loss value of 0.6440, we suspect the loss function is reaching its global minimum. So if the loss really is at its global minimum, why does the train accuracy not reach 100%? Hence our question: "In deep learning, can we have a train accuracy far less than 100% at the global minimum of the loss function?"
Sure, this depends only on the capacity of the network. If you have only linear activations, then the network is linear and the training accuracy reaches 100% only if the data is linearly separable. For non-linear activation functions the capacity is less clear-cut. We do know that, in theory, a NN with a single hidden layer is a universal function approximator given enough neurons (https://towardsdatascience.com/can-neural-networks-really-learn-any-function-65e106617fc6). So in theory it should be able to approximate any function arbitrarily well and therefore reach 100% train accuracy.
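A minimal sketch of the capacity point, using XOR as the standard toy example (this is not the asker's cat dataset; the architecture and hyperparameters here are my own assumptions): no linear classifier can get 100% train accuracy on XOR, while one hidden layer with a non-linear activation can.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])  # XOR labels: not linearly separable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Linear model (logistic regression), full-batch gradient descent ---
w = rng.normal(size=(2, 1)) * 0.1
b = np.zeros((1, 1))
for _ in range(5000):
    p = sigmoid(X @ w + b)
    w -= 1.0 * X.T @ (p - y) / len(X)   # gradient of mean cross-entropy
    b -= 1.0 * (p - y).mean(axis=0, keepdims=True)
linear_acc = ((sigmoid(X @ w + b) > 0.5) == y).mean()

# --- One hidden tanh layer: enough capacity for XOR ---
W1 = rng.normal(size=(2, 8)) * 0.5; b1 = np.zeros((1, 8))
W2 = rng.normal(size=(8, 1)) * 0.5; b2 = np.zeros((1, 1))
for _ in range(10000):
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    dz2 = (p - y) / len(X)              # backprop through sigmoid + CE
    dh = dz2 @ W2.T * (1 - h ** 2)      # backprop through tanh
    W2 -= 1.0 * h.T @ dz2; b2 -= 1.0 * dz2.sum(axis=0, keepdims=True)
    W1 -= 1.0 * X.T @ dh;  b1 -= 1.0 * dh.sum(axis=0, keepdims=True)
h = np.tanh(X @ W1 + b1)
mlp_acc = ((sigmoid(h @ W2 + b2) > 0.5) == y).mean()

print(linear_acc, mlp_acc)  # linear model stays below 100%; MLP can hit 100%
```

The point is that the *global* minimum of the linear model's loss still leaves some points misclassified, because the model family simply cannot represent the labeling.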
For your problem the main culprit is probably that you are stuck in some kind of local minimum or plateau, which is just bad luck with optimization, not the global minimum. Increasing the capacity should in theory never lead to a higher loss at the global optimum. Note, though, that a lower loss does not necessarily mean a higher accuracy.
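A tiny made-up example of that last point (the probabilities here are invented for illustration): a model can have lower cross-entropy loss yet lower accuracy than another.

```python
import numpy as np

y = np.array([1.0, 0.0])          # two ground-truth labels

p_a = np.array([0.51, 0.49])      # model A: barely right on both points
p_b = np.array([0.49, 0.01])      # model B: wrong on the first, very confident on the second

def cross_entropy(p, y):
    # mean binary cross-entropy
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def accuracy(p, y):
    return ((p > 0.5) == y).mean()

loss_a, acc_a = cross_entropy(p_a, y), accuracy(p_a, y)  # ~0.67 loss, 100% accuracy
loss_b, acc_b = cross_entropy(p_b, y), accuracy(p_b, y)  # ~0.36 loss,  50% accuracy
print(loss_a, acc_a, loss_b, acc_b)
```

Model B's confident correct prediction drives its average loss below A's, even though A classifies both points correctly. So comparing loss values across architectures only loosely tracks accuracy.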