TensorFlow model.evaluate() gives a different result from the one reported during training
I am using TensorFlow to do multi-class classification.
I load the training and validation datasets as follows:
```python
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    shuffle=True,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    shuffle=True,
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
```
Then I train the model with model.fit():
```python
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs,
    shuffle=True)
```
I get validation accuracy around 95%.
But when I load the same validation set and call model.evaluate(), I get a very low accuracy (around 10%).
Why am I getting such different results? Am I using the model.evaluate function incorrectly?
Note: in model.compile() I specify the following: optimizer Adam, loss SparseCategoricalCrossentropy, and metric accuracy.
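Spelled out, the compile configuration described in the note looks roughly like this (a sketch; the model here is a made-up placeholder, since the question does not show the real architecture). One detail worth double-checking: `from_logits` must match whether the model's final layer applies a softmax.

```python
import tensorflow as tf

# Placeholder model standing in for the one in the question.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),  # raw logits: no softmax on the last layer
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    # from_logits=True because the Dense layer above outputs raw logits;
    # use from_logits=False if the model ends in a softmax activation.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```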
A few common causes of this:
- Your model is overfitting the training dataset
- You are applying a transform to your training data that you are not applying to your validation set.
- The statistics of your training and validation set are significantly different (I don't think this is happening here, but you never know).
Debugging an ML model is always hard, but a rule of thumb when starting a new ML project is to begin with a minimal representative example to make sure your data pipeline and model are behaving correctly. If you've already done that, see the suggestions below.
A few suggestions (in order!):
Make sure every preprocessing transform you apply to the training set is also applied to the validation set.
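One way to guarantee that parity is to bake the preprocessing into the model itself, so it runs identically in fit() and evaluate(). A minimal sketch (the image size, class count, and layer sizes are placeholders); note that Keras augmentation layers such as RandomFlip are only active during training, so they cannot skew evaluation:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(180, 180, 3)),
    # Normalization inside the model: applied the same way at train
    # and evaluation time, so the two can't silently diverge.
    tf.keras.layers.Rescaling(1.0 / 255),
    # Augmentation layer: active only during training, a no-op at inference.
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5),  # logits for 5 placeholder classes
])
```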
Train your model for fewer epochs and see if the performance on the validation set improves. You can also train one epoch at a time and validate after each one, to see how the model evolves epoch by epoch.
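A convenient way to do that without hand-rolling the loop is an EarlyStopping callback, which watches the per-epoch validation metric and stops (and optionally rolls back) when it degrades. A sketch, with an arbitrary patience value:

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy",     # watch validation accuracy each epoch
    patience=3,                 # stop after 3 epochs with no improvement
    restore_best_weights=True,  # roll back to the best-performing epoch
)

# Then pass it to fit() alongside the validation data, e.g.:
# history = model.fit(train_ds, validation_data=val_ds,
#                     epochs=epochs, callbacks=[early_stop])
```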
Form a confusion matrix for the model on the validation set, and see how it's doing for each class. Is there one class that it overpredicts or does it do poorly for all classes?
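A sketch of building that confusion matrix batch by batch with tf.math.confusion_matrix (the helper name and class count are my own):

```python
import numpy as np
import tensorflow as tf

def confusion_on_dataset(model, dataset, num_classes):
    """Accumulate a confusion matrix over a batched (images, labels) dataset."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for images, labels in dataset:
        preds = np.argmax(model.predict(images, verbose=0), axis=-1)
        cm += tf.math.confusion_matrix(
            labels, preds, num_classes=num_classes).numpy()
    return cm

# Rows are true classes, columns are predictions; a row whose mass sits in
# one wrong column points to a class the model systematically confuses.
# cm = confusion_on_dataset(model, val_ds, num_classes=5)
```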
Make sure you have enough data to validate on. If there's not enough data, you may have very different training and validation distributions.
If none of this works, make sure your model can fit (memorize) the validation set itself; if it cannot, the problem is likely in the data pipeline or labels rather than in generalization.
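A self-contained sketch of that sanity check, using synthetic stand-ins for the question's model and val_ds (shapes, sizes, and epoch count are arbitrary):

```python
import tensorflow as tf

# Tiny synthetic "validation set" with random labels; in the question's
# setting you would use the real val_ds instead.
num_classes = 5
val_images = tf.random.normal((32, 8, 8, 3))
val_labels = tf.random.uniform((32,), maxval=num_classes, dtype=tf.int32)
val_ds = tf.data.Dataset.from_tensor_slices((val_images, val_labels)).batch(8)

probe = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes),  # logits
])
probe.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"])

# With enough epochs, a healthy pipeline lets the model memorize this set;
# accuracy stuck near 1/num_classes points to a data or label problem.
history = probe.fit(val_ds, epochs=30, verbose=0)
```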