Extremely poor accuracy and high training time per epoch for a word-level language model
I am setting up my first word-level language model using the Keras API and getting extremely poor accuracy and an unbelievably high training time per epoch.
I developed my first word-level language model using the Keras library, with the script of Pulp Fiction as my training set. I cleaned the text of all punctuation and converted all the words to lower case. When I train the model on this dataset, it starts off with an accuracy of 3% and takes 6-7 minutes per epoch. This is extremely demotivating, and I was wondering whether I should tune my hyperparameters, or whether this is normal behaviour for my model and it will yield better results with more epochs?
    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense

    model = Sequential()
    # X is assumed to have shape (samples, timesteps, features);
    # input_shape excludes the batch dimension
    model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
    model.add(Dropout(0.5))
    model.add(LSTM(256))
    model.add(Dropout(0.5))
    # y is one-hot encoded over the vocabulary, so the output layer needs y.shape[1] units
    model.add(Dense(y.shape[1], activation='softmax'))
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit(X, y, batch_size=128, epochs=100)
Starting from the observation that 6-7 minutes per epoch is not that much, here are a few things you can check:
- Are you training on GPU or on CPU? Training should normally run much faster on a GPU; the first snippet after this list shows a quick way to check what your backend sees.
- 256 units per LSTM layer is quite a large dimension. Try reducing it to 128 or even 64 and check whether performance is affected.
- How large is your dataset? If it is big, it is normal for training to take more time per epoch.
- If you have answers to all of the above, you can try modifying the batch_size; however, be careful, as a very large batch size is not recommended (I would suggest going up to 256 but no more).
- Check the integrity of the data. Even if you know the data comes from a well-established source, pay attention to the way you feed it (cleaning, preprocessing) to your neural network. Perhaps the way the data (the dependent variable y and/or X_train) is constructed is incorrect; the second snippet below sketches one common way to build them.
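To check the first point, assuming a TensorFlow 2.x backend (the call below is plain TensorFlow, not something from the question's code), you can do something like:

    import tensorflow as tf

    # Lists the GPUs TensorFlow can see; an empty list means training runs on CPU
    print(tf.config.list_physical_devices('GPU'))

If the list is empty, training is running on the CPU, which alone can explain multi-minute epochs for a two-layer LSTM with 256 units.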
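And for the last point, here is a rough sketch of one common way to prepare X and y for a word-level model; the `text` variable, the sequence length of 10, and the one-hot input encoding are illustrative assumptions rather than anything taken from the question:

    import numpy as np
    from keras.preprocessing.text import Tokenizer
    from keras.utils import to_categorical

    # 'text' is assumed to be the cleaned, lower-cased script as a single string
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts([text])
    encoded = tokenizer.texts_to_sequences([text])[0]
    vocab_size = len(tokenizer.word_index) + 1  # +1 because index 0 is reserved

    seq_len = 10  # illustrative context window, not from the question
    windows = np.array([encoded[i - seq_len:i + 1]
                        for i in range(seq_len, len(encoded))])
    X_int, y_int = windows[:, :-1], windows[:, -1]

    # One-hot both inputs and targets so X is 3D, matching
    # input_shape=(X.shape[1], X.shape[2]) in the model above
    X = to_categorical(X_int, num_classes=vocab_size)
    y = to_categorical(y_int, num_classes=vocab_size)
    print(X.shape, y.shape)  # (samples, seq_len, vocab_size), (samples, vocab_size)

With integer inputs and an Embedding layer as the first layer of the model, you could skip the one-hot encoding of X entirely, which is usually far more memory-efficient.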