Extremely poor accuracy and high training time per epoch for a word level language model

I am setting up my first word-level language model with the Keras API, and I am getting extremely poor accuracy and very long training times.

I developed my first word-level language model using the Keras library, with the script of Pulp Fiction as my training set. I removed all punctuation from the text and converted every word to lower case. When I train the model on this dataset, it starts at an accuracy of 3% and takes 6-7 minutes per epoch. This is extremely demotivating. Should I tune my hyperparameters, or is this normal behaviour for my model, and will it yield better results with more epochs?

from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense

# X is 3-D (samples, timesteps, features); y holds one-hot targets (samples, classes)
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.5))
model.add(LSTM(256))
model.add(Dropout(0.5))
model.add(Dense(y.shape[1], activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, batch_size=128, epochs=100)

1 answer

  • answered 2019-06-24 07:53 Timbus Calin

    Starting with the observation that 6-7 minutes per epoch is not that long, you can check the following:

    1. Are you training on a GPU or on a CPU? Training normally runs much faster on a GPU.

    2. 256 units per LSTM layer is fairly large. Try reducing it to 128 or even 64 and check whether performance is affected.

    3. How large is your dataset? If it is big, it is normal for training to take longer.

    4. Once you have answers to all of the above, you can try changing the batch_size; be careful, though, as a very large batch size is not recommended (I would suggest going up to 256 but no more).

    5. Check the integrity of the data. If you know that the data comes from a well-established source, pay attention to the way you feed it to your neural network (cleaning, preprocessing). Perhaps the way the dependent variable (y) and/or the inputs (X) are built is incorrect.
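
Points 3 and 5 can be checked before any training. The following is only a minimal sketch, not the asker's actual preprocessing: it assumes the cleaned script is available as a list of lower-cased word tokens, and the helper names `build_pairs` and `summarize`, the window length, and the toy sentence are all made up for illustration. It reports the number of training sequences, the vocabulary size (which would be the width of the softmax layer), and the accuracy obtained by always predicting the most frequent next word, which gives a reference point for judging a figure such as 3%.

```python
from collections import Counter


def build_pairs(tokens, seq_len):
    """Slide a window over the tokens: seq_len input words -> the next word."""
    return [
        (tokens[i:i + seq_len], tokens[i + seq_len])
        for i in range(len(tokens) - seq_len)
    ]


def summarize(tokens, seq_len):
    """Report dataset size, vocabulary size, and a majority-word baseline."""
    pairs = build_pairs(tokens, seq_len)
    if not pairs:
        raise ValueError("need more than seq_len tokens")
    targets = Counter(target for _, target in pairs)
    most_common_word, count = targets.most_common(1)[0]
    return {
        "num_sequences": len(pairs),
        "vocab_size": len(set(tokens)),
        "most_common_target": most_common_word,
        # Accuracy of a "model" that always predicts the most frequent target.
        "baseline_accuracy": count / len(pairs),
    }


# Toy stand-in for the cleaned, lower-cased script (hypothetical data).
tokens = "the path of the righteous man is beset on all sides by the".split()
stats = summarize(tokens, seq_len=3)
print(stats)
```

Running the same summary on the real token list should show whether the number of sequences matches `X.shape[0]` and whether the vocabulary size matches `y.shape[1]`; a mismatch would suggest a problem in how the data is fed to the network.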