LSTM Seq2Seq model accuracy stuck at ~80%

I am training an LSTM Seq2Seq model to solve an address parsing problem. The problem entails taking an initial address string (such as "184 Park Ave, NYC, NY") and breaking it into its individual components: {number: 184, street: Park Ave, city: NYC, region: NY}.
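As an illustration only, one way to frame the target side is to flatten the component dictionary into a single delimited string that the decoder can emit character by character. The `|` delimiter, the fixed field order, and the `serialize_components` / `parse_components` helpers are hypothetical. The question does not say how its 153-step targets are actually laid out.

```python
# Hypothetical field order and delimiter; the real target format is not given in the question.
FIELDS = ("number", "street", "city", "region")
DELIM = "|"

def serialize_components(components: dict) -> str:
    """Join address components in a fixed order, e.g. '184|Park Ave|NYC|NY'."""
    return DELIM.join(components.get(field, "") for field in FIELDS)

def parse_components(target: str) -> dict:
    """Inverse of serialize_components: split a decoded string back into fields."""
    parts = target.split(DELIM)
    # Pad with empty strings if the decoder emitted too few fields.
    parts += [""] * (len(FIELDS) - len(parts))
    return dict(zip(FIELDS, parts))

example = {"number": "184", "street": "Park Ave", "city": "NYC", "region": "NY"}
target = serialize_components(example)
print(target)  # 184|Park Ave|NYC|NY
assert parse_components(target) == example
```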

I am using an LSTM model that takes in character-level, one-hot encoded sequences (length 80) of the initial address string and outputs a sequence of individual address components (length 153). My vocabulary size is 41: each character, digit, or special character, including the space, is mapped to an integer between 0 and 40.
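A minimal sketch of that integer encoding, assuming index 0 is reserved for padding and indices 1 to 40 cover the characters. The `VOCAB` string and the `encode` helper are hypothetical stand-ins, since the question does not list the actual 41 symbols.

```python
import string

import numpy as np

# Hypothetical 40-character vocabulary; index 0 is reserved for padding,
# giving 41 symbols in total (indices 0..40).
VOCAB = string.ascii_lowercase + string.digits + " ,.|"
CHAR_TO_IDX = {ch: i + 1 for i, ch in enumerate(VOCAB)}
PAD = 0

def encode(text: str, max_len: int) -> np.ndarray:
    """Map each character to its index and right-pad with PAD up to max_len."""
    text = text.lower()
    if len(text) > max_len:
        raise ValueError(f"sequence longer than {max_len}: {text!r}")
    ids = [CHAR_TO_IDX[ch] for ch in text]  # KeyError for out-of-vocabulary chars
    return np.array(ids + [PAD] * (max_len - len(ids)), dtype=np.int64)

x = encode("184 Park Ave, NYC, NY", max_len=80)  # input length from the question
y = encode("184|park ave|nyc|ny", max_len=153)   # output length from the question
print(x.shape, y.shape)  # (80,) (153,)
```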

Below I have attached the code to retrieve the training and testing datasets:

import numpy as np
import tensorflow
from tensorflow.keras.utils import to_categorical

train1 = list()
test1 = list()
train2 = list()
test2 = list()

X_train, X_test, y_train, y_test = np.load("X_train.npy"), np.load("X_test.npy"), np.load("y_train.npy"), np.load("y_test.npy")

# One-hot encode every sequence over the 41-symbol vocabulary (indices 0..40).
# zip() stops at the shortest array, so this relies on train and test having the same number of samples.
with tensorflow.device('/device:GPU:0'):
  for seq1, seq2, seq3, seq4 in zip(X_train, y_train, X_test, y_test):
    train1.append(to_categorical(seq1, num_classes=40+1))
    test1.append(to_categorical(seq3, num_classes=40+1))
    train2.append(to_categorical(seq2, num_classes=40+1))
    test2.append(to_categorical(seq4, num_classes=40+1))

train1, test1, train2, test2 = np.array(train1), np.array(test1), np.array(train2), np.array(test2)

print(train1.shape, test1.shape, train2.shape, test2.shape)

-> (100000, 80, 41) (100000, 80, 41) (100000, 153, 41) (100000, 153, 41)
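The per-sample loop is not strictly needed: `to_categorical` accepts a whole integer array, turning shape `(N, 80)` into `(N, 80, 41)`, and indexing an identity matrix does the same thing in plain NumPy. A sketch with small synthetic stand-in data (only the shapes mirror the question; the `one_hot` helper is hypothetical):

```python
import numpy as np

NUM_CLASSES = 41  # vocabulary size from the question (indices 0..40)

def one_hot(indices: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """One-hot encode an integer array of any shape; adds a trailing class axis."""
    return np.eye(num_classes, dtype=np.float32)[indices]

rng = np.random.default_rng(0)
X_small = rng.integers(0, NUM_CLASSES, size=(5, 80))   # synthetic stand-in for X_train
y_small = rng.integers(0, NUM_CLASSES, size=(5, 153))  # synthetic stand-in for y_train

train1_small, train2_small = one_hot(X_small), one_hot(y_small)
print(train1_small.shape, train2_small.shape)  # (5, 80, 41) (5, 153, 41)

# Each position has exactly one 1, at the original index.
assert (train1_small.sum(axis=-1) == 1).all()
assert (train1_small.argmax(axis=-1) == X_small).all()
```

At full size the float32 one-hot targets alone take about 100000 × 153 × 41 × 4 bytes ≈ 2.5 GB. Keeping the targets as integers and compiling with `sparse_categorical_crossentropy` avoids materializing them.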

When training the model on these sequences, I reach a training accuracy of ~80% after about 10,000 examples, and then the model suddenly stops improving. Below is the code for my model:

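from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, RepeatVector, TimeDistributed, Dense

n_in_seq_length, n_out_seq_length = 80, 153
model = Sequential()
model.add(LSTM(100, input_shape=(n_in_seq_length, 41)))  # encoder: returns only the final state, (batch, 100)
model.add(RepeatVector(n_out_seq_length))  # repeat it per output step; the decoder LSTM needs 3D input
model.add(LSTM(100, return_sequences=True))  # decoder
model.add(TimeDistributed(Dense(41, activation='softmax')))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])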


with tensorflow.device('/device:GPU:0'):
  model.fit(train1, train2, epochs=1, batch_size=n_batch)
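One quick check when accuracy plateaus around a fixed value like ~80%, assuming index 0 is the padding symbol: measure how much of each 153-step target is padding. The `accuracy` metric above is averaged over every time step, padded ones included, so a model that mostly predicts padding can already score close to that share. The `padding_fraction` helper and the synthetic targets are hypothetical.

```python
import numpy as np

PAD = 0  # assumption: index 0 marks padding in the integer-encoded targets

def padding_fraction(y_int: np.ndarray, pad: int = PAD) -> float:
    """Share of target positions that are padding, over all samples and steps."""
    return float((y_int == pad).mean())

# Synthetic targets: each row holds a random-length "address" followed by padding.
rng = np.random.default_rng(0)
lengths = rng.integers(15, 40, size=1000)  # hypothetical component-string lengths
y_int = np.zeros((1000, 153), dtype=np.int64)
for row, n in zip(y_int, lengths):
    row[:n] = rng.integers(1, 41, size=n)  # non-padding indices 1..40

frac = padding_fraction(y_int)
print(f"padding share: {frac:.1%}")
# Accuracy of always predicting PAD equals this share.
always_pad_acc = float((np.zeros_like(y_int) == y_int).mean())
assert abs(always_pad_acc - frac) < 1e-12
```

On the real data this would be `padding_fraction(y_train)`, assuming `y_train` holds integer indices before one-hot encoding.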

I have tried GRUs and vanilla RNNs, along with various architectures and numbers of hidden layers, but nothing seems to improve performance. My goal is to train the model to around 95% accuracy. How can I improve my model's training accuracy?
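Whether 95% is reachable depends partly on how accuracy is counted. A NumPy sketch that scores only non-padding target positions, assuming index 0 is padding and that `probs` stands in for `model.predict(test1)` with shape `(N, 153, 41)`. The `masked_accuracy` helper and the toy arrays are hypothetical, chosen to make the arithmetic visible.

```python
import numpy as np

PAD = 0  # assumption: index 0 is the padding symbol

def masked_accuracy(probs: np.ndarray, y_true_onehot: np.ndarray, pad: int = PAD) -> float:
    """Per-step accuracy over non-padding target positions only."""
    y_pred = probs.argmax(axis=-1)          # (N, T) predicted indices
    y_true = y_true_onehot.argmax(axis=-1)  # (N, T) true indices
    mask = y_true != pad                    # ignore padded steps
    return float((y_pred[mask] == y_true[mask]).mean())

# Toy example: 1 sequence, 4 steps, 3 classes; the last two steps are padding.
y_true = np.eye(3)[np.array([[1, 2, 0, 0]])]  # shape (1, 4, 3)
probs = np.eye(3)[np.array([[1, 0, 0, 0]])]   # wrongly predicts pad at step 2
print(masked_accuracy(probs, y_true))  # 0.5
print(float((probs.argmax(-1) == y_true.argmax(-1)).mean()))  # 0.75, padding included
```

On the real data this might look like `masked_accuracy(model.predict(test1), test2)`.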

Any feedback would be much appreciated. Thank you!
