Replace bidirectional LSTM with GRU in coref?

I am training the coarse-to-fine coreference model from AllenNLP (on a language other than English), using the template config from bert_lstm.jsonnet. When I change the context layer's type from "lstm" to "gru", training runs, but the change seems to have very little impact: the same 63 GB of RAM are consumed each epoch, and the validation F1 score hovers around the same value. Does this change in the config actually replace the Bi-LSTM layer with a Bi-GRU layer, or am I missing something?

    "context_layer": {
    "type": "gru",
    "bidirectional": true,
    "hidden_size": gru_dim,
    "input_size": bert_dim,
    "num_layers": 1
},
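
Whether the swap actually happens can be checked by instantiating the encoder from the same parameters outside of training. This is a minimal sketch, not part of my config: the sizes 768 and 200 are placeholders for bert_dim and gru_dim, and it relies on AllenNLP's PytorchSeq2SeqWrapper storing the wrapped torch module as _module:

    from allennlp.common import Params
    from allennlp.modules import Seq2SeqEncoder

    # Same parameters as in the jsonnet snippet, with placeholder dims.
    params = Params({
        "type": "gru",
        "bidirectional": True,
        "hidden_size": 200,  # stands in for gru_dim
        "input_size": 768,   # stands in for bert_dim
        "num_layers": 1,
    })

    encoder = Seq2SeqEncoder.from_params(params)
    # If the registry resolved "gru", this prints the wrapped torch.nn.GRU,
    # e.g. GRU(768, 200, batch_first=True, bidirectional=True)
    print(encoder._module)

If that prints a GRU rather than an LSTM, the config change is taking effect, and the similar memory use and F1 might simply mean the context layer is a small part of the model next to the BERT embedder.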