In a Seq2Seq model, can I use BERT's last hidden state to initialize the decoder hidden state?

I built a Seq2Seq model whose encoder is a BERT model that outputs contextual word embeddings. The decoder is an LSTM language model: it takes the embeddings from the encoder as input and outputs a probability distribution over the vocabulary at each step. When the encoder is an LSTM, we usually take the encoder's last hidden state to initialize the decoder. But BERT has no recurrent hidden state, so I don't know how to initialize the decoder's hidden state. Can I take BERT's last-layer output to initialize my decoder? Is it reasonable to do so?
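For reference, a minimal sketch of what I have in mind (PyTorch; the BERT output is faked with a random tensor of the right shape, and all names and dimensions here are illustrative, not from any library): take the [CLS] vector from BERT's last hidden state, project it to the LSTM's hidden size, and use that as (h_0, c_0).

```python
import torch
import torch.nn as nn

# Illustrative dimensions (bert_dim matches BERT-base's hidden size).
bert_dim, hid_dim, vocab_size, emb_dim = 768, 512, 1000, 256

# Stand-in for the encoder output: in practice this would be
# transformers' BertModel(...).last_hidden_state,
# shape (batch, src_len, bert_dim).
batch, src_len = 2, 7
enc_out = torch.randn(batch, src_len, bert_dim)

# Use the [CLS] vector (position 0) as a summary of the source sentence
# and project it into the decoder's hidden size for h_0 and c_0.
cls_vec = enc_out[:, 0, :]                      # (batch, bert_dim)
init_h = nn.Linear(bert_dim, hid_dim)
init_c = nn.Linear(bert_dim, hid_dim)
h0 = torch.tanh(init_h(cls_vec)).unsqueeze(0)   # (1, batch, hid_dim)
c0 = torch.tanh(init_c(cls_vec)).unsqueeze(0)   # (1, batch, hid_dim)

# Decoder: an ordinary LSTM language model, conditioned via (h0, c0).
decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
proj = nn.Linear(hid_dim, vocab_size)

tgt_emb = torch.randn(batch, 5, emb_dim)        # embedded target tokens
dec_out, _ = decoder(tgt_emb, (h0, c0))
logits = proj(dec_out)                          # (batch, 5, vocab_size)
print(tuple(logits.shape))
```

Is this projection of the [CLS] vector a sensible way to bridge the two models?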
