In a Seq2Seq model, can I use BERT's last hidden state to initialize the decoder's hidden state?

I am building a Seq2Seq model. The encoder is a BERT model that outputs word embeddings. The decoder is an LSTM language model that takes the word embeddings from the encoder as input and outputs a probability distribution over the vocabulary at each step. When the encoder is an LSTM, we usually use the encoder's last output to initialize the decoder. With BERT as the encoder, however, I don't know how to initialize the decoder's hidden state. Can I use BERT's last output to initialize my decoder? Is that a reasonable thing to do?
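
To make the question concrete, here is a minimal, framework-free sketch of what I have in mind. Everything in it is an assumption for illustration only: the `[CLS]` vector at position 0 is used as the sentence summary, the hypothetical helper `init_decoder_state` maps it through two made-up projection layers to the decoder size, and the dimensions and random weights are toy values. In a real model the projections would be learned layers and the encoder outputs would come from BERT.

```python
import math
import random


def linear(x, weight, bias):
    """Compute weight @ x + bias for a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def init_decoder_state(encoder_outputs, w_h, b_h, w_c, b_c):
    """Map BERT's last-layer outputs to an initial LSTM state (h0, c0).

    encoder_outputs holds one vector per token, with the [CLS] vector
    at position 0 (assumption: [CLS] summarizes the whole input).
    """
    cls_vector = encoder_outputs[0]
    # Project from the encoder size to the decoder size, then squash with
    # tanh so the values stay in the range an LSTM state normally has.
    h0 = [math.tanh(v) for v in linear(cls_vector, w_h, b_h)]
    c0 = [math.tanh(v) for v in linear(cls_vector, w_c, b_c)]
    return h0, c0


def random_matrix(rows, cols, rng):
    """Toy stand-in for a learned weight matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumptions): real BERT-base outputs 768-dimensional vectors.
rng = random.Random(0)
bert_dim, decoder_dim, seq_len = 8, 4, 5

# Stand-in for BERT's last hidden states: one vector per input token.
encoder_outputs = random_matrix(seq_len, bert_dim, rng)

w_h, b_h = random_matrix(decoder_dim, bert_dim, rng), [0.0] * decoder_dim
w_c, b_c = random_matrix(decoder_dim, bert_dim, rng), [0.0] * decoder_dim

h0, c0 = init_decoder_state(encoder_outputs, w_h, b_h, w_c, b_c)
print(len(h0), len(c0))
```

The projection is there because BERT's hidden size generally differs from the decoder's hidden size. Mean-pooling all token vectors instead of taking `[CLS]` would be another option under the same scheme.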

