LSTM multiple features, multiple classes, multiple outputs

I'm trying to use an LSTM classifier to generate music based on some MIDI files that I have.

The LSTM uses two features: the notes' pitch and the notes' duration.

For illustration, let's say we have:

  • Pitches: ["A", "B", "C"]

  • Durations: ["0.5", "1", "1.5"]

As you can imagine, a generated note has to have both pitch and duration.

I tried encoding both features with a MultiLabelBinarizer:

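from sklearn.preprocessing import MultiLabelBinarizer

all_pitches = ["A", "B", "C"]
all_durations = ["0.5", "1", "1.5"]

# one label set per (pitch, duration) combination, e.g. ["A", "0.5"]
labels = [[x, y] for x in all_pitches for y in all_durations]

mlb = MultiLabelBinarizer()
mlb_value = mlb.fit_transform(labels)  # one multi-hot row per combination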

This splits the labels into classes as intended, but the problem comes at prediction time:

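import numpy as np

prediction = model.predict_proba(prediction_input)

# indices of all scores, from highest to lowest
indexes = np.argsort(prediction, axis=None)[::-1]
index1 = indexes[0]
index2 = indexes[1]

# the labels with the two highest scores
result1 = mlb.classes_[index1]
result2 = mlb.classes_[index2]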

I need each note to have both a pitch and a duration, so this approach doesn't seem to work for me: I keep getting the same two pitches over and over.
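The likely cause is that MultiLabelBinarizer puts all six labels into one flat, sorted vector, so nothing forces the two highest scores to be one pitch and one duration. A minimal sketch of a fix, assuming the model outputs one independent score per column of `mlb.classes_` (e.g. sigmoid outputs trained on `mlb_value`), is to take the argmax separately over the pitch columns and over the duration columns. The prediction vector below is made up, and `decode_note` is a hypothetical helper.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Same encoding as in the question, using the example vocabularies
all_pitches = ["A", "B", "C"]
all_durations = ["0.5", "1", "1.5"]
labels = [[x, y] for x in all_pitches for y in all_durations]
mlb = MultiLabelBinarizer()
mlb_value = mlb.fit_transform(labels)
print(list(mlb.classes_))  # ['0.5', '1', '1.5', 'A', 'B', 'C']

# Boolean mask marking which columns of the label vector are pitches
is_pitch = np.array([c in all_pitches for c in mlb.classes_])


def decode_note(prediction):
    """Pick the best pitch and the best duration separately from one score vector."""
    scores = np.asarray(prediction).ravel()
    pitch_cols = np.flatnonzero(is_pitch)
    duration_cols = np.flatnonzero(~is_pitch)
    pitch = mlb.classes_[pitch_cols[np.argmax(scores[pitch_cols])]]
    duration = mlb.classes_[duration_cols[np.argmax(scores[duration_cols])]]
    return str(pitch), str(duration)


# Hypothetical model output in which the two highest scores are both pitches
prediction = np.array([[0.10, 0.30, 0.05, 0.90, 0.80, 0.20]])
top_two = mlb.classes_[np.argsort(prediction, axis=None)[::-1][:2]]
print(list(top_two))            # ['A', 'B']  (no duration at all)
print(decode_note(prediction))  # ('A', '1')
```

Restricting each argmax to one group guarantees a complete note, but it still treats pitch and duration as independent given the input.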

Another option I considered was a MultiOutputClassifier, but I don't understand how it differs from the MultiLabelBinarizer approach, or how to use a MultiOutputClassifier correctly.
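For comparison, scikit-learn's MultiOutputClassifier fits one copy of a base estimator per target column, so every prediction row contains exactly one pitch and one duration. The sketch below uses a DecisionTreeClassifier and a made-up feature matrix purely for illustration; note that MultiOutputClassifier wraps scikit-learn estimators, so it is not a drop-in replacement for a Keras LSTM.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up input features, one row per example
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]])
# Two target columns: pitch and duration of the note to predict
Y = np.array([["A", "0.5"], ["A", "1"], ["B", "0.5"],
              ["B", "1"], ["C", "1.5"], ["C", "1"]])

# One DecisionTreeClassifier is fitted per target column
clf = MultiOutputClassifier(DecisionTreeClassifier(random_state=0))
clf.fit(X, Y)

pred = clf.predict(X)
print(pred.shape)            # (6, 2): one pitch and one duration per row
print(len(clf.estimators_))  # 2
```

Unlike the MultiLabelBinarizer encoding, each column here is its own single-label (multiclass) problem, which is the same idea as giving a neural network one softmax output per feature.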

Thanks for your patience, and sorry if this is a stupid question.


  • answered 2018-07-15 11:58 KonstantinosKokos

    You can feed your LSTM output into several different layers (or, more generally, neural functions), each producing its own output, and then train your model on all of these outputs concurrently:

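    from keras.models import Model
    from keras.layers import Input, Dense, LSTM

    # sizes (illustrative values; set them from your own data)
    timesteps, num_input_features = 32, 2  # e.g. 32 previous notes, 2 features each
    num_pitches, num_durations = 3, 3      # e.g. ["A", "B", "C"] and ["0.5", "1", "1.5"]

    # function definitions
    lstm_function = LSTM(128)  # 128 units is an arbitrary choice; returns the last hidden state
    pitch_function = Dense(num_pitches, activation='softmax')
    duration_function = Dense(num_durations, activation='softmax')
    input_features = Input(shape=(timesteps, num_input_features))

    # function applications: both output heads share the same LSTM output
    lstm_output = lstm_function(input_features)
    pitches = pitch_function(lstm_output)
    durations = duration_function(lstm_output)

    # model with two outputs and one loss per output (same order as `outputs`);
    # both heads are softmax classifiers, so categorical cross-entropy fits both
    model = Model(inputs=[input_features], outputs=[pitches, durations])
    model.compile(loss=['categorical_crossentropy', 'categorical_crossentropy'],
                  optimizer='rmsprop')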
    

    This may be generalized to arbitrary information flows, with as many layers/outputs as you need. Remember that for each output you need to define a corresponding loss (or None).
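To train and sample from a two-headed model like this, the targets have to be split into one one-hot array per output, and each generated note is assembled from the argmax of each head. The sketch below covers only the NumPy side, using the question's example vocabularies; the note sequence and the prediction values are made up, and `one_hot` and `decode` are hypothetical helpers.

```python
import numpy as np

# Vocabularies from the question's example
pitches = ["A", "B", "C"]
durations = ["0.5", "1", "1.5"]
pitch_to_idx = {p: i for i, p in enumerate(pitches)}
duration_to_idx = {d: i for i, d in enumerate(durations)}


def one_hot(indices, num_classes):
    """Return a (len(indices), num_classes) float32 one-hot matrix."""
    out = np.zeros((len(indices), num_classes), dtype="float32")
    out[np.arange(len(indices)), indices] = 1.0
    return out


def decode(pitch_probs, duration_probs):
    """Combine the argmax of each softmax head into one (pitch, duration) note."""
    return (pitches[int(np.argmax(pitch_probs))],
            durations[int(np.argmax(duration_probs))])


# Hypothetical target notes, one (pitch, duration) pair per training example
notes = [("A", "1"), ("C", "0.5"), ("B", "1.5")]
y_pitch = one_hot([pitch_to_idx[p] for p, _ in notes], len(pitches))
y_duration = one_hot([duration_to_idx[d] for _, d in notes], len(durations))
# In Keras these would be passed as: model.fit(x, [y_pitch, y_duration], ...)

# Hypothetical result of model.predict(x) for one input: one array per output head
pitch_out, duration_out = [np.array([[0.2, 0.7, 0.1]]), np.array([[0.6, 0.3, 0.1]])]
print(decode(pitch_out[0], duration_out[0]))  # ('B', '0.5')
```

Because each head has its own softmax, every decoded note is guaranteed to have exactly one pitch and one duration. For more varied output, sampling from each distribution instead of taking the argmax is a common alternative.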