Preparing input data for LSTM layer with conditions

I have a data frame that looks like the one below:

DF.head(20):
time        var1       var2       prob     
12:30       10          12         85
12:31       15          45         85
12:32       18          12         85
12:33       17          26         85
12:34       11          14         85
12:35       14          65         85
12:36       19          29         92
12:37       15          32         92
12:38       13          44         92
12:39       15          33         92
12:40       11          15         92
12:41       15          45         92
12:42       13          44         94
12:43       15          33         94
12:44       11          15         94
12:45       15          45         94
12:46       13          44         92
12:47       15          33         92
12:48       11          15         92
12:49       15          45         92

I want to predict the value of prob from a sequence of the 6 previous values. For the given example, I would take the two time series var1 and var2 from 12:30 to 12:35 to predict prob for 12:35. As far as I know, the input shape for the LSTM should be (df.shape[0], 6, 1), but I do not know how to convert my input from 2 dimensions to 3 dimensions.

I also have a condition: I should use the previous 6 time steps only if they all share the same prob value. In the given example, I cannot take 6 previous values for prob = 94, because 94 occurs only 4 times in a row and I cannot make 6 timesteps from that.

My pseudo code looks like this:

for i in range(df.shape[0] - 1):        # loop across all rows
  if df[i,'prob'] == df[i+1,'prob']:     # keep going until the value of prob changes
      make multiple non-overlapping dataframes of shape (6,2)
  else:
      continue

I need help building the logic and preparing the input data for my LSTM.

1 answer

  • answered 2022-05-07 03:35 keramat

    Your question is not completely clear, but the input to an LSTM should have the form:

    [samples, timesteps, features]
    

    For example:

    inputs = tf.random.normal([32, 10, 8])
    

    So in your case, each sample will have shape (6, 2). You can use a rolling window or a simple for loop to build the data. Example:

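    import numpy as np
    import pandas as pd

    # Toy frame: two feature columns and a target column
    df = pd.DataFrame({'var1': np.arange(10), 'var2': np.arange(10), 'prob': np.random.randint(0,10,10)})
    xs = []
    ys = []
    for i in range(6,10):
        # rows i-6 .. i-1 form one sample of shape (6, 2)
        xs.append(df[i-6:i][['var1', 'var2']].values)
        # target: prob of the row right after the window
        ys.append(df.iloc[i]['prob'])

    # stack the samples into [samples, timesteps, features]
    data = np.array(xs).reshape(-1,6,2)

    data.shape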
    

    Output:

    (4, 6, 2)
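
The same overlapping windows can probably be built without an explicit loop. The sketch below is one possible alternative, not part of the original answer: it assumes NumPy 1.20 or newer (for `sliding_window_view`), and the toy frame and the names `features`, `X` and `y` are made up for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Hypothetical toy frame with the same layout as in the loop example
df = pd.DataFrame({'var1': np.arange(10),
                   'var2': np.arange(10, 20),
                   'prob': np.arange(10)})

features = df[['var1', 'var2']].to_numpy()           # shape (10, 2)

# Windows of 6 consecutive rows; the window axis is appended last: (5, 2, 6)
windows = sliding_window_view(features, window_shape=6, axis=0)

# Reorder to [samples, timesteps, features]: (5, 6, 2)
windows = windows.transpose(0, 2, 1)

# Keep only windows that are followed by a row to predict
X = windows[:-1]                                     # shape (4, 6, 2)
y = df['prob'].to_numpy()[6:]                        # shape (4,)

print(X.shape, y.shape)
```

The result is a read-only view on `features`, so call `X.copy()` if the array needs to be modified afterwards.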
    

    Based on the comment:

    for i in range(6,20,6):
    ...
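
The condition from the question (use a window only when all 6 rows share the same prob value) could be handled along the following lines. This is a minimal sketch under stated assumptions, not the answerer's code: the helper name `make_windows` and its parameters are hypothetical, windows are non-overlapping, each run of equal prob values is treated separately, and leftover rows in a run shorter than 6 are dropped.

```python
import numpy as np
import pandas as pd

def make_windows(df, feature_cols=('var1', 'var2'), target_col='prob', timesteps=6):
    """Cut non-overlapping windows inside each run of constant target (hypothetical helper)."""
    # A new run starts whenever the target differs from the previous row
    run_id = (df[target_col] != df[target_col].shift()).cumsum()
    xs, ys = [], []
    for _, run in df.groupby(run_id, sort=False):
        values = run[list(feature_cols)].to_numpy()
        # Step by `timesteps` so windows do not overlap; short remainders are skipped
        for start in range(0, len(run) - timesteps + 1, timesteps):
            xs.append(values[start:start + timesteps])
            ys.append(run[target_col].iloc[0])
    X = np.array(xs, dtype=float).reshape(-1, timesteps, len(feature_cols))
    return X, np.array(ys)

# The 20 rows shown in the question
df = pd.DataFrame({
    'var1': [10, 15, 18, 17, 11, 14, 19, 15, 13, 15, 11, 15, 13, 15, 11, 15, 13, 15, 11, 15],
    'var2': [12, 45, 12, 26, 14, 65, 29, 32, 44, 33, 15, 45, 44, 33, 15, 45, 44, 33, 15, 45],
    'prob': [85] * 6 + [92] * 6 + [94] * 4 + [92] * 4,
})

X, y = make_windows(df)
print(X.shape)   # [samples, timesteps, features]
print(y)
```

On the sample data, only the first run of 85 and the first run of 92 are long enough, so two samples come out; the run of 94 and the second run of 92 have 4 rows each and are skipped.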
    
