Shifting data for creating lag features in time series modeling

I have a pandas dataframe in the format below. I am trying to convert a time series problem into a regression problem, so I am creating lag features. However, "Price" depends not only on month and year, but also on the columns make, AA, VO and value.

To create the lag features, I sorted the dataframe by the columns make, AA, VO, month, year and value. After that, I am not sure how to create the lag features, since the price depends on several variables. I am fairly new to time series modeling and I understand this is an open-ended question. Any suggestions would be greatly appreciated.

make    AA  VO  month year  value   Price
ACURA   No  Yes 1     2016  8      4271.41
ACURA   No  No  1     2018  8      1769.92
ACURA   No  No  1     2019  14     4363.9
ACURA   No  Yes 2     2018  2      671.84
ACURA   No  No  2     2016  29     3551.07
ACURA   No  Yes 5     2018  14     5044.95
ACURA   No  Yes 6     2016  11     4049.2
ACURA   No  Yes 7     2018  0      1466.29
ACURA   No  Yes 7     2019  0      4118.45
ACURA   No  Yes 12    2016  1      1062.03
ACURA   No  No  12    2018  23     7361.5

1 answer

  • answered 2020-09-28 03:25 David Erickson

    I think this is an example of what you are trying to do. I would include more columns in the .groupby, but in this sample each combination of all the required columns occurs in only one row, so there is nothing to shift on and the result would be all NaN. If you group by, for example, ['make', 'month', 'value'], a few values are returned, which lets you see what is going on:

    import pandas as pd

    # df is the dataframe from the question.
    # Within each (make, month, value) group, take Price from the previous row.
    df['Price Lag'] = df.groupby(['make', 'month', 'value'])['Price'].shift()
    df
    
    Out[1]: 
         make  AA   VO  month  year  value    Price  Price Lag
    0   ACURA  No  Yes      1  2016      8  4271.41        NaN
    1   ACURA  No   No      1  2018      8  1769.92    4271.41
    2   ACURA  No   No      1  2019     14  4363.90        NaN
    3   ACURA  No  Yes      2  2018      2   671.84        NaN
    4   ACURA  No   No      2  2016     29  3551.07        NaN
    5   ACURA  No  Yes      5  2018     14  5044.95        NaN
    6   ACURA  No  Yes      6  2016     11  4049.20        NaN
    7   ACURA  No  Yes      7  2018      0  1466.29        NaN
    8   ACURA  No  Yes      7  2019      0  4118.45    1466.29
    9   ACURA  No  Yes     12  2016      1  1062.03        NaN
    10  ACURA  No   No     12  2018     23  7361.50        NaN
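
    Note that the shift follows row order, so the rows should be in chronological order inside each group before shifting. Below is a minimal sketch of that, under two assumptions that are not from the question: the series is identified by make, AA and VO (with value kept as an ordinary feature), and a "lag" means the previous available observation rather than the previous calendar month. The column name Price_lag1 is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "make": ["ACURA"] * 11,
    "AA": ["No"] * 11,
    "VO": ["Yes", "No", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
    "month": [1, 1, 1, 2, 2, 5, 6, 7, 7, 12, 12],
    "year": [2016, 2018, 2019, 2018, 2016, 2018, 2016, 2018, 2019, 2016, 2018],
    "value": [8, 8, 14, 2, 29, 14, 11, 0, 0, 1, 23],
    "Price": [4271.41, 1769.92, 4363.9, 671.84, 3551.07, 5044.95,
              4049.2, 1466.29, 4118.45, 1062.03, 7361.5],
})

# Assumption: these columns identify one series.
group_cols = ["make", "AA", "VO"]

# Order each series chronologically (year first, then month).
df = df.sort_values(group_cols + ["year", "month"])

# Previous observation's Price within each series.
# The first row of every series has no predecessor, so it stays NaN.
df["Price_lag1"] = df.groupby(group_cols)["Price"].shift(1)

print(df)
```

    With this sample there are two series (VO = Yes and VO = No), so only the earliest row of each one is NaN. If the gaps between observations matter, the months would have to be filled in (or the gap added as its own feature) before shifting.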