How to forecast a time series out-of-sample using an ARIMA model in Python?

I have seen similar questions on Stack Overflow, but they were either different enough from mine or, where similar, were never actually answered. I gather this is something modelers run into often and find hard to solve.

In my case I am using two variables, Y and X, each with 50 sequential time series observations. Both are random numbers representing % changes (they could be anything; their true values do not matter, since this is just an example to illustrate my coding problem). Here is my basic code to build this ARIMAX(1,0,0) model.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_excel('/Users/gaetanlion/Google Drive/Python/Arima/df.xlsx', sheet_name = 'final')

endo = df['y']  # dependent (endogenous) variable
exo = df['x']   # exogenous regressor

Next, I build the ARIMA model using the first 41 observations:

modelho = sm.tsa.arima.ARIMA(endo.loc[0:40], exo.loc[0:40], order =(1,0,0)).fit()
print(modelho.summary())

So far everything works just fine.

Next, I attempt to forecast the next 9 observations out-of-sample, using the X values over those 9 observations to predict Y, and I just can't do it. Below is the one attempt that I think gets me closest to where I need to go, followed by the error it raises.

modelho.predict(exo.loc[41:49], start = 41, end = 49, dynamic = False)
TypeError: predict() got multiple values for argument 'start'

1 answer

  • answered 2021-04-15 10:09 Econ_matrix

    This example should work. It is your code, slightly changed.
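
    As for the error in the question: the first positional parameter of predict() is start, so passing the exogenous data positionally and then also passing start = 41 supplies start twice. The toy function below is a hypothetical stand-in (predict_like and x_future are made-up names, not part of statsmodels) that only mimics that parameter order to reproduce the collision:

```python
def predict_like(start=None, end=None, dynamic=False, **kwargs):
    """Hypothetical stand-in whose leading parameters mimic predict()."""
    return {"start": start, "end": end, "exog": kwargs.get("exog")}


x_future = [0.5, -1.2, 0.3]  # made-up exogenous values

# Positional data lands in `start`, which is then given again by keyword.
try:
    predict_like(x_future, start=41, end=43)
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)

# Passing the data by keyword avoids the clash.
result = predict_like(exog=x_future, start=41, end=43)
print(result["start"], result["end"], result["exog"])
```

    This is why the data is passed as exog = ... by keyword in the example.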

    import pandas as pd
    import statsmodels.api as sm
    import numpy as np
    

    Generate an example data frame:

    df = pd.DataFrame(data = {'x' : np.random.normal(12, 3,size = 332),
                         'y' : np.random.normal(12, 2,size = 332)})
    
    df
    
    endo = df['y']
    exo = df['x']
    

    The order of the ARIMA model is kept as in your code; it is just for demonstration.

    Let's leave the last 12 observations out of the model:

    modelho = sm.tsa.arima.ARIMA(endo[:-12], exo[:-12], order =(1,0,0)).fit()
    modelho.summary()
    
    exo[-12:]
    
    modelho.predict(exog = exo[-12:], start = 320, end = 331)