# How to handle Date field in linear regression

I have the data below and am planning to fit a linear regression to it.

I started scripting and got stuck where the code throws an error because of the Date field (the independent variable). Can someone help me modify the code to convert the date field?

``````import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import datetime as dt
%matplotlib inline

dataset = pd.read_excel(r"Data containing Date Field.xlsx")

X = dataset['Date'].values.reshape(-1,1)
y = dataset['Value'].values.reshape(-1,1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

print("Y intercept is : ", regressor.intercept_)
print("Coefficient or slope is : ", regressor.coef_)

y_pred = regressor.predict(X_train)
``````

Error Message:

``````TypeError: invalid type promotion
``````

Regards,

Bharath Vikas

• answered 2020-09-24 11:30

First, as stated in the comments by @yuRa, you'll need to predict on `X_test` and not `X_train`.
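A minimal sketch of that first point, using made-up numeric data in place of the question's spreadsheet (the synthetic `X` and `y` are assumptions for illustration only). It fits on the training split and predicts on the held-out split. It also scores the predictions with `mean_squared_error`, since `accuracy_score` from the question's imports is meant for classification rather than regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical numeric data standing in for the question's columns.
rng = np.random.default_rng(0)
X = np.arange(50, dtype=float).reshape(-1, 1)
y = 5.0 + 2.0 * X.ravel() + rng.normal(scale=0.5, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

regressor = LinearRegression().fit(X_train, y_train)

# Predict on the held-out rows, not on the rows used for fitting.
y_pred = regressor.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Test MSE:", mse)
```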

Second, there are several things to think about.

In a linear regression we create a model `Y=x*beta`, where `Y` is our target (e.g. age), `x` is our independent variable (e.g. weight) and `beta` is a parameter (how much `Y` increases when we increase `x` by 1). The `beta` values are what we find when we "solve the linear regression".
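To make "solving the linear regression" concrete, here is a small sketch with NumPy's least-squares solver. The toy data are hypothetical and built so that `y` is exactly `2 + 3*x`, so the fit should recover an intercept near 2 and a slope near 3.

```python
import numpy as np

# Hypothetical toy data: y is exactly 2 + 3*x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x

# Design matrix with a column of ones so the model includes an intercept.
A = np.column_stack([np.ones_like(x), x])

# Solve the least-squares problem: find beta minimising ||A @ beta - y||.
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = beta
print("intercept:", intercept, "slope:", slope)
```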

What you have is normally known as a "time series", i.e. values that depend on time (roughly speaking). If you want to fit a linear regression right off the bat, you would need to convert your dates to plain numbers such as [1,2,3,4...] (since they are equally spaced). You would then get a regression with an intercept and one slope (1D).
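One way to do that conversion is sketched below. It assumes the `Date` column can be parsed by `pd.to_datetime`, and it uses a small made-up DataFrame in place of the Excel file. The `TypeError` in the question most likely comes from passing raw datetime values to the regressor, so the idea is to replace them with a plain day count (the column name `t` is just an illustrative choice).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the Excel file: daily dates and a value column.
dataset = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "Value": [300.0 + i for i in range(10)],
})

# Make sure the column is a real datetime, then count days since the first date.
dataset["Date"] = pd.to_datetime(dataset["Date"])
dataset["t"] = (dataset["Date"] - dataset["Date"].min()).dt.days

X = dataset[["t"]].to_numpy()    # shape (n, 1), plain integers
y = dataset["Value"].to_numpy()

regressor = LinearRegression().fit(X, y)
print("Intercept:", regressor.intercept_)
print("Slope per day:", regressor.coef_[0])
```

Counting days from the first date also keeps any gaps between dates, which a simple 1, 2, 3, ... index would not.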

What you would normally do with time as a variable is known as time series analysis. You can fit an ordinary linear regression to such data, but you then need to think about the following:

1. How many values in the past does the current value depend on?

Let's ignore time series models like ARIMA and say you think the current value depends on the 3 previous days (that is what we call an AR(3) model). You would then need to construct a new data set where each row holds the values from the three prior days, e.g.

``````x3   x2   x1   value
----------------------
300  301  302  303
301  302  303  304
302  303  304  305
.
.
.
311  312  230.367  269.032
``````

where `x3` is the value three days back, `x2` two days back and `x1` is the value yesterday.

Your regression is then `Y=beta_0+beta_1*x1+beta_2*x2+beta_3*x3`, where `beta_0` is the intercept.
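The lagged data set described above could be built roughly like this. It is a sketch that assumes one observation per day with no gaps, and the `values` series is made up for illustration (in practice it would be the `Value` column). `Series.shift(k)` moves each value down by `k` rows, so row `i` sees the value from `k` days earlier.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical daily series, one observation per day.
values = pd.Series(
    [300.0, 301.0, 303.0, 302.0, 305.0, 307.0,
     306.0, 309.0, 311.0, 310.0, 313.0, 315.0]
)

# x1 = yesterday, x2 = two days back, x3 = three days back.
lagged = pd.DataFrame({
    "x3": values.shift(3),
    "x2": values.shift(2),
    "x1": values.shift(1),
    "value": values,
}).dropna()  # the first three rows have no complete history

features = ["x1", "x2", "x3"]
model = LinearRegression().fit(lagged[features], lagged["value"])
print("Intercept:", model.intercept_)
print("Coefficients (x1, x2, x3):", model.coef_)

# One-step-ahead forecast from the last three observed values.
next_row = pd.DataFrame({
    "x1": [values.iloc[-1]],
    "x2": [values.iloc[-2]],
    "x3": [values.iloc[-3]],
})
print("Next value forecast:", model.predict(next_row)[0])
```

Note that a random `train_test_split` mixes past and future rows. For time series it is usually safer to hold out the most recent rows instead.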