I'm trying to fit a regression with a categorical variable.

I start by creating all the dummy variables, then drop every column I don't need in the x values:

```
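import pandas as pd

# One dummy column per office, built separately for each year
d1 = pd.get_dummies(df2015["CBSA Office"])
df2015_new = pd.concat([df2015, d1], axis=1)
d2 = pd.get_dummies(df2016["CBSA Office"])
df2016_new = pd.concat([df2016, d2], axis=1)

# Stack both years and drop rows with missing values
trainset = pd.concat([df2015_new, df2016_new], axis=0)
trainset = trainset.dropna()

# Keep only the dummy columns as predictors; the target is the travellers flow
x_train = trainset.drop(['CBSA Office', 'Location', 'Updated', 'Commercial Flow', 'Travellers Flow'], axis="columns")
y_train = trainset["Travellers Flow"]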
```
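
A minimal sketch of the dummy-encoding step on made-up data, in case it helps to reproduce the setup. The two small frames and their values are hypothetical stand-ins for `df2015` and `df2016`; only the column names `CBSA Office` and `Travellers Flow` come from the code above. It also encodes after concatenating, which is an assumption about what is intended: encoding each year separately can produce different dummy columns per year, `concat` then fills the gaps with `NaN`, and `dropna()` removes those rows.

```python
import pandas as pd

# Hypothetical stand-ins for df2015 / df2016 (values are made up).
df2015 = pd.DataFrame({
    "CBSA Office": ["Aldergrove", "Cornwall", "Aldergrove"],
    "Travellers Flow": [10, 3, 12],
})
df2016 = pd.DataFrame({
    "CBSA Office": ["Cornwall", "Coutts"],
    "Travellers Flow": [4, 7],
})

# Encoding each year separately gives each frame its own set of dummy columns.
separate = pd.concat(
    [
        pd.concat([df2015, pd.get_dummies(df2015["CBSA Office"], dtype=float)], axis=1),
        pd.concat([df2016, pd.get_dummies(df2016["CBSA Office"], dtype=float)], axis=1),
    ],
    axis=0,
)
rows_after_dropna = len(separate.dropna())

# Encoding after stacking the years keeps one consistent set of columns.
combined = pd.concat([df2015, df2016], axis=0, ignore_index=True)
dummies = pd.get_dummies(combined["CBSA Office"], dtype=float)
trainset = pd.concat([combined, dummies], axis=1)

print(rows_after_dropna)  # rows that survive dropna() in the per-year version
print(trainset.shape)     # rows and columns of the combined version
```

In this toy case every row of the per-year version contains a `NaN`, so `dropna()` leaves nothing, while the combined version keeps all rows. Whether the real data behaves this way depends on whether both years list exactly the same offices.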

Now I run the regression using the statsmodels `OLS` function:

```
import statsmodels.api as sm

x_train = x_train.iloc[:100].values.reshape(-1,1)
y_train = y_train.iloc[:100].values.reshape(-1,1)
modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()
modelx.summary()
```

Then I get an error message that says:

```
endog and exog matrices are different sizes
```

But I thought I had already made them the same size.

If I don't reshape them, I get a result like this:
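
A quick shape check on made-up arrays may show where the mismatch comes from. This is a sketch under the assumption that `x_train` has one column per dummy (26 is assumed here, matching the number of offices in the summary output later in this post); the zero-filled arrays are placeholders, not the real data.

```python
import numpy as np

# Placeholder arrays: 100 rows, 26 dummy columns (assumed), one target per row.
x = np.zeros((100, 26))
y = np.zeros(100)

x_flat = x.reshape(-1, 1)  # stacks every cell into a single column
y_col = y.reshape(-1, 1)

print(x_flat.shape)  # (2600, 1)
print(y_col.shape)   # (100, 1)

# Leaving x two-dimensional keeps one row per observation.
print(x.shape[0] == y_col.shape[0])  # True
```

So `reshape(-1, 1)` makes both arrays single-column, but not the same length: the predictor array ends up with rows times columns entries.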

```
C:\Users\CiCi\Anaconda3-1\lib\site-packages\statsmodels\regression\linear_model.py:1554: RuntimeWarning: invalid value encountered in double_scalars
  return self.ess/self.df_model
C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
C:\Users\CiCi\Anaconda3-1\lib\site-packages\scipy\stats\_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
C:\Users\CiCi\Anaconda3-1\lib\site-packages\statsmodels\base\model.py:1100: RuntimeWarning: invalid value encountered in true_divide
  return self.params / self.bse

                       OLS Regression Results
Dep. Variable:      Travellers Flow     R-squared:            0.000
Model:              OLS                 Adj. R-squared:       0.000
Method:             Least Squares       F-statistic:          nan
Date:               Sun, 09 Dec 2018    Prob (F-statistic):   nan
Time:               00:34:01            Log-Likelihood:       -429.08
No. Observations:   100                 AIC:                  860.2
Df Residuals:       99                  BIC:                  862.8
Df Model:           0
Covariance Type:    nonrobust

                                  coef   std err         t     P>|t|    [0.025    0.975]
Abbotsford-Huntingdon           8.5000     1.776     4.786     0.000     4.976    12.024
Aldergrove                           0         0       nan       nan         0         0
Ambassador Bridge                    0         0       nan       nan         0         0
Blue Water Bridge                    0         0       nan       nan         0         0
Boundary Bay                         0         0       nan       nan         0         0
Cornwall                             0         0       nan       nan         0         0
Coutts                               0         0       nan       nan         0         0
Douglas (Peace Arch)                 0         0       nan       nan         0         0
Edmundston                           0         0       nan       nan         0         0
Emerson                              0         0       nan       nan         0         0
Fort Frances Bridge                  0         0       nan       nan         0         0
North Portal                         0         0       nan       nan         0         0
Pacific Highway                      0         0       nan       nan         0         0
Peace Bridge                         0         0       nan       nan         0         0
Prescott                             0         0       nan       nan         0         0
Queenston-Lewiston Bridge            0         0       nan       nan         0         0
Rainbow Bridge                       0         0       nan       nan         0         0
Sault Ste. Marie                     0         0       nan       nan         0         0
St-Armand/Philipsburg                0         0       nan       nan         0         0
St-Bernard-de-Lacolle                0         0       nan       nan         0         0
St. Stephen                          0         0       nan       nan         0         0
St. Stephen 3rd Bridge               0         0       nan       nan         0         0
Stanstead                            0         0       nan       nan         0         0
Thousand Islands Bridge              0         0       nan       nan         0         0
Windsor and Detroit Tunnel           0         0       nan       nan         0         0
Woodstock Road                       0         0       nan       nan         0         0

Omnibus:            81.245              Durbin-Watson:        0.324
Prob(Omnibus):      0.000               Jarque-Bera (JB):     453.220
Skew:               2.832               Prob(JB):             3.84e-99
Kurtosis:           11.757              Cond. No.             1.00e+16

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 9.98e-31. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
```

This is the format I want, since it includes all the dummy variables, but it produces a lot of warnings, the R^2 is 0, and I certainly cannot make any predictions based on it.

What I want is a summary that includes every dummy variable.

I also tried this:

```
x_train = np.array(x_train).reshape(1,-1)
y_train = np.array(y_train).reshape(1,-1)
modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()
modelx.summary()
```

That gives:

```
MemoryError                               Traceback (most recent call last)
<ipython-input-668-312de7f7e808> in <module>()
      1 x_train = np.array(x_train).reshape(1,-1)
      2 y_train = np.array(y_train).reshape(1,-1)
----> 3 modelx = sm.OLS(y_train.astype(float), x_train.astype(float)).fit()
      4 modelx.summary()

~\Anaconda3-1\lib\site-packages\statsmodels\regression\linear_model.py in fit(self, method, cov_type, cov_kwds, use_t, **kwargs)
    273             self.pinv_wexog, singular_values = pinv_extended(self.wexog)
    274             self.normalized_cov_params = np.dot(
--> 275                 self.pinv_wexog, np.transpose(self.pinv_wexog))
    276
    277             # Cache these singular values for use later.

MemoryError:
```
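
A sketch of why this variant might run out of memory, assuming the traceback line `np.dot(self.pinv_wexog, np.transpose(self.pinv_wexog))` builds a square matrix with one row and one column per regressor. With `reshape(1, -1)` every cell becomes its own regressor, so a single observation ends up with thousands of columns. The sizes below are illustrative (100 rows by 26 dummies, and a hypothetical 200,000 cells for a larger training set), not measured from the real data.

```python
import numpy as np

x = np.zeros((100, 26))     # placeholder for the dummy matrix (assumed size)
x_row = x.reshape(1, -1)    # one observation, 2600 "regressors"
print(x_row.shape)          # (1, 2600)

# The pseudo-inverse of a (1, k) matrix is (k, 1); multiplying it by its
# transpose gives a (k, k) matrix.
pinv = np.linalg.pinv(x_row + 1.0)  # +1.0 only to avoid an all-zero input
cov_like = pinv @ pinv.T
print(cov_like.shape)       # (2600, 2600)

def square_matrix_bytes(k: int) -> int:
    """Bytes needed for a k-by-k float64 matrix."""
    return k * k * 8

print(square_matrix_bytes(2600))     # 54080000 (about 54 MB)
print(square_matrix_bytes(200_000))  # 320000000000 (about 320 GB)
```

The memory needed grows with the square of the number of cells, so flattening the whole training set into one row quickly exceeds what a typical machine has.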

I'm new to Python and would appreciate any help. Thanks!