Why is my neural network stuck at a high loss value after the first epochs?
I'm doing regression with a neural network: 10 features and 1 continuous output that I want to predict. It should be a simple task for a NN. I'm using PyTorch for my project, but my model is not learning well. The loss starts at a very high value (40000), drops rapidly to 6000-7000 in the first 5-10 epochs, and then gets stuck there no matter what I do. I even switched from plain PyTorch to skorch so that I could use its cross-validation functionality, but that didn't help either. I tried different optimizers and added layers and neurons to the network, but it stays stuck at 6000, which is a very high loss value. Since this is plain regression from 10 features to one continuous value, it should be easy, which is what confuses me the most.
Here is my network. I have tried every possibility here, from more complex architectures (adding layers and units) to batch normalization, changing activations, etc.; nothing has worked:
class BearingNetwork(nn.Module):

    def __init__(self, n_features=X.shape[1], n_out=1):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(n_features, 512),
            nn.BatchNorm1d(512),
            nn.LeakyReLU(),
            nn.Linear(512, 64),
            nn.BatchNorm1d(64),
            nn.LeakyReLU(),
            nn.Linear(64, n_out),
            # nn.LeakyReLU(),
            # nn.Linear(256, 128),
            # nn.LeakyReLU(),
            # nn.Linear(128, 64),
            # nn.LeakyReLU(),
            # nn.Linear(64, n_out)
        )

    def forward(self, x):
        out = self.model(x)
        return out
And here are my settings. Using skorch is easier than plain PyTorch; here I'm also monitoring the R2 metric, and I made RMSE a custom metric to track the model's performance as well. I also tried amsgrad for Adam, but that didn't help.
R2 = EpochScoring(r2_score, lower_is_better=False, name='R2')
explained_var_score = EpochScoring(EVS, lower_is_better=False, name='EVS Metric')
custom_score = make_scorer(RMSE)
rmse = EpochScoring(custom_score, lower_is_better=True, name='rmse')

bearing_nn = NeuralNetRegressor(
    BearingNetwork,
    criterion=nn.MSELoss,
    optimizer=optim.Adam,
    optimizer__amsgrad=True,
    max_epochs=5000,
    batch_size=128,
    lr=0.001,
    train_split=skorch.dataset.CVSplit(10),
    callbacks=[R2, explained_var_score, rmse, Checkpoint(), EarlyStopping(patience=100)],
    device=device
)
I also standardize the input values.
My input has the shape:
torch.Size([39006, 10])
and the output has the shape:
torch.Size([39006, 1])
I'm using 128 as my batch size, but I also tried other values like 32, 64, 512 and even 1024. Normalizing the output shouldn't be necessary, but I tried it anyway, and it didn't work: when I predict values, the loss is still high. Please, I would appreciate any helpful advice. I'll also add a screenshot of my training and validation losses and metrics over epochs, to visualize how the loss decreases in the first 5 epochs and then stays forever at around 6000, which is a very high value for a loss.
1 answer

Considering that your training and dev losses are decreasing over time, it seems like your model is training correctly. As for your worry about the training and dev loss values themselves, this depends entirely on the scale of your target values (how big are your target values?) and on the metric used to compute the training and dev losses. If your target values are big and you want smaller train and dev loss values, you can normalise the target values.
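For example, a minimal sketch of normalising the targets with scikit-learn (the StandardScaler choice, data and shapes are illustrative assumptions, not taken from the question):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in target on a large scale, shaped (n_samples, 1) like the question's output
y = np.linspace(50.0, 450.0, num=1000).reshape(-1, 1)

scaler_y = StandardScaler()
y_scaled = scaler_y.fit_transform(y)   # train the network against y_scaled instead of y

# At prediction time, map the network's output back to the original units
y_pred_scaled = y_scaled               # placeholder for the model's predictions
y_pred = scaler_y.inverse_transform(y_pred_scaled)
```

With standardized targets, an MSE near 1 corresponds to a trivial predictor, so loss values become interpretable regardless of the target's original scale.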
From what I gather from your experiments and your R2 scores, you are looking for a solution in the wrong area. Your low R2 scores suggest that your features aren't strong enough, which could mean that you have a data quality issue. This would also explain why your architecture tuning has not improved your model's performance: it is not your model that is the issue. So if I were you, I would think about what new useful features I could add and see if that helps. In machine learning, the general rule is that models are only as good as the data they are trained on. I hope this helps!
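A quick way to test this hypothesis is to fit a plain linear model on the same standardized features: if its cross-validated R2 is as low as the network's, the features (not the architecture) are the bottleneck. A sketch with synthetic stand-in data (the real features and target from the question would go in X and y):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                       # stand-in for the 10 features
y = X @ rng.normal(size=10) + rng.normal(size=500)   # target with real linear signal

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
mean_r2 = scores.mean()
# On informative features like these, even the linear baseline scores well;
# a near-zero baseline R2 on the real data would point to a data problem.
```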
See also questions close to this topic

How can I predict the next step? (from MATLAB's online time-series forecasting example tutorial)
I am studying the "Time Series Forecasting Using Deep Learning" tutorial provided by MATLAB here: https://uk.mathworks.com/help/deeplearning/ug/time-series-forecasting-using-deep-learning.html
Right at the bottom of the example there is a section called "Update Network State with Observed Values". In this section, "you can update the network state with observed values instead of predicted values. Then predict on each time step. For each prediction, predict the next time step using the observed value of the previous time step."
But what if I wanted to predict the next month, or tomorrow (assuming the time series is daily)? How do I do this?
As the example shows, for each observed value it predicts the next time step. Or am I confused, and is this example already doing what I am asking for? I want to predict t+1 right at the end of the chart (the predicted line).
I have attached an image below to demonstrate what I am asking for.

How does keras.model.fit_generator iterate over validation_data?
I'm using the Keras package for neural networks. I'm building a NN to be used for forecasting problems. I train my model on Google Colaboratory, where RAM is limited to 12 GB.
To work around the RAM limit I'm using the model.fit_generator functionality, and this is where my problem starts. Because one image is worth more than a thousand words, I created this diagram. Details:
 data

df = pd.DataFrame({'x': [i for i in range(50)], 'y': [i**2 for i in range(50)]})

 constants

num_rows = len(df)  # 50
validate_size = 1
train_size = 4
batch_size = validate_size + train_size  # 5
steps_per_epoch = 1
epochs = num_rows // batch_size // steps_per_epoch  # 10

 generators

def validate_generator(df, validate_size, train_size):
    batch_size = validate_size + train_size
    for i in range(0, len(df), batch_size):
        X = df[i + train_size:i + batch_size, 0]
        y = df[i + train_size:i + batch_size, 1]
        yield (X.reshape(-1, 1), y.reshape(-1, 1))
        print('\n\nval', X, '\n')

def train_generator(df, validate_size, train_size):
    batch_size = validate_size + train_size
    for i in range(0, len(df), batch_size):
        X = df[i:i + train_size, 0]
        y = df[i:i + train_size, 1]
        yield (X.reshape(-1, 1), y.reshape(-1, 1))
        print('\n\ntrain', X)

 keras model.fit_generator function

model.fit_generator(
    train_generator(df, validate_size, train_size),
    steps_per_epoch=steps_per_epoch,
    validation_data=validate_generator(df, validate_size, train_size),
    validation_steps=steps_per_epoch,
    epochs=epochs,
)
Training process logs:
Epoch 1/10
train [0 1 2 3]
1/1 [==============================] - ETA: 0s - loss: 17.7931 - mae: 2.8934
val [4]
1/1 [==============================] - 0s 105ms/step - loss: 17.7931 - mae: 2.8934 - val_loss: 205.8619 - val_mae: 14.3479
Epoch 2/10
train [5 6 7 8]
1/1 [==============================] - ETA: 0s - loss: 1865.1838 - mae: 40.8180
val [9]
val [14]
1/1 [==============================] - 0s 37ms/step - loss: 1865.1838 - mae: 40.8180 - val_loss: 36154.7812 - val_mae: 190.1441
Epoch 3/10
train [10 11 12 13]
1/1 [==============================] - ETA: 0s - loss: 17199.1582 - mae: 128.6884
val [19]
val [24]
1/1 [==============================] - 0s 32ms/step - loss: 17199.1582 - mae: 128.6884 - val_loss: 320162.5000 - val_mae: 565.8290
Epoch 4/10
train [15 16 17 18]
1/1 [==============================] - ETA: 0s - loss: 72351.9141 - mae: 266.5040
val [29]
val [34]
1/1 [==============================] - 0s 40ms/step - loss: 72351.9141 - mae: 266.5040 - val_loss: 1302790.6250 - val_mae: 1141.3986
Epoch 5/10
train [20 21 22 23]
1/1 [==============================] - ETA: 0s - loss: 208619.7188 - mae: 454.2613
val [39]
val [44]
1/1 [==============================] - 0s 36ms/step - loss: 208619.7188 - mae: 454.2613 - val_loss: 3674302.5000 - val_mae: 1916.8470
Epoch 6/10
train [25 26 27 28]
1/1 [==============================] - ETA: 0s - loss: 482259.9688 - mae: 691.9575
val [49]
WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 1 batches). You may need to use the repeat() function when building your dataset.
1/1 [==============================] - 0s 44ms/step - loss: 482259.9688 - mae: 691.9575
Epoch 7/10
train [30 31 32 33]
1/1 [==============================] - 0s 18ms/step - loss: 964487.8750 - mae: 979.5891
Epoch 8/10
train [35 36 37 38]
1/1 [==============================] - 0s 25ms/step - loss: 1741474.2500 - mae: 1317.1534
Epoch 9/10
train [40 41 42 43]
1/1 [==============================] - 0s 16ms/step - loss: 2914341.0000 - mae: 1704.6472
Epoch 10/10
train [45 46 47 48]
1/1 [==============================] - 0s 16ms/step - loss: 4599159.5000 - mae: 2142.0679
I have a problem with validate_generator: you can see in the training logs that after epoch 6 there is a warning that validation_data has run out of data to iterate over. I think this warning shows up because in the prior epochs validate_generator yields data twice per epoch.
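The warning text itself points at the usual remedy: the generator must be able to produce at least steps_per_epoch * epochs batches, because Keras keeps drawing validation batches every epoch. One hedged sketch of such a fix is to make the generator endless (names and data here are illustrative, not from the notebook):

```python
import numpy as np

def repeating_generator(X, y, batch_size):
    # Endless generator: restart from the top whenever one pass is finished,
    # so fit_generator can keep requesting batches in every epoch.
    while True:
        for i in range(0, len(X), batch_size):
            yield X[i:i + batch_size], y[i:i + batch_size]

gen = repeating_generator(np.arange(10), np.arange(10) ** 2, batch_size=5)
batches = [next(gen) for _ in range(6)]  # 6 draws from a 2-batch dataset: no exhaustion
```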
Link to colab: https://colab.research.google.com/drive/1INO5UfPDr_7Jvx6xM4FVBYkGnLG6IFUf?usp=sharing 
Is this correct backpropagation for a NN?
I have a 3-layer NN: 3 neurons in the input, then 3 with ReLU activation, then 3 with sigmoid activation, then 3 with softmax activation (the output). And these are the weights:
W = np.array([[[0.1, 0.2, 0.3], [0.3, 0.2, 0.7], [0.1, 0.2, 0.3]],
              [[0.2, 0.3, 0.5], [0.3, 0.5, 0.7], [0.6, 0.4, 0.8]],
              [[0.1, 0.4, 0.8], [0.3, 0.7, 0.2], [0.5, 0.2, 0.9]]])
Input and result:
X = np.array([0.1, 0.2, 0.7])
Y = np.array([1.0, 0.0, 0.0])
Feed forward:
Z1 = W[0] @ X
H1 = RELU(Z1)
Z2 = W[1] @ H1
H2 = Sigmoid(Z2)
Z3 = W[2] @ H2
H3 = Softmax(Z3)
Loss = CrossEntropy(Ypred, Y)
Back propagation:
dLoss = dCrossEntropy(Ypred, Y)
o_error = dLoss
o_delta = dLoss * (dSoftmax(o_error))
z2_error = o_delta * (W[2].T)
z2_delta = z2_error * dSigmoid(Z2)
z1_error = z2_error * (W[1].T)
z1_delta = z1_error * dRELU(Z1)
W[0] += X.T.dot(z1_delta)
W[1] += (Z1.T).dot(z1_delta)
W[2] += (Z2.T).dot(o_delta)
Am I correct in backpropagation?
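For comparison, here is a minimal NumPy sketch of one common formulation of the backward pass for this exact architecture. It uses the standard softmax-plus-cross-entropy simplification (the combined gradient at Z3 is simply H3 - Y) and gradient descent, i.e. weights are updated with -=, not +=. Treat it as a reference to check against rather than a definitive answer:

```python
import numpy as np

def relu(z): return np.maximum(z, 0.0)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()
def cross_entropy(p, y): return -np.sum(y * np.log(p))

def forward(W, X):
    Z1 = W[0] @ X; H1 = relu(Z1)
    Z2 = W[1] @ H1; H2 = sigmoid(Z2)
    Z3 = W[2] @ H2; H3 = softmax(Z3)
    return Z1, H1, Z2, H2, Z3, H3

def backward(W, X, Y, cache):
    Z1, H1, Z2, H2, Z3, H3 = cache
    dZ3 = H3 - Y                              # softmax + cross-entropy combined gradient
    dW2 = np.outer(dZ3, H2)
    dZ2 = (W[2].T @ dZ3) * H2 * (1.0 - H2)    # sigmoid'
    dW1 = np.outer(dZ2, H1)
    dZ1 = (W[1].T @ dZ2) * (Z1 > 0)           # ReLU'
    dW0 = np.outer(dZ1, X)
    return [dW0, dW1, dW2]

# One gradient-descent step would then be: W[k] -= lr * dW[k]
```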

How to do image rotation for a 3D MRA image
I want to rotate a tilted MRA image. I downloaded the IXI dataset, but some of the data looks like the image on the right. I want to orient it correctly, like the image the arrow points to. Is image registration needed for this task? If so, please let me know how to do it. Thank you.

What is the best software development methodology for deep learning?
Like the common software development methodologies such as waterfall and agile, what is the best methodology for deep learning, e.g. for face or body recognition? Or is there another, better approach?

Goodness of Fit statistic Tobit model
I have estimated a Tobit model using the censReg package, along with the censReg function. Alternatively, the same Tobit model is estimated using the tobit function in the AER package.
Now, I would really like to have a goodness-of-fit statistic, such as a pseudo-R2. However, whenever I try to compute it, the output returns NA. For example:
Tobit <- censReg(Listing$occupancy_rate ~ ., left = -Inf, right = 1, data = Listing)
PseudoR2(Tobit, which = "McFadden")
[1] NA
So far, I have only seen reported PseudoR2's when people use Stata. Does anyone know how to estimate it in R?
Alternatively, Tobit estimates the (log)Sigma, which is basically the standard deviation of the residuals. Could I use this to calculate the R2?
All help is really appreciated.

How do I make a year index that statsmodels' vector autoregression can recognize?
I am struggling to make statsmodels.tsa.api.VAR recognize my index as having an annual frequency.
I have a data frame that is a panel, with a country variable (panel dimension) and a year variable (time dimension):
df = df.set_index(['country', 'year'])
Then I iteratively estimate the stuff I need to estimate for each country with this code
for cont in countries:
    exog = df[exogVars].xs(cont).dropna()
    # We estimate first a SVAR with GDP p.c.
    endog = df[endogVars].xs(cont).dropna()
    svar_st = VAR(endog, exog)
    svar_f = svar_st.fit(maxlags=4, ic='aic', trend='nc')
And the work gets done, but not without a warning message:
ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting
Even though I may not be interested in forecasting, I am bugged about why this is happening. Can anybody help?
Thanks

How to solve ValueError: y should be a 1d array, got an array of shape (3, 5) instead. for naive Bayes?
from sklearn.model_selection import train_test_split

X = data.drop('Vickers Hardness\n(HV0.5)', axis=1)
y = data['Vickers Hardness\n(HV0.5)']
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3)

from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
ValueError: y should be a 1d array, got an array of shape (3, 5) instead.
Used data:
How do I rectify this error in naive Bayes? How can I put y in a 1D array?
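The shape error most likely comes from the unpacking order: train_test_split returns its splits as (X_train, X_test, y_train, y_test), whereas the snippet unpacks them as (X_train, y_train, X_test, y_test), so y_train ends up holding feature rows instead of 1-D labels. A minimal sketch with stand-in data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.random((20, 5))               # stand-in feature matrix
y = rng.integers(0, 2, size=20)       # stand-in 1-D labels

# Correct order: both X splits first, then both y splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

gnb = GaussianNB().fit(X_train, y_train)  # y_train is now 1-D as expected
y_pred = gnb.predict(X_test)
```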

Debugging pytorch code in pycharm (Feasibility)
I am trying to run code written in Python (PyTorch code) which, when passed options as arguments, trains the neural network.
if __name__ == "__main__":
    args = docopt(__doc__)
    myparams = args["options"]
    ...  # do work
To run this code, I need to call it from the console:
python3 train.py option1 123
etc. But in that case the breakpoints won't work in PyCharm. Can anybody clarify how to debug in this scenario? (If you know a way, it would be great if you could let me know.)
Training results are different for Classification using Pytorch APIs and Fastai
I have two training Python scripts: one uses PyTorch's API for classification training, and the other uses fastai. The fastai one gets much better results.
The training outcomes are as follows.
Fastai

epoch  train_loss  valid_loss  accuracy  time
0      0.205338    2.318084    0.466482  23:02
1      0.182328    0.041315    0.993334  22:51
2      0.112462    0.064061    0.988932  22:47
3      0.052034    0.044727    0.986920  22:45
4      0.178388    0.081247    0.980883  22:45
5      0.009298    0.011817    0.996730  22:44
6      0.004008    0.003211    0.999748  22:43

Using PyTorch

Epoch [1/10], train_loss: 31.0000, val_loss: 1.6594, accuracy: 0.3568
Epoch [2/10], train_loss: 7.0000, val_loss: 1.7065, accuracy: 0.3723
Epoch [3/10], train_loss: 4.0000, val_loss: 1.6878, accuracy: 0.3889
Epoch [4/10], train_loss: 3.0000, val_loss: 1.7054, accuracy: 0.4066
Epoch [5/10], train_loss: 2.0000, val_loss: 1.7154, accuracy: 0.4106
Epoch [6/10], train_loss: 2.0000, val_loss: 1.7232, accuracy: 0.4144
Epoch [7/10], train_loss: 2.0000, val_loss: 1.7125, accuracy: 0.4295
Epoch [8/10], train_loss: 1.0000, val_loss: 1.7372, accuracy: 0.4343
Epoch [9/10], train_loss: 1.0000, val_loss: 1.6871, accuracy: 0.4441
Epoch [10/10], train_loss: 1.0000, val_loss: 1.7384, accuracy: 0.4552
Training with PyTorch is not converging. I used the same network (WideResNet-22), and both are trained from scratch without a pretrained model.
The network is here.
Training using Pytorch is here.
Using Fastai is as follows.
from fastai.basic_data import DataBunch
from fastai.train import Learner
from fastai.metrics import accuracy

# DataBunch takes data and internally creates the data loaders
data = DataBunch.create(train_ds, valid_ds, bs=batch_size, path='./data')

# Learner uses Adam as default for learning
learner = Learner(data, model, loss_func=F.cross_entropy, metrics=[accuracy])

# Gradient is clipped
learner.clip = 0.1

# learner finds its learning rate
learner.lr_find()
learner.recorder.plot()

# Weight decay helps to keep the weights small. Learn more at https://towardsdatascience.com/
learner.fit_one_cycle(5, 5e-3, wd=1e-4)
What could be wrong in my training algorithm using Pytorch?

How to implement early stopping when a neural network attains a certain validation accuracy with Pytorch?
I'd like to stop the training of my model when I hit a given validation accuracy similarly to this question here but instead using PyTorch. Is this possible and if so how is it done?
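In plain PyTorch the usual approach is simply to check validation accuracy at the end of each epoch and break out of the training loop once the threshold is reached. A hedged sketch (the model, loaders, optimizer choice and 0.9 threshold are placeholders, not a canonical API):

```python
import torch
import torch.nn as nn

def train_until_accuracy(model, train_loader, val_loader,
                         target_acc=0.9, max_epochs=100, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        # Validation pass: compute accuracy, then stop early if good enough
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for xb, yb in val_loader:
                correct += (model(xb).argmax(dim=1) == yb).sum().item()
                total += yb.numel()
        if correct / total >= target_acc:
            return epoch          # early stop: threshold reached
    return max_epochs             # threshold never reached
```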

Is NN just Bad at Solving this Simple Linear Problem, or is it because of Bad Training?
I was trying to train a very straightforward (I thought) NN model with PyTorch and skorch, but the bad performance really baffles me, so it would be great if you have any insight into this.
The problem is something like this: there are five objects, A, B, C, D, E (labeled by their fingerprints, e.g. (0, 0) is A, (0.2, 0.5) is B, etc.), each corresponding to a number, and the task is to find which number each one corresponds to. The training data is a list of "collections" and their corresponding sums, for example: [A, A, A, B, B] == [(0,0), (0,0), (0,0), (0.2,0.5), (0.2,0.5)] -> 15, [B, C, D, E] == [(0.2,0.5), (0.5,0.8), (0.3,0.9), (1,1)] -> 30, and so on. Note that the number of objects in a collection is not constant.
There is no noise or anything, so it's just a linear system that can be solved directly. So I would have thought this would be very easy for a NN to figure out. I'm actually using this example as a sanity check for a more complicated problem, but was surprised that the NN couldn't even solve this.
Now I'm just trying to pinpoint exactly where it went wrong. The model definition seems to be right and the data input is right; is the bad performance due to bad training, or is a NN just bad at these things?
here is the model definition:
class NN(nn.Module):
    def __init__(
        self,
        input_dim,
        num_nodes,
        num_layers,
        batchnorm=False,
        activation=Tanh,
    ):
        super(SingleNN, self).__init__()
        self.get_forces = get_forces
        self.activation_fn = activation
        self.model = MLP(
            n_input_nodes=input_dim,
            n_layers=num_layers,
            n_hidden_size=num_nodes,
            activation=activation,
            batchnorm=batchnorm,
        )

    def forward(self, batch):
        if isinstance(batch, list):
            batch = batch[0]
        with torch.enable_grad():
            fingerprints = batch.fingerprint.float()
            fingerprints.requires_grad = True
            # index of the current "collection" in the training list
            idx = batch.idx
            sorted_idx = torch.unique_consecutive(idx)
            o = self.model(fingerprints)
            total = scatter(o, idx, dim=0)[sorted_idx]
        return total

    @property
    def num_params(self):
        return sum(p.numel() for p in self.parameters())


class MLP(nn.Module):
    def __init__(
        self,
        n_input_nodes,
        n_layers,
        n_hidden_size,
        activation,
        batchnorm,
        n_output_nodes=1,
    ):
        super(MLP, self).__init__()
        if isinstance(n_hidden_size, int):
            n_hidden_size = [n_hidden_size] * (n_layers)
        self.n_neurons = [n_input_nodes] + n_hidden_size + [n_output_nodes]
        self.activation = activation
        layers = []
        for _ in range(n_layers - 1):
            layers.append(nn.Linear(self.n_neurons[_], self.n_neurons[_ + 1]))
            layers.append(activation())
            if batchnorm:
                layers.append(nn.BatchNorm1d(self.n_neurons[_ + 1]))
        layers.append(nn.Linear(self.n_neurons[-2], self.n_neurons[-1]))
        self.model_net = nn.Sequential(*layers)

    def forward(self, inputs):
        return self.model_net(inputs)
and the skorch part is straightforward
model = NN(2, 100, 2)
net = NeuralNetRegressor(
    module=model,
    ...
)
net.fit(train_dataset, None)
For a test run, the dataset looks like the following (16 collections in total):
[[0.7484336  0.5656401 ]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]]
[[1. 1.]
 [0. 0.]
 [0. 0.]]
[[0.51311415 0.67012525]
 [0.51311415 0.67012525]
 [0.         0.        ]
 [0.         0.        ]]
[[0.51311415 0.67012525]
 [0.7484336  0.5656401 ]
 [0.         0.        ]]
[[0.51311415 0.67012525]
 [1.         1.        ]
 [0.         0.        ]
 [0.         0.        ]]
[[0.51311415 0.67012525]
 [0.51311415 0.67012525]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]]
[[0.51311415 0.67012525]
 [1.         1.        ]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]
 [0.         0.        ]]
....
with corresponding total: [10, 11, 14, 14, 17, 18, ...]
It's easy to tell what the objects are and how many of them are in each collection just by eyeballing it, and the training process looks like:
epoch    train_energy_mae    train_loss    cp      dur
-------  ------------------  ------------  ----  -------
      1              4.9852        0.5425     +   0.1486
      2             16.3659        4.2273         0.0382
      3              6.6945        0.7403         0.0025
      4              7.9199        1.2694         0.0024
      5             12.0389        2.4982         0.0024
      6              9.9942        1.8391         0.0024
      7              5.6733        0.7528         0.0024
      8              5.7007        0.5166         0.0024
      9              7.8929        1.0641         0.0024
     10              9.2560        1.4663         0.0024
     11              8.5545        1.2562         0.0024
     12              6.7690        0.7589         0.0024
     13              5.3769        0.4806         0.0024
     14              5.1117        0.6009         0.0024
     15              6.2685        0.8831         0.0024
....
    290              5.1899        0.4750         0.0024
    291              5.1899        0.4750         0.0024
    292              5.1899        0.4750         0.0024
    293              5.1899        0.4750         0.0024
    294              5.1899        0.4750         0.0025
    295              5.1899        0.4750         0.0025
    296              5.1899        0.4750         0.0025
    297              5.1899        0.4750         0.0025
    298              5.1899        0.4750         0.0025
    299              5.1899        0.4750         0.0025
    300              5.1899        0.4750         0.0025
    301              5.1899        0.4750         0.0024
    302              5.1899        0.4750         0.0025
    303              5.1899        0.4750         0.0024
    304              5.1899        0.4750         0.0024
    305              5.1899        0.4750         0.0025
    306              5.1899        0.4750         0.0024
    307              5.1899        0.4750         0.0025
You can see that it just stops improving after a while. I can confirm that the NN gives different results for different fingerprints, but somehow the final predicted value is just never good enough.
I have tried different NN sizes, learning rates, batch sizes and activation functions (tanh, relu, etc.), and none of them seem to help. Do you have any insight into this? Is there anything I did wrong or could try, or is a NN just bad at this kind of task?
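One sanity check that is independent of any NN: because every total is a linear combination of the five unknown per-object values, with the count of each object as the coefficient, the whole training set is one linear system that np.linalg.lstsq solves exactly. A sketch with made-up object values (the real collections would be converted to count vectors the same way):

```python
import numpy as np

true_vals = np.array([2.0, 5.0, 7.0, 9.0, 11.0])    # hypothetical values of A..E

rng = np.random.default_rng(0)
# Each row counts how many copies of A..E appear in one collection;
# the identity rows just guarantee the system has full rank.
counts = np.vstack([np.eye(5), rng.integers(0, 4, size=(11, 5))]).astype(float)
totals = counts @ true_vals                         # noise-free sums, as in the question

recovered, *_ = np.linalg.lstsq(counts, totals, rcond=None)
```

If the network cannot get close to the solution lstsq finds exactly, the issue is in the training setup (optimisation, scaling, or the sum aggregation), not in the data.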

How to use PyTorch’s DataLoader together with skorch’s GridSearchCV
I am running a PyTorch ANN model (for a classification task) and I am using skorch's GridSearchCV to search for the optimal hyperparameters.
When I ran GridSearchCV using n_jobs=1 (i.e. doing one hyperparameter combination at a time), it ran really slowly. When I set n_jobs greater than 1, I got a memory blow-out error. So I am now trying to see if I can use PyTorch's DataLoader to split the dataset into batches and avoid the memory blow-out issue. According to this PyTorch forum question (https://discuss.pytorch.org/t/how-to-use-skorch-for-data-that-does-not-fit-into-memory/70081/2), it appears we could use SliceDataset. My code for this is below:

# Setting up artificial neural net model
class TabularModel(nn.Module):

    # Initialize parameters embeds, emb_drop, bn_cont and layers
    def __init__(self, emb_szs, n_cont, out_sz, layers, p=0.5):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in emb_szs])
        self.emb_drop = nn.Dropout(p)
        self.bn_cont = nn.BatchNorm1d(n_cont)

        # Create empty list for each layer in the neural net
        layerlist = []
        # Number of all embedded columns for categorical features
        n_emb = sum((nf for ni, nf in emb_szs))
        # Number of inputs for each layer
        n_in = n_emb + n_cont

        for i in layers:
            # Set the linear function for the weights and biases, wX + b
            layerlist.append(nn.Linear(n_in, i))
            # Using ReLU activation function
            layerlist.append(nn.ReLU(inplace=True))
            # Normalise all the activation function output values
            layerlist.append(nn.BatchNorm1d(i))
            # Set some of the normalised activation function output values to zero
            layerlist.append(nn.Dropout(p))
            # Reassign number of inputs for the next layer
            n_in = i

        # Append last layer
        layerlist.append(nn.Linear(layers[-1], out_sz))
        # Create sequential layers
        self.layers = nn.Sequential(*layerlist)

    # Function for feedforward
    def forward(self, x_cat_cont):
        x_cat = x_cat_cont[:, 0:cat_train.shape[1]].type(torch.int64)
        x_cont = x_cat_cont[:, cat_train.shape[1]:].type(torch.float32)

        # Create empty list for embedded categorical features
        embeddings = []
        # Embed categorical features
        for i, e in enumerate(self.embeds):
            embeddings.append(e(x_cat[:, i]))
        # Concatenate embedded categorical features
        x = torch.cat(embeddings, 1)
        # Apply dropout rates to categorical features
        x = self.emb_drop(x)
        # Batch normalize continuous features
        x_cont = self.bn_cont(x_cont)
        # Concatenate categorical and continuous features
        x = torch.cat([x, x_cont], 1)
        # Feed categorical and continuous features into neural net layers
        x = self.layers(x)
        return x


# Use cross-entropy loss function since this is a classification problem
# Assign class weights to the loss function
criterion_skorch = nn.CrossEntropyLoss
# Use Adam solver with learning rate 0.001
optimizer_skorch = torch.optim.Adam

from skorch import NeuralNetClassifier

# Random seed chosen to ensure results are reproducible by using the same initial random
# weights and biases, and applying dropout rates to the same random embedded categorical
# features and neurons in the hidden layers
torch.manual_seed(0)
net = NeuralNetClassifier(module=TabularModel,
                          module__emb_szs=emb_szs,
                          module__n_cont=con_train.shape[1],
                          module__out_sz=2,
                          module__layers=[30],
                          module__p=0.0,
                          criterion=criterion_skorch,
                          criterion__weight=cls_wgt,
                          optimizer=optimizer_skorch,
                          optimizer__lr=0.001,
                          max_epochs=150,
                          device='cuda'
                          )

from sklearn.model_selection import GridSearchCV

param_grid = {'module__layers': [[30], [50, 20]],
              'module__p': [0.0],
              'max_epochs': [150, 175]
              }

from torch.utils.data import TensorDataset, DataLoader
from skorch.helper import SliceDataset

# cat_con_train and y_train are PyTorch tensors
tsr_ds = TensorDataset(cat_con_train.cpu(), y_train.cpu())
torch.manual_seed(0)  # Set random seed for shuffling results to be reproducible
d_loader = DataLoader(tsr_ds, batch_size=100000, shuffle=True)

d_loader_slice_X = SliceDataset(d_loader, idx=0)
d_loader_slice_y = SliceDataset(d_loader, idx=1)

models = GridSearchCV(net, param_grid, scoring='roc_auc', n_jobs=2).fit(d_loader_slice_X, d_loader_slice_y)
However, when I ran this code, I get the following error message:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-47-df3fc792ad5e> in <module>()
    104
--> 105 models = GridSearchCV(net, param_grid, scoring='roc_auc', n_jobs=2).fit(d_loader_slice_X, d_loader_slice_y)
    106

6 frames
/usr/local/lib/python3.6/dist-packages/skorch/helper.py in __getitem__(self, i)
    230     def __getitem__(self, i):
    231         if isinstance(i, (int, np.integer)):
--> 232             Xn = self.dataset[self.indices_[i]]
    233             Xi = self._select_item(Xn)
    234             return self.transform(Xi)

TypeError: 'DataLoader' object does not support indexing
How do I fix this? Is there a way to use PyTorch's DataLoader together with skorch's GridSearchCV (i.e. is there a way to load data in batches into skorch's GridSearchCV, to avoid memory blow-out issues when I set n_jobs greater than 1 in GridSearchCV)?
Many, many thanks in advance!

PyTorch & skorch: How to fix my nn.Module to work with skorch's GridSearchCV
Using PyTorch, I have an ANN model (for a classification task) below:
import torch
import torch.nn as nn

# Setting up artificial neural net model which separates out categorical
# from continuous features, so that embedding could be applied to
# categorical features
class TabularModel(nn.Module):

    # Initialize parameters embeds, emb_drop, bn_cont and layers
    def __init__(self, emb_szs, n_cont, out_sz, layers, p=0.5):
        super().__init__()
        self.embeds = nn.ModuleList([nn.Embedding(ni, nf) for ni, nf in emb_szs])
        self.emb_drop = nn.Dropout(p)
        self.bn_cont = nn.BatchNorm1d(n_cont)

        # Create empty list for each layer in the neural net
        layerlist = []
        # Number of all embedded columns for categorical features
        n_emb = sum((nf for ni, nf in emb_szs))
        # Number of inputs for each layer
        n_in = n_emb + n_cont

        for i in layers:
            # Set the linear function for the weights and biases, wX + b
            layerlist.append(nn.Linear(n_in, i))
            # Using ReLU activation function
            layerlist.append(nn.ReLU(inplace=True))
            # Normalise all the activation function output values
            layerlist.append(nn.BatchNorm1d(i))
            # Set some of the normalised activation function output values to zero
            layerlist.append(nn.Dropout(p))
            # Reassign number of inputs for the next layer
            n_in = i

        # Append last layer
        layerlist.append(nn.Linear(layers[-1], out_sz))
        # Create sequential layers
        self.layers = nn.Sequential(*layerlist)

    # Function for feedforward
    def forward(self, x_cat_cont):
        x_cat = x_cat_cont[:, 0:cat_train.shape[1]].type(torch.int64)
        x_cont = x_cat_cont[:, cat_train.shape[1]:].type(torch.float32)

        # Create empty list for embedded categorical features
        embeddings = []
        # Embed categorical features
        for i, e in enumerate(self.embeds):
            embeddings.append(e(x_cat[:, i]))
        # Concatenate embedded categorical features
        x = torch.cat(embeddings, 1)
        # Apply dropout rates to categorical features
        x = self.emb_drop(x)
        # Batch normalize continuous features
        x_cont = self.bn_cont(x_cont)
        # Concatenate categorical and continuous features
        x = torch.cat([x, x_cont], 1)
        # Feed categorical and continuous features into neural net layers
        x = self.layers(x)
        return x
I am trying to use this model with skorch's GridSearchCV, as below:
from skorch import NeuralNetBinaryClassifier

# Random seed chosen to ensure results are reproducible by using the same
# initial random weights and biases, and applying dropout rates to the same
# random embedded categorical features and neurons in the hidden layers
torch.manual_seed(0)
net = NeuralNetBinaryClassifier(module=TabularModel,
                                module__emb_szs=emb_szs,
                                module__n_cont=con_train.shape[1],
                                module__out_sz=2,
                                module__layers=[30],
                                module__p=0.0,
                                criterion=nn.CrossEntropyLoss,
                                criterion__weight=cls_wgt,
                                optimizer=torch.optim.Adam,
                                optimizer__lr=0.001,
                                max_epochs=150,
                                device='cuda'
                                )

from sklearn.model_selection import GridSearchCV

param_grid = {'module__layers': [[30], [50, 20]],
              'module__p': [0.0, 0.2, 0.4],
              'max_epochs': [150, 175, 200, 225]
              }

models = GridSearchCV(net, param_grid, scoring='roc_auc').fit(cat_con_train.cpu(), y_train.cpu())
models.best_params_
But when I ran the code, I am getting this error message below:
/usr/local/lib/python3.6/dist-packages/sklearn/model_selection/_validation.py:536: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
ValueError: Expected module output to have shape (n,) or (n, 1), got (128, 2) instead
  FitFailedWarning)
(this warning is printed once per failed cross-validation fit)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-86-c408d65e2435> in <module>()
     98
---> 99 models = GridSearchCV(net, param_grid, scoring='roc_auc').fit(cat_con_train.cpu(), y_train.cpu())
    100
    101 models.best_params_

11 frames
/usr/local/lib/python3.6/dist-packages/skorch/classifier.py in infer(self, x, **fit_params)
    303             raise ValueError(
    304                 "Expected module output to have shape (n,) or "
--> 305                 "(n, 1), got {} instead".format(tuple(y_infer.shape)))
    306
    307         y_infer = y_infer.reshape(-1)

ValueError: Expected module output to have shape (n,) or (n, 1), got (128, 2) instead
I am not sure what is wrong or how to fix this. Any help on this would really be appreciated.
Many thanks in advance!