DL4J/ND4J: Can an INDArray instance be reused?

I have a model to train on a large data set that does not fit into RAM. My plan is to slice the data set into chunks and create a DataSet instance, holding the input vectors and their associated labels, for every chunk. E.g. if I have 1M input vectors/labels, I'd split them into 10 chunks of 100K records each.
For each chunk I'd fill 2 INDArray objects (one for inputs, one for labels), create a DataSet from them and call model.fit() with that data set. I'd repeat this for every chunk, and repeat the whole pass until, say, the model's score reaches some target value. My questions are:
1. Do I understand the process correctly?
2. Can the INDArray instances be reused? Would it be right to allocate them once and then just fill them up with data set chunks over and over again?

1 answer

  • answered 2018-07-11 08:07 Adam Gibson

    You don't have to do any of this. Workspaces already solve your allocation problem: http://deeplearning4j.org/workspaces

    Just use the standard DataVec -> RecordReaderDataSetIterator -> DataSet pattern. That already handles minibatches for you.
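
    To illustrate what "handles minibatches for you" means, here is a minimal plain-Java sketch of the idea: records are read lazily and grouped into fixed-size batches, so the full data set never has to sit in memory. This is not the DL4J API. The names MiniBatchSketch, BatchIterator, syntheticRecords and countBatches are hypothetical, the records are synthetic, and the point where a real program would call model.fit() is only marked with a comment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a minibatch iterator; NOT the DL4J API. */
class MiniBatchSketch {

    /** Groups a lazily read stream of records into batches of at most batchSize. */
    static final class BatchIterator implements Iterator<List<double[]>> {
        private final Iterator<double[]> records;
        private final int batchSize;

        BatchIterator(Iterator<double[]> records, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.records = records;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() {
            return records.hasNext();
        }

        @Override
        public List<double[]> next() {
            if (!records.hasNext()) {
                throw new NoSuchElementException();
            }
            // Only one batch worth of records is held in memory at a time.
            List<double[]> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && records.hasNext()) {
                batch.add(records.next());
            }
            return batch;
        }
    }

    /** Produces `total` synthetic {feature, label} records on demand (assumed data source). */
    static Iterator<double[]> syntheticRecords(final int total) {
        return new Iterator<double[]>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < total;
            }

            @Override
            public double[] next() {
                if (i >= total) {
                    throw new NoSuchElementException();
                }
                double x = i++;
                return new double[] {x, 2 * x};
            }
        };
    }

    /** Walks one full pass (one epoch) over the data and counts the batches seen. */
    static int countBatches(int total, int batchSize) {
        int batches = 0;
        Iterator<List<double[]>> it = new BatchIterator(syntheticRecords(total), batchSize);
        while (it.hasNext()) {
            List<double[]> batch = it.next();
            // A real training loop would fit the model on `batch` here.
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1M records in batches of 100K, as in the question.
        System.out.println(countBatches(1_000_000, 100_000));
    }
}
```

    In this sketch the caller never allocates or refills input/label buffers by hand; it only asks the iterator for the next batch until the pass is over, which is the role the iterator plays in the pattern above.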