Understanding memory allocation in numpy: Is "temporary" memory being allocated when storing the result of an operation into variable[:, :]?

Let's assume two large multidimensional numpy arrays a and b. I want to perform an element-wise operation, e.g. adding them element by element:

c = a + b

In the above case, new memory is allocated for the result of a + b. A reference to this memory is then stored in c.

Now, let's assume that memory for c has already been allocated. Setting the number of dimensions to two for the purpose of having a simple example, I can do the following:

c[:, :] = a + b

I cannot find any documentation on exactly how the above is implemented. I can imagine two ways:

  1. First, memory is allocated for performing the operation a + b. The result is stored in this "temporary" memory, and the data (i.e. the result of the operation) is then copied into c[:, :].
  2. There is no allocation of temporary memory. The result of a + b goes directly into c[:, :].

I played around with some code and - I could be absolutely wrong here - performance-wise it feels like the first option is more likely. Am I right? If so, how could I avoid the allocation of "temporary memory" and directly store the result into the memory which is already available in c? I'd guess that I have to be more explicit, use functions like numpy.add and provide references to the target memory to them.

1 answer

  • answered 2018-11-12 19:24 user2357112

    The operation you're looking for is

    numpy.add(a, b, out=c)

    With c[:, :] = a + b, the evaluation of a + b has no way of knowing that the result will be assigned to c[:, :]. It must allocate a new array to hold the result of a + b, and that result is then copied into c.
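
As a minimal sketch of the difference (the array shapes and variable names below are illustrative, not from the question): with out=c, numpy.add returns the very array passed as out and c keeps its existing buffer, whereas a + b produces a separate array that shares no memory with c.

```python
import numpy as np

# Small illustrative arrays; the question's arrays would be much larger.
a = np.arange(6, dtype=np.float64).reshape(2, 3)
b = np.ones((2, 3))
c = np.empty_like(a)  # preallocated target

# Address of c's data buffer before and after the in-place write.
addr_before = c.__array_interface__["data"][0]
result = np.add(a, b, out=c)
addr_after = c.__array_interface__["data"][0]

same_object = result is c                 # the ufunc returns the out array itself
same_buffer = addr_before == addr_after   # c was written in place, not rebound

# For comparison, a + b yields a new array with its own buffer.
tmp = a + b
temp_is_separate = not np.shares_memory(tmp, c)

print(same_object, same_buffer, temp_is_separate)
```

The same out= keyword is accepted by the other ufuncs (numpy.multiply, numpy.subtract, and so on), so the pattern is not specific to addition.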

    (Recent versions of NumPy do try to perform some C-level stack inspection to aggressively optimize temporaries beyond what the Python execution model would normally allow, but those optimizations don't handle this case. You can see the code in temp_elide.c, including some notes about what platforms it works on and why Python stack inspection isn't enough.)
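
One way to check this empirically is Python's tracemalloc module. The sketch below assumes a NumPy build that reports its data-buffer allocations to tracemalloc; the helper peak_traced_bytes and the array size are hypothetical choices made for illustration. If the assumption holds, the slice assignment should show a peak of at least one full array, while the out= variant should show almost nothing.

```python
import tracemalloc

import numpy as np


def peak_traced_bytes(operation):
    """Run operation() and return the peak number of bytes traced meanwhile."""
    tracemalloc.start()
    try:
        operation()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


n = 1000  # 1000 x 1000 float64 values, about 8 MB per array
a = np.ones((n, n))
b = np.ones((n, n))
c = np.empty((n, n))  # allocated before tracing starts, so not counted


def assign_via_slice():
    # a + b is evaluated first into a new array, then copied into c.
    c[:, :] = a + b


def add_with_out():
    # The ufunc writes directly into c's existing buffer.
    np.add(a, b, out=c)


peak_slice = peak_traced_bytes(assign_via_slice)
peak_out = peak_traced_bytes(add_with_out)

print(f"c[:, :] = a + b      peak: {peak_slice} bytes")
print(f"np.add(a, b, out=c)  peak: {peak_out} bytes")
```

The exact numbers depend on the NumPy version and platform, so only the order of magnitude is meaningful here.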