Are all objects returned by value rather than by reference?

I am coding in Python trying to decide whether I should return a numpy array (the result of a diff on some other array) or return numpy.where(diff)[0], which is a smaller array but requires that little extra work to create. Let's call the method where this happens methodB.

I call methodB from methodA. The rub is that I won't necessarily always need the where() result in methodA, but I might. So is it worth doing this work inside methodB, or should I pass back the (much larger memory-wise) diff itself and then only process it further in methodA if needed? That would be the more efficient choice assuming methodA just gets a reference to the result.

So, are function results ever not copied when they are passed back the the code that called that function?

I believe that when methodB finishes, all the memory in its frame will be reclaimed by the system, so methodA has to actually copy anything returned by methodB in to its own frame in order to be able to use it. I would call this "return by value". Is this correct?

4 answers

  • answered 2017-11-12 19:57 timgeb

    Assignment never copies data. If you have a function foo that returns a value, then an assignment like result = foo(arg) never copies any data. (You could, of course, have copy-operations in the function's body.) Likewise, return x does not copy the object x.

    Your question lacks a specific example, so I can't go into more detail.

    edit: You should probably watch the excellent Facts and Myths about Python names and values talk.

  • answered 2017-11-12 19:57 Jörg W Mittag

    Yes, you are correct. In Python, arguments are always passed by value, and return values are always returned by value. However, the value being returned (or passed) is a reference to a potentially shared, potentially mutable object.

    There are some types for which the value being returned or passed may be the actual object itself, e.g. this is the case for integers, but the difference between the two can only be observed for mutable objects which integers aren't, and de-referencing an object reference is completely transparent, so you will never notice the difference. To simplify your mental model, you may just assume that arguments and return values are always passed by value (this is true anyhow), and that the value being passed is always a reference (this is not always true, but you cannot tell the difference, you can treat it as a simple performance optimization).

    Note that passing / returning a reference by value is in no way similar (and certainly not the same thing) as passing / returning by reference. In particular, it does not allow you to mutate the name binding in the caller / callee, as pass-by-reference would allow you to.

    This particular flavor of pass-by-value, where the value is typically a reference is the same in e.g. ECMAScript, Ruby, Smalltalk, and Java, and is sometimes called "call by object sharing" (coined by Barbara Liskov, I believe), "call by sharing", "call by object", and specifically within the Python community "call by assignment" (thanks to @timgeb) or "call by name-binding" (thanks to @Terry Jan Reedy) (not to be confused with call by name, which is again a different thing).

  • answered 2017-11-12 21:13 hpaulj

    So roughly your code is:

    def methodA(arr):
         x = methodB(arr)
    def methodB(arr):
         diff = somefn(arr)
         # return diff   or
         # return np.where(diff)[0]

    arr is a (large) array, that is passed a reference to methodA and methodB. No copies are made.

    diff is a similar size array that is generated in methodB. If that is returned, it be referenced in the methodA namespace by x. No copy is made in returning it.

    If the where array is returned, diff disappears when methodB returns. Assuming it doesn't share a data buffer with some other array (such as arr), all the memory that it occupied is recovered.

    But as long as memory isn't tight, returning diff instead of the where result won't be more expensive. Nothing is copied during the return.

    A numpy array consists of small object wrapper with attributes like shape and dtype. It also has a pointer to a potentially large data buffer. Where possible numpy tries to share buffers, but readily makes new ndarray objects. Thus there's an important distinction between view and copy.

  • answered 2017-11-13 17:43 pvlkmrv

    I see what I missed now: Objects are created on the heap, but function frames are on the stack. So when methodB finishes, its frame will be reclaimed, but that object will still exist on the heap, and methodA can access it with a simple reference.