Python3 pipe I/O on np.ndarray with raw binary data failed

I have a binary raw data file in.dat storing 4 int32 values.

$ xxd in.dat 
00000000: 0100 0000 0200 0000 0300 0000 0400 0000  ................

I want to read them into an np.ndarray, multiply them by 2, and write the result to stdout in the same raw binary format as in.dat. The expected output is:

$ xxd out.dat 
00000000: 0200 0000 0400 0000 0600 0000 0800 0000  ................

The code is like this,

#!/usr/bin/env python3

import sys
import numpy as np

if __name__ == '__main__':
    y = np.fromfile(sys.stdin, dtype='int32')
    y *= 2
    sys.stdout.buffer.write(y.tobytes())

I find it works as expected with <,

$ python3 <in.dat >out.dat

But it does not work with a pipe |. Here comes the error message.

$ cat in.dat | python3 >out.dat
Traceback (most recent call last):
  File "", line 7, in <module>
    y = np.fromfile(sys.stdin, dtype='int32')
OSError: obtaining file position failed

What am I missing here?

2 answers

  • answered 2018-07-11 03:48 Bailey Parker

    This is because when you redirect a file in with <, stdin is seekable: it isn't a TTY or a pipe, it's just a regular file that has been opened as FD 0. np.fromfile needs to query the file position, which a pipe does not support. Try invoking the following script with cat foo.txt | python3 vs python3 <foo.txt (assuming foo.txt contains some text):

    import sys

    sys.stdin.seek(1)
    print(sys.stdin.read())

    The former will error with:

    Traceback (most recent call last):
      File "", line 3, in <module>
    io.UnsupportedOperation: underlying stream is not seekable
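
The same distinction can be checked at the file-descriptor level. This is a minimal sketch, assuming a POSIX system: a temporary file stands in for the `<` case and `os.pipe()` stands in for the `|` case. The helper name `is_seekable` is made up for illustration.

```python
import os
import tempfile


def is_seekable(fd: int) -> bool:
    """Return True if the descriptor supports lseek (hypothetical helper)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)  # ask for the current position
        return True
    except OSError:
        return False


# A regular file, like the one the shell opens for `<in.dat`.
with tempfile.TemporaryFile() as f:
    file_result = is_seekable(f.fileno())

# A pipe, like the one the shell sets up for `cat in.dat |`.
read_fd, write_fd = os.pipe()
pipe_result = is_seekable(read_fd)
os.close(read_fd)
os.close(write_fd)

print(file_result, pipe_result)
```

Asking a pipe for its position is the operation that fails, which matches the "obtaining file position failed" message in the question.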

    That said, numpy is way overkill for what you're trying to do here. You can easily achieve this with a few lines and struct:

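    import struct
    import sys

    FORMAT = '@i'  # one native C int
    SIZE = struct.calcsize(FORMAT)

    def main():
        while True:
            chunk = sys.stdin.buffer.read(SIZE)
            if len(chunk) < SIZE:  # EOF (or a trailing partial value)
                break
            (num,) = struct.unpack(FORMAT, chunk)  # unpack returns a tuple
            sys.stdout.buffer.write(struct.pack(FORMAT, num * 2))

    if __name__ == '__main__':
        main()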

    Edit: there's also no need for sys.exit(0). This is the default.

  • answered 2018-07-11 03:51 juanpa.arrivillaga

    If you use np.frombuffer, it should work both ways:

    import numpy as np
    import sys
    print(np.frombuffer(sys.stdin.buffer.read(), dtype=np.int32))


    Juans-MacBook-Pro:temp juan$ xxd testdata.dat
    00000000: 0100 0000 0200 0000 0300 0000            ............
    Juans-MacBook-Pro:temp juan$ python < testdata.dat
    [1 2 3]
    Juans-MacBook-Pro:temp juan$ cat testdata.dat | python
    [1 2 3]
    Juans-MacBook-Pro:temp juan$

    Although, I suspect this will make a copy of the data.
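
For the original task (double the values and emit raw bytes), the np.frombuffer approach might be sketched as follows. The function name `double_int32` is an assumption for illustration, and the demo builds its input in memory instead of reading stdin. np.frombuffer over a bytes object yields a read-only array, so this sketch uses `y * 2`, which allocates a new array, instead of the in-place `y *= 2`.

```python
import numpy as np


def double_int32(raw: bytes) -> bytes:
    """Interpret raw as native int32 values and return them doubled (hypothetical helper)."""
    y = np.frombuffer(raw, dtype=np.int32)  # read-only view over the bytes
    return (y * 2).tobytes()                # y * 2 creates a new, writable array


# Demo with the same four values as in.dat.
raw = np.array([1, 2, 3, 4], dtype=np.int32).tobytes()
doubled = double_int32(raw)
print(np.frombuffer(doubled, dtype=np.int32))  # [2 4 6 8]
```

In a script, the equivalent call would be `sys.stdout.buffer.write(double_int32(sys.stdin.buffer.read()))`, which behaves the same whether stdin is a redirected file or a pipe.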