How to use itertools.islice() on a tab-delimited file that does not contain column/row headers

I have a .txt file with 1000 rows and 500 columns of integers ranging from 0 to 2. For example, the first three rows might look like:

0 0 0 0 2 2 2 2 2 1 1 1 0 2 1 2 ...
0 2 2 2 0 0 0 0 1 0 0 0 1 0 2 0 ...
0 2 2 2 2 2 2 2 0 1 1 1 1 1 1 1 ...

Within a given row, I'm going to compare each column value to the others and do something with each pair.

However, I need to distinguish the first 500 rows from the last 500 during my iteration.

If I write something like:

for row in file:
    for col1 in row:
        for col2 in row:
            ...  # compare col1 and col2 here

The for loop will include all rows. My aim is to have two loops like this, one for the first 500 rows and another for the last 500 rows. The rows are separated by newlines, and I am reading the .txt file with csv.reader.

Searching around, I see that itertools.islice() may work for this problem, but in all the examples I've seen in the documentation there is either a single row, or some property of the first column in each row can be used to tell the rows apart.

Am I on the right track in thinking I can use itertools.islice() to separate the rows, or will that not work here?

Thanks in advance for the help.

1 answer

  • answered 2018-10-10 05:22 pylang

    It seems you wish to split a file.

    Option 1: Yes, you can accomplish this with itertools.islice. The rows themselves can be separated with the csv module.

    Given

    A sample tab-delimited file test.txt:

    # test.txt
    a   0   0   0   0   2   2   2   2   2
    b   0   2   2   2   0   0   0   0   1
    c   0   2   2   2   0   0   0   0   1
    d   0   0   0   0   2   2   2   2   2
    e   0   2   2   2   0   0   0   0   1
    f   0   2   2   2   0   0   0   0   1
    g   0   0   0   0   2   2   2   2   2
    h   0   2   2   2   0   0   0   0   1
    i   0   2   2   2   0   0   0   0   1
    

    >>> import csv
    >>> import itertools as it
    
    
    >>> fpath = "./test.txt"
    

    Code

    We implement a generator that can read a file and cleanly yield its rows:

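    >>> def read_file(filepath):
    ...     """Lazily yield each row of a tab-delimited file as a list of strings."""
    ...     # newline="" is what the csv docs recommend when opening files for csv
    ...     with open(filepath, "r", newline="") as f:
    ...         reader = csv.reader(f, delimiter="\t")
    ...         yield from reader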
    

    Demo

    Now we read the file and slice off the first few rows, e.g. 5. Because islice consumes those rows from the generator, whatever is left in lines is the rest of the file:

    >>> lines = read_file(fpath)
    
    >>> top = list(it.islice(lines, 5))
    >>> bot = list(lines)
    
    >>> top
    [['a', '0', '0', '0', '0', '2', '2', '2', '2', '2'],
     ['b', '0', '2', '2', '2', '0', '0', '0', '0', '1'],
     ['c', '0', '2', '2', '2', '0', '0', '0', '0', '1'],
     ['d', '0', '0', '0', '0', '2', '2', '2', '2', '2'],
     ['e', '0', '2', '2', '2', '0', '0', '0', '0', '1']]
    
    >>> bot
    [['f', '0', '2', '2', '2', '0', '0', '0', '0', '1'],
     ['g', '0', '0', '0', '0', '2', '2', '2', '2', '2'],
     ['h', '0', '2', '2', '2', '0', '0', '0', '0', '1'],
     ['i', '0', '2', '2', '2', '0', '0', '0', '0', '1']]
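
    The split above works because islice advances the iterator it is given, so a second pass over the same iterator only sees what is left. A minimal sketch of that behavior, using a plain range as a stand-in for the row generator (the names rows, first, rest, data and second are illustrative only):

```python
import itertools as it

# Stand-in for the row generator: any one-shot iterator behaves the same way.
rows = iter(range(10))

# islice pulls the first 4 items from `rows`, advancing the iterator.
first = list(it.islice(rows, 4))

# Nothing rewinds the iterator, so only the untouched items remain.
rest = list(rows)

# By contrast, a list can be re-read from the start, so islice needs
# explicit start/stop bounds to pick out the second part.
data = list(range(10))
second = list(it.islice(data, 4, None))
```

    With a one-shot iterator the order of the two calls matters: taking the remainder first would leave nothing for islice to return.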
    

    See also more on parsing with csv.
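
    Applied to the file described in the question, the same idea might look like the sketch below. It assumes every field is an integer with no row labels (unlike the lettered test.txt above). The helpers split_rows and count_equal_pairs are hypothetical names, and counting equal pairs is only a placeholder for whatever comparison is actually needed.

```python
import csv
import itertools as it
import os
import tempfile


def split_rows(filepath, n_top):
    """Return (top, bottom): the first n_top rows and all remaining rows."""
    with open(filepath, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        # islice consumes n_top rows; the reader then continues from there.
        top = [[int(v) for v in row] for row in it.islice(reader, n_top)]
        bottom = [[int(v) for v in row] for row in reader]
    return top, bottom


def count_equal_pairs(rows):
    """Placeholder per-row work: count column pairs that hold equal values."""
    total = 0
    for row in rows:
        # combinations visits each unordered pair of columns exactly once.
        for col1, col2 in it.combinations(row, 2):
            if col1 == col2:
                total += 1
    return total


# Tiny stand-in for the 1000 x 500 file (4 rows x 3 columns).
sample = "0\t0\t2\n1\t1\t1\n2\t0\t2\n0\t1\t2\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

top, bottom = split_rows(sample_path, 2)  # use 500 for the real file
os.remove(sample_path)

top_count = count_equal_pairs(top)
bottom_count = count_equal_pairs(bottom)
```

    Each half can then go through its own loop. Note that it.combinations skips comparing a column with itself and visits each pair once, which differs from two nested for loops over the same row.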


    Option 2: Alternatively, consider pandas, a third-party library.

    Demo

    >>> import pandas as pd
    
    
    >>> df = pd.read_csv(fpath, delimiter="\t", header=None)
    
    >>> top = df.iloc[:5, :]
    >>> bot = df.iloc[5:, :]
    
    >>> top
       0  1  2  3  4  5  6  7  8  9
    0  a  0  0  0  0  2  2  2  2  2
    1  b  0  2  2  2  0  0  0  0  1
    2  c  0  2  2  2  0  0  0  0  1
    3  d  0  0  0  0  2  2  2  2  2
    4  e  0  2  2  2  0  0  0  0  1
    
    >>> bot
       0  1  2  3  4  5  6  7  8  9
    5  f  0  2  2  2  0  0  0  0  1
    6  g  0  0  0  0  2  2  2  2  2
    7  h  0  2  2  2  0  0  0  0  1
    8  i  0  2  2  2  0  0  0  0  1
    

    See also this tutorial on selections with pandas.
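
    For a file with no labels at all, as in the question, a pandas version could split at the midpoint instead of a hard-coded row number. This is a sketch under that assumption; the inline text stands in for the real file, and half, top_rows and bot_rows are illustrative names.

```python
import io

import pandas as pd

# Tiny stand-in for the real file: 4 rows x 3 columns, tab-separated, no headers.
text = "0\t0\t2\n1\t1\t1\n2\t0\t2\n0\t1\t2\n"

# header=None tells pandas the first line is data, not column names.
df = pd.read_csv(io.StringIO(text), sep="\t", header=None)

half = len(df) // 2  # 500 for a 1000-row file
top = df.iloc[:half]
bot = df.iloc[half:]

# Convert each half to plain lists of ints for ordinary nested loops.
top_rows = top.to_numpy().tolist()
bot_rows = bot.to_numpy().tolist()
```

    The bottom half keeps its original row index (here 2 and 3), which can help when results need to be traced back to lines in the file.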