Simple Python Nested For Loop Exhausts Memory [SOLVED]

I have a list of quadruples read from a CSV file, with elements of the form t = (id, e1, e2, label). For every t, the list should also contain the tuple (someID, e2, e1, 3-label). If it does not, I need to add that tuple to the list.

I have written the following code and narrowed down my list to only 50 tuples.

import nltk
import csv
#file string
fs = "mydataset.csv"
with open(fs) as infile:
    rows = list(csv.reader(infile))[950:1000]
    size = len(rows)
    print("initial size =", size)
    newSize = size
    firstId = int(rows[1][0])
    lastId = int(rows[size-1][0])
    for i in range(size):
        if i % 500 == 0:
            print("program progression = ", i*100/size, '%', sep ='')
        tempRow = rows[i]
        if tempRow[-1] == '1' or tempRow[-1] == '2':
            for j in range(i+1, size):
                # print("j = ", j)
                if tempRow[1] == rows[j][2] and tempRow[2] == rows[j][1]:
                    if int(tempRow[3]) == 3-int(rows[j][3]):
                        break
                    else:
                        print("error in row: ", i)
                else:
                    if j == size -1:
                        lastId +=1
                        print(tempRow[-1])
                        rows += rows +[[ str(lastId), tempRow[2], tempRow[1], str(3-int(tempRow[3])) ]]
                        newSize +=1
                        print("newSize", newSize)
    print("END")

When I run this, it exhausts my memory: it uses over 8 GB. What is going on? My CSV file has only 7200 rows with 4 columns. I really don't know what else to try. Any help would be greatly appreciated.

CS

2 answers

  • answered 2019-09-10 02:22 Tim Peters

    Don't know, but this line looks highly suspect:

    rows += rows +[[ str(lastId), tempRow[2], tempRow[1], str(3-int(tempRow[3])) ]]
    

    I can't guess what you're trying to do here, but it's very unlikely this line implements your intent. This more than doubles the length of rows each time it's executed, and there doesn't appear to be anything in your code that reduces the length of rows.

    Making it really simple to get the point across:

    >>> rows = [1]
    >>> for i in range(20):
    ...     rows += rows
    ...     print(i, len(rows))
    

    displays:

    0 2
    1 4
    2 8
    3 16
    4 32
    5 64
    6 128
    7 256
    8 512
    9 1024
    10 2048
    11 4096
    12 8192
    13 16384
    14 32768
    15 65536
    16 131072
    17 262144
    18 524288
    19 1048576
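
The same growth shows up with the exact pattern from the question. A minimal sketch, where `new_row` is a made-up placeholder rather than real data: `rows += rows + [new_row]` takes a list of length n to 2n + 1, while `rows.append(new_row)` takes it to n + 1.

```python
new_row = ["id", "e2", "e1", "2"]  # hypothetical placeholder row

doubling = [new_row]
appending = [new_row]
doubling_sizes = []
appending_sizes = []
for i in range(5):
    doubling += doubling + [new_row]   # length n becomes 2n + 1
    appending.append(new_row)          # length n becomes n + 1
    doubling_sizes.append(len(doubling))
    appending_sizes.append(len(appending))

print(doubling_sizes)    # [3, 7, 15, 31, 63]
print(appending_sizes)   # [2, 3, 4, 5, 6]
```

With a few thousand rows needing a reversed partner, the first pattern runs out of memory long before the loop finishes.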
    

  • answered 2019-09-10 02:29 chikitin

    Thank you so much! You got it. I had to remove the + from +=:

    rows = rows +[[ str(lastId), tempRow[2], tempRow[1], str(3-int(tempRow[3])) ]]
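
For what it's worth, here is a rough sketch of another way to do the whole check. It is not the code from the question: it looks for the reversed pair in the whole list (not only in later rows) using a dict, so no nested loop is needed. The function name `add_missing_reverses` and the sample rows are made up for illustration, and it assumes every row has exactly 4 string columns, that ids are integers, and that new ids may simply continue after the largest existing one.

```python
def add_missing_reverses(rows):
    """Return a copy of rows plus a reversed row for each labelled pair lacking one.

    Hypothetical helper; each row is [id, e1, e2, label], all strings.
    """
    # Map each (e1, e2) pair to its label for constant-time lookups.
    # Assumption: if a pair occurs more than once, the last label wins.
    labels = {(r[1], r[2]): r[3] for r in rows}
    next_id = max((int(r[0]) for r in rows), default=0) + 1
    result = list(rows)  # copy, so the input is not changed while iterating
    for row_id, e1, e2, label in rows:
        if label not in ("1", "2"):
            continue
        expected = str(3 - int(label))
        found = labels.get((e2, e1))
        if found is None:
            # The reversed pair is missing: add it once, with a fresh id.
            result.append([str(next_id), e2, e1, expected])
            labels[(e2, e1)] = expected
            next_id += 1
        elif found != expected:
            print("inconsistent labels for row id", row_id)
    return result


sample = [
    ["1", "a", "b", "1"],
    ["2", "b", "a", "2"],  # already the reverse of row 1
    ["3", "c", "d", "2"],  # its reverse is missing
    ["4", "e", "f", "0"],  # label 0 is skipped
]
fixed = add_missing_reverses(sample)
print(len(fixed))   # 5
print(fixed[-1])    # ['5', 'd', 'c', '1']
```

The list grows by exactly one row per missing reverse, so memory stays proportional to the size of the file.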