Iterating through tables in text file

Hello everyone.

I would say this is the first task where I have no clear idea where to start:

Create a text file (using an editor, not necessarily Python) containing two tab-separated columns, with each column containing a number. Then use Python to read through the file you’ve created. For each line, multiply the first number by the second, and then sum the results from all the lines. Ignore any line that doesn’t contain two numeric columns.

So far I have written a couple of lines, but I am not sure where to go next:

filename = 'path'

def sum_columns(filename):
    sum = 0
    multiply = 0
    with open (filename) as f:

Should I split each line of my file into its 2 columns and create a list of them, or should I do something else?

Thank you in advance

4 answers

  • answered 2021-07-26 15:27 Nastor

    There are many ways to do this, given the exercise text. In my opinion, the best way would be something like this:

    filename = 'path'
    
    def sum_columns(filename):
        # Named "total" to avoid shadowing the built-in sum()
        total = 0
        with open(filename) as f:
            # The with block closes the file on exit, so no f.close() is needed
            all_lines = f.readlines()
        for line in all_lines:
            # Assumes every line holds exactly two tab-separated integers
            splitted = line.split("\t")
            total += int(splitted[0]) * int(splitted[1])
        return total
    

    This reads every line of the file into all_lines. You then iterate over the lines, split each one on the tab character, multiply the two values, and add the product to total, which starts at 0 and is returned at the end. As someone else hinted, you could also read the file line by line without storing every line in a list, but if the file is relatively small, this approach is fine.
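
    The exercise also asks to ignore lines that don't contain two numeric columns, which the function above does not do. A minimal sketch of one way to add that, assuming "numeric" means integer; the function name, the sample data, and the temporary file are made up for illustration:

```python
import os
import tempfile


def sum_columns_skipping_bad_lines(filename):
    """Sum first*second over all lines, ignoring lines without two integers."""
    total = 0
    with open(filename) as f:
        for line in f:
            parts = line.split("\t")
            if len(parts) != 2:
                continue  # wrong number of columns
            try:
                total += int(parts[0]) * int(parts[1])
            except ValueError:
                continue  # at least one column is not an integer
    return total


# Hypothetical sample data: three valid lines and three invalid ones.
sample = "1\t2\n2\t4\nfoo\tbar\n4\t8\n7\n\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

result = sum_columns_skipping_bad_lines(sample_path)
os.remove(sample_path)
print(result)  # 1*2 + 2*4 + 4*8 = 42
```

    The trailing newline on each line is harmless here because int() ignores surrounding whitespace.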

  • answered 2021-07-26 15:31 tituszban

    If you have a file like this:

    1   2
    2   4
    4   8
    

    You can do the following:

    from functools import reduce
    
    def is_int(s):
        try: 
            int(s)
            return True
        except ValueError:
            return False
    
    filename = 'path'
    
    def sum_columns(filename):
        with open (filename) as f:
            lines = f.readlines()
        return sum([
            reduce(lambda x, y: x * y, map(int,line.split("\t")))
            for line in lines
            if len(list(filter(is_int, line.split("\t")))) == 2
        ])
    

    Explanation:

    At the top I define a helper function that determines whether a string can be converted into an int. It is used later to ignore lines that don't have 2 numbers. It's based on this answer.

    def is_int(s):
        try: 
            int(s)
            return True
        except ValueError:
            return False
    

    Then we open the file and read all lines into a variable. This is not the most efficient approach, since the file can be processed line by line without storing the whole file; for smaller files, however, the difference is negligible.

    with open (filename) as f:
        lines = f.readlines()
    

    Next comes a single expression that performs the whole computation, but let's break it down:

    First, we iterate through all the lines:

    for line in lines
    

    Next, we only keep the lines that have exactly two numbers separated by tabs:

    if len(list(filter(is_int, line.split("\t")))) == 2
    

    Finally, we convert each value in the line to an int and multiply them all together:

    reduce(lambda x, y: x * y, map(int,line.split("\t")))
    

    We then sum all of these and return the result.
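
    To see the filter and reduce steps on concrete input, here is a small sketch that applies them to individual lines. The helper line_product and the sample lines are hypothetical and only serve the illustration:

```python
from functools import reduce


def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False


def line_product(line):
    """Return the product of the two columns, or None if the line is skipped."""
    fields = line.split("\t")
    # Keep the line only if exactly two fields convert to int
    if len(list(filter(is_int, fields))) != 2:
        return None
    return reduce(lambda x, y: x * y, map(int, fields))


sample_lines = ["1\t2\n", "2\t4\n", "4\t8\n", "abc\t5\n", "9\n"]
products = [line_product(line) for line in sample_lines]
print(products)  # [2, 8, 32, None, None]
total = sum(p for p in products if p is not None)
print(total)  # 42
```

    One caveat: the filter counts how many fields are integers, not how many fields there are. A line with two integers plus a third non-numeric column passes the check and then raises ValueError in map(int, ...), so an extra len(fields) == 2 test may be worth adding.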

    Performance consideration

    If performance is a concern, you can achieve the same thing by reading the contents line by line instead of pulling the whole file into a variable. It is less elegant, but more efficient:

    def sum_columns(filename):
        total = 0
        with open (filename) as f:
            for line in f:
                if len(list(filter(is_int, line.split("\t")))) != 2:
                    continue
                total += reduce((lambda x, y: x * y), map(int,line.split("\t")))
        return total
    

    (Note that you still need the import and helper from the above example.)
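
    As a possible middle ground, passing a generator expression to sum() also consumes the lines one at a time while staying close to the one-expression style. This is only a sketch: product_or_zero and sum_columns_lazy are hypothetical names, and an in-memory io.StringIO stands in for an open file.

```python
import io


def product_or_zero(line):
    """Hypothetical helper: product of the two tab-separated integers, else 0."""
    fields = line.split("\t")
    if len(fields) != 2:
        return 0
    try:
        return int(fields[0]) * int(fields[1])
    except ValueError:
        return 0


def sum_columns_lazy(lines):
    # sum() pulls one line at a time from the generator expression,
    # so no list of all lines is ever built
    return sum(product_or_zero(line) for line in lines)


# Any iterable of lines works, including an open file object.
fake_file = io.StringIO("1\t2\n2\t4\nnot a number\n4\t8\n")
lazy_total = sum_columns_lazy(fake_file)
print(lazy_total)  # 42
```

    Returning 0 for skipped lines works because adding 0 does not change the sum.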

  • answered 2021-07-26 15:43 Loïc

    input.txt

    1 3
    2 6
    3 7
    7 12
    8
    

    script.py

    with open('input.txt') as f:
      total = 0
      for line in f:
        # line is already a str, so split it directly (str has no .read())
        numbers = line.split('\t')
        try:
          line_value = int(numbers[0]) * int(numbers[1])
        except IndexError as e:
          # the line doesn't contain two numbers
          continue
        except ValueError as e:
          # a value couldn't be converted to a number
          continue
        total += line_value
    

  • answered 2021-07-26 15:46 Matiiss

    Here is a short solution:

    def sum_columns(filename):
        counter = 0
        with open(filename) as file:
            for line in file:
                try:
                    a, b = [int(x) for x in line.split('\t')]
                    counter += a * b
                except ValueError:
                    continue
        return counter
    
    
    file_name = 'myfile.txt'
    print(sum_columns(file_name))
    

    This is what a lot of people (@martineau being the first) suggested in the comments, and it is something I learned just now, so I decided to put it in an answer.

    Basically, the loop iterates over each line and builds a list of two integers from it. The list comprehension is needed because the values returned by split are strings, and multiplying two strings raises a TypeError. The list is then unpacked into a and b. This is convenient because a single except clause suffices: the only error reasonably expected here is ValueError, raised either when the line can't be unpacked into exactly two values or when a field can't be converted to an integer. Finally, the two values are multiplied and added to the counter, and the counter is returned once the loop finishes.
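
    The claim that every failure mode ends up as ValueError can be checked with a short sketch. parse_pair and the sample lines are hypothetical; the parsing expression itself is the one from the answer:

```python
def parse_pair(line):
    """Hypothetical helper: return (a, b) or raise ValueError."""
    a, b = [int(x) for x in line.split('\t')]
    return a, b


cases = {
    "two integers": "3\t4\n",
    "one column": "8\n",          # unpacking fails: not enough values
    "three columns": "1\t2\t3\n",  # unpacking fails: too many values
    "non-numeric": "x\t4\n",       # int() fails on "x"
}

outcomes = {}
for label, line in cases.items():
    try:
        outcomes[label] = parse_pair(line)
    except ValueError as exc:
        outcomes[label] = type(exc).__name__

print(outcomes)
```

    Only the first case yields a pair, (3, 4); the other three are all reported as ValueError, which is why one except clause is enough.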
