Iterating through tables in text file
This is the first task where I don't have a clear idea of where to start:
Create a text file (using an editor, not necessarily Python) containing two tab-separated columns, with each column containing a number. Then use Python to read through the file you’ve created. For each line, multiply each first number by the second, and then sum the results from all the lines. Ignore any line that doesn’t contain two numeric columns.
So far I have written a couple of lines, but I am not sure where to go next:
filename = 'path'

def sum_columns(filename):
    sum = 0
    multiply = 0
    with open(filename) as f:
Should I split my file with 2 columns and create a list of them, or should I do something else?
Thank you in advance
You can pretty much do a lot of things, given the exercise text. In my opinion, the best way would be to do something like this:
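filename = 'path'

def sum_columns(filename):
    sum = 0  # note: this name shadows the built-in sum()
    with open(filename) as f:
        all_lines = f.readlines()  # the with block closes the file on exit
    for line in all_lines:
        splitted = line.split("\t")
        # assumes every line holds two tab-separated integers
        sum += int(splitted[0]) * int(splitted[1])
    return sum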
You'll get all lines of the file in all_lines; you can then iterate over the lines, split each one on the tab, multiply the two numbers, and add the product to the sum variable you initialized to 0, which you return at the end. As hinted by someone else, you could also read the file line by line without storing every line in a list, but if the file is relatively small, you can go with my option.
If you have a file like this:
1	2
2	4
4	8
You can do the following:
from functools import reduce

def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

filename = 'path'

def sum_columns(filename):
    with open(filename) as f:
        lines = f.readlines()
    return sum([
        reduce(lambda x, y: x * y, map(int, line.split("\t")))
        for line in lines
        if len(list(filter(is_int, line.split("\t")))) == 2
    ])
At the top I define a helper function that determines whether a string can be converted to an int. It is used later to ignore lines that don't have two numbers. It's based on this answer.
def is_int(s):
    try:
        int(s)
        return True
    except ValueError:
        return False
Then we open the file and read all lines into a variable. This is not the most efficient approach, since the file can be processed line by line without storing the whole file; for smaller files, however, the difference is negligible.
with open(filename) as f:
    lines = f.readlines()
Next comes a single expression that performs the whole computation, but let's break it down:
First, we iterate through all the lines:
for line in lines
Next, we only keep the lines that have exactly two numbers separated by tabs:
if len(list(filter(is_int, line.split("\t")))) == 2
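To see what this condition keeps and drops, here is a small sketch that applies it to a few made-up lines. The sample strings and the has_two_ints wrapper are illustrative assumptions, not part of the answer's code; the is_int helper is repeated so the snippet runs on its own.

```python
def is_int(s):
    """Return True if s can be converted to an int."""
    try:
        int(s)
        return True
    except ValueError:
        return False


def has_two_ints(line):
    # Same condition as in the comprehension: count the tab-separated
    # fields that convert to int and require exactly two of them.
    return len(list(filter(is_int, line.split("\t")))) == 2


# Hypothetical sample lines, as they would come from iterating over a file.
samples = ["1\t2\n", "3\tfoo\n", "7\n", "\n", "4\t8\n"]
kept = [line for line in samples if has_two_ints(line)]
print(kept)
```

Only the first and last sample lines survive. Note that the condition counts numeric fields rather than all fields, so a line such as "1\t2\tfoo" would pass the check and then fail in the later int conversion; this sketch does not handle that case.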
Finally, we turn each number in the line into an int, and multiply them all together:
reduce(lambda x, y: x * y, map(int,line.split("\t")))
We then sum all of these products and return the result.
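As a quick check of the arithmetic, the sketch below runs the multiply-then-sum step on the three rows of the sample file shown earlier (1 2, 2 4 and 4 8). The lines are held in a list rather than read from disk, which is a simplification made for illustration.

```python
from functools import reduce

# The three sample rows, in the form readlines() would return them.
lines = ["1\t2\n", "2\t4\n", "4\t8\n"]

# One product per line: 1*2, 2*4, 4*8
products = [
    reduce(lambda x, y: x * y, map(int, line.split("\t")))
    for line in lines
]
print(products)       # [2, 8, 32]
print(sum(products))  # 42
```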
If performance is a concern, you can achieve the same thing by reading the contents line by line instead of pulling the whole file into a variable. It is less elegant, but more efficient:
def sum_columns(filename):
    total = 0
    with open(filename) as f:
        for line in f:
            if len(list(filter(is_int, line.split("\t")))) != 2:
                continue
            total += reduce((lambda x, y: x * y), map(int, line.split("\t")))
    return total
(Note that you still need the import and the helper from the example above.)
1 3 2 6 3 7 7 12 8
with open('input.txt') as f:
    total = 0
    for line in f:
        numbers = line.split('\t')
        try:
            line_value = int(numbers[0]) * int(numbers[1])
        except IndexError:
            # the line doesn't contain two numbers
            continue
        except ValueError:
            # a value couldn't be converted to a number
            continue
        total += line_value
Here is a short solution:
def sum_columns(filename):
    counter = 0
    with open(filename) as file:
        for line in file:
            try:
                a, b = [int(x) for x in line.split('\t')]
                counter += a * b
            except ValueError:
                continue
    return counter

file_name = 'myfile.txt'
print(sum_columns(file_name))
This is what a lot of people (@martineau to be the first) suggested to use in comments (also this is something I learned just now) so I decided to put it in an answer.
Basically, the loop iterates over the lines and, for each one, builds a list of two integers (the list comprehension is needed because otherwise both values are strings, which raises a TypeError if you try to multiply them). The list is then unpacked into two variables, which is convenient because you only need one except: the only reasonable error is ValueError, raised either because the values couldn't be unpacked or because a field couldn't be converted to an integer. The two values are then multiplied and added to the counter, and after the loop the counter is returned.
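The two ways a line can trigger ValueError can be sketched as follows. The parse_pair function and the sample strings are hypothetical, introduced only to isolate the try/except step from the file loop.

```python
def parse_pair(line):
    """Return (a, b) for a valid line, or None if the line should be skipped."""
    try:
        # int() raises ValueError for non-numeric text; unpacking raises
        # ValueError when the list does not have exactly two items.
        a, b = [int(x) for x in line.split('\t')]
    except ValueError as err:
        print(f"skipping {line!r}: {err}")
        return None
    return a, b


# Hypothetical lines: valid, too few columns, too many columns, non-numeric.
for sample in ["3\t4\n", "3\n", "1\t2\t3\n", "3\tfoo\n"]:
    print(parse_pair(sample))
```

Only the first sample yields a pair; the other three are skipped through the same except clause, which is why a single handler is enough here.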