Tracking letters in a text file with dictionaries

Hello everyone!

Please consider the following code.

Body of exercise:

Read through a text file, line by line. Use a dict to keep track of how many times each vowel (a, e, i, o, and u) appears in the file. Print the resulting tabulation.

My code:

from io import StringIO

filename = StringIO('''For the last 2 years I managed to read more than
20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
the rest of 15 are fiction.''')

def sum_int(filename):
    VOWELS = {}
    occasions = 0
    filename = filename.read().split()
    for word in filename:
        for letter in word:
            if letter in 'aeiou':
                occasions += 1
                VOWELS[letter] = occasions
    return VOWELS

print(sum_int(filename))  # returns {'o': 41, 'e': 38, 'a': 37, 'i': 40}

The problem is apparent: the counts reported for the individual vowels are not correct at all.

What is the problem with my code?

3 answers

  • answered 2021-07-26 11:31 Sujay

    Try this. You can just add 1 to the dictionary value each time you encounter the letter. Because incrementing a key that is not yet in the dictionary raises a KeyError, first check whether the letter is already a key; if it is not, initialise it to 1.

    from io import StringIO
    
    filename = StringIO('''For the last 2 years I managed to read more than
    20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
    the rest of 15 are fiction.''')
    
    def sum_int(filename):
        VOWELS = {}
        
        filename = filename.read().split()
        for word in filename:
            for letter in word:
                if letter in 'aeiou':
                    if letter in VOWELS:
                        VOWELS[letter] +=1
                    else:
                        VOWELS[letter]=1
        return VOWELS
    
    print(sum_int(filename)) 
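
The KeyError mentioned in this answer can also be handled without the explicit `if letter in VOWELS` check. Below is a minimal sketch of two common alternatives, one using try/except and one using dict.get with a default of 0. The function names count_with_try and count_with_get and the sample string are illustrative only and are not from the answer.

```python
def count_with_try(text):
    """Count vowels, creating a key the first time KeyError is raised."""
    counts = {}
    for ch in text:
        if ch in 'aeiou':
            try:
                counts[ch] += 1      # fails if ch is not a key yet
            except KeyError:
                counts[ch] = 1       # first occurrence of this vowel
    return counts


def count_with_get(text):
    """Count vowels, using dict.get to supply 0 for missing keys."""
    counts = {}
    for ch in text:
        if ch in 'aeiou':
            counts[ch] = counts.get(ch, 0) + 1
    return counts


print(count_with_try('read more books'))  # {'e': 2, 'a': 1, 'o': 3}
print(count_with_get('read more books'))  # {'e': 2, 'a': 1, 'o': 3}
```

Both variants give the same result as the if/else version; which one reads best is largely a matter of taste.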
    

  • answered 2021-07-26 11:36 Maurice Meyer

    You could set all vowels to 0 upfront:

    from io import StringIO
    
    filename = StringIO('''For the last 2 years I managed to read more than
    20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
    the rest of 15 are fiction.''')
    
    
    def sum_int(filename):
        vowels = 'aeiou'
        result = {v: 0 for v in vowels}
    
        filename = filename.read().split()
        for word in filename:
            for letter in word:
                if letter in vowels:
                    result[letter] += 1
        return result
    
    
    print(sum_int(filename))
    

    Out:

    {'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}
    

  • answered 2021-07-26 11:47 Booboo

    The comment by @YevhenKuzmovych is exactly right. But let me amplify his remark and make some suggestions.

    In your loop you have:

    occasions += 1
    

    This is being incremented for every occurrence of a vowel and thus maintains a total count of all vowels. It is clearly wrong to be using this as the count for a specific vowel. I would also rename this to vowel_count.
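
To make the effect of the shared counter visible, here is a small sketch that reproduces the logic of the question on a short string. The name buggy_count and the sample text are made up for illustration.

```python
def buggy_count(text):
    """Reproduce the bug: one running total is shared by all vowels."""
    counts = {}
    occasions = 0
    for ch in text:
        if ch in 'aeiou':
            occasions += 1           # total number of vowels seen so far
            counts[ch] = occasions   # overwrites with the total, not this vowel's own count
    return counts


# 'banana tree' contains 3 a's and 2 e's, but the last 'e' is the 5th vowel overall.
print(buggy_count('banana tree'))  # {'a': 3, 'e': 5}
```

Each vowel ends up holding the value of the running total at the moment that vowel was last seen, which is why the numbers in the question are all close to the total vowel count of 41.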

    There is also no need to split the input into words and iterate first over the words and then over the letters in each word. You can just iterate over all the characters of the entire input string. Also, what is being passed to the function sum_int (what does this name mean?) is not a file name, which would then need to be opened, but rather an already opened stream. Thus we have:

    from io import StringIO
    
    stream = StringIO('''For the last 2 years I managed to read more than
    20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
    the rest of 15 are fiction.''')
    
    def count_vowels(stream):
        vowels = dict(a=0, e=0, i=0, o=0, u=0)
        #vowel_count = 0
        s = stream.read()
        for ch in s:
            if ch in 'aeiou':
                #vowel_count += 1
                vowels[ch] += 1
        return vowels
    print(count_vowels(stream))
    

    Prints:

    {'a': 9, 'e': 17, 'i': 5, 'o': 10, 'u': 0}
    

    Or you can use the collections.Counter class:

    from io import StringIO
    from collections import Counter
    
    stream = StringIO('''For the last 2 years I managed to read more than
    20 books. 3 of them were in Sci-Fi genre, almost 10 of them are self-help,
    the rest of 15 are fiction.''')
    
    def count_vowels(stream):
        vowels = Counter()
        #vowel_count = 0
        s = stream.read()
        for ch in s:
            if ch in 'aeiou':
                #vowel_count += 1
                vowels[ch] += 1
        return vowels
    counts = count_vowels(stream)
    for vowel in 'aeiou':
        print(vowel, '->', counts[vowel])
    

    Prints:

    a -> 9
    e -> 17
    i -> 5
    o -> 10
    u -> 0
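
Because Counter accepts any iterable, the explicit loop can also be collapsed into a generator expression. This is only a sketch of that variation; the name count_vowels_compact and the shortened sample text are assumptions for illustration.

```python
from collections import Counter
from io import StringIO


def count_vowels_compact(stream):
    """Count lowercase vowels by feeding a filtered generator to Counter."""
    return Counter(ch for ch in stream.read() if ch in 'aeiou')


counts = count_vowels_compact(StringIO('almost 10 of them are self-help'))
for vowel in 'aeiou':
    # A Counter returns 0 for keys it has never seen, so 'i' and 'u' are safe to look up.
    print(vowel, '->', counts[vowel])
```

Looking up a missing vowel does not raise a KeyError here, since Counter reports a count of 0 for absent keys.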
    

    Notes

    s is the entire string and ch represents each character in the string whether it is a letter or space or punctuation mark such as a period. So you are inspecting each character and just selecting the vowels.

    Splitting the string first with split is inefficient. It produces quasi or pseudo words rather than true words, because split only removes whitespace, so punctuation marks remain attached to some of them. It also builds a list of these quasi words, which costs extra time and space (not a big issue for small input, but needless overhead for large input). You are then forced into a double loop, first over each quasi word and then over each character in it, which is less efficient than a single loop over the characters of the original string.
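
The exercise asks for the file to be read line by line, which also avoids holding a large file in memory at once. The following is a sketch of that approach, assuming the stream is any open text file or StringIO. The name count_vowels_by_line and the ignore_case parameter are hypothetical additions; the answers above count lowercase vowels only, so an uppercase 'I' is skipped unless ignore_case is set.

```python
from collections import Counter
from io import StringIO


def count_vowels_by_line(stream, ignore_case=False):
    """Count vowels one line at a time instead of reading the whole stream."""
    counts = Counter()
    for line in stream:              # text streams yield one line per iteration
        if ignore_case:
            line = line.lower()      # optional: treat 'I' the same as 'i'
        counts.update(ch for ch in line if ch in 'aeiou')
    return counts


sample = 'I read\nmore books\n'
print(dict(count_vowels_by_line(StringIO(sample))))
print(dict(count_vowels_by_line(StringIO(sample), ignore_case=True)))
```

With a real file, the same function could be called as `count_vowels_by_line(f)` inside a `with open(...) as f:` block.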
