Counting words per sentence and sentences per paragraph in a text file

I'm having trouble normalising a dictionary. My dictionary holds a set of words and characters whose occurrences we are meant to count in a text file. In the context of my project, "normalising" means dividing the frequency (value) of each of these words/characters by the total number of sentences in the given text. I then have to replace the old values in the dictionary with these new ones.

For example, my dictionary is named count, with keys and values like this:

{'and': 5, ';' : 3, '-' : 0...}

def main(textfile, normalize=True):
    .
    .
    .
    .
    if normalize:
        for x in count:
            # overwrite each old value with its normalised value
            count[x] = count[x] / numSentence
            print(x, count[x])

Here's a sample file to try any code on: https://www.dropbox.com/s/7xph5pb9bdf551h/sample2.txt?dl=0 Also note that in the above code the normalize parameter is there because in the top-level function

1 answer

  • answered 2019-05-18 18:15 Mahmoud Elshahat

    The code below shows an example of searching for a word in a string. For example, "remember me" contains "me" twice: once inside the word "remember" and once as the standalone word "me". Only the second one is a whole word:

    "remember me".count('me')  # output: 2 (substring matches)
    'me' in 'remember me'      # True
    

    To match whole words only:

    'remember me'.split().count('me')  # output: 1 (whole-word matches)
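
Note that split() only separates on whitespace, so punctuation attached to a word (as in "me;") stays part of that token and will not match "me" or ";". One possible way around this is sketched below. The tokenize helper and its regular expression are assumptions for illustration, not part of the original code, and the pattern only treats ASCII letters, digits and apostrophes as word characters.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical helper: a run of letters, digits or apostrophes is one token;
    # every other non-space character becomes its own single-character token.
    return re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)

tokens = tokenize("remember me; and me-too")
counts = Counter(tokens)
print(tokens)  # ['remember', 'me', ';', 'and', 'me', '-', 'too']
print(counts["me"], counts[";"], counts["-"])  # 2 1 1
```

Whether hyphenated words such as "me-too" should be split this way depends on the project's definition of a word, so the pattern may need adjusting.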
    

    So, if I understand your question correctly, you need to match whole words:

    mydict = {'and': 5, ';' : 3, '-' : 0} 
    text = 'hello and me; in mem;ory ; me-ome _ -'
    
    # find a word frequency in a text
    def count(word, text):
        return len([w for w in text.split() if w == word])
    
    # update dictionary with new count
    mydict = {key:count(key, text) for key in mydict}
    print(mydict)
    

    Output:

    {'and': 1, ';': 1, '-': 1}
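
The question also asks how to normalise the counts by the number of sentences. A minimal sketch of that step follows. It assumes a sentence ends at a run of '.', '!' or '?', which is a simplification (abbreviations such as "e.g." would be miscounted). The names count_sentences and normalise and the sample text are hypothetical.

```python
import re

def count_sentences(text):
    # Assumption: sentences are separated by runs of '.', '!' or '?'.
    parts = re.split(r"[.!?]+", text)
    # Ignore empty pieces, e.g. the one after the final full stop.
    return sum(1 for part in parts if part.strip())

def normalise(count, num_sentences):
    # Build a new dictionary so the original counts are not changed midway.
    if num_sentences == 0:
        raise ValueError("text contains no sentences")
    return {key: value / num_sentences for key, value in count.items()}

text = "Tom and Ann ran; Sam sat. Ann and Tom sang."
count = {"and": 2, ";": 1, "-": 0}

# Rebinding the name replaces the old values with the normalised ones.
count = normalise(count, count_sentences(text))
print(count)  # {'and': 1.0, ';': 0.5, '-': 0.0}
```

Returning a new dictionary and reassigning it avoids the undefined new_count name from the question and keeps the raw counts available if they are needed later.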