Python unhashable type: slice on list

I am trying to write a little script that will look at a string of text, remove the stop words, then return the top 10 most commonly used words in that string as a list.

This is my code:

from collections import Counter as c
from nltk.corpus import stopwords
stop = set(stopwords.words('english'))
description = ("This is some place holder text for a shop that sells shoes, coats and jumpers.  We sell lots of shoes but never sell t-shirts.  Please come to our shop if you want some jumpers")
description = ([word for word in description.lower().split() if word not in stop])
common_list = c(description)
top_ten = (common_list[:9])

However, this gives me the error message TypeError: unhashable type: 'slice'. I think this is because common_list might not actually be a list. I am very new to Python, so please excuse me if this is really silly.

3 answers

  • answered 2017-11-12 19:43 en_lorithai

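    common_list is a Counter, which is a subclass of dict, so you can't slice it: common_list[:9] is treated as a dictionary lookup with the slice object as the key, which is where the unhashable type: 'slice' error comes from. You will probably have to convert common_list into an actual list of (word, count) pairs and sort it by the number of occurrences.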
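    A minimal sketch of this suggestion, using a made-up word list in place of the filtered description. It checks that the Counter is a dict, shows what the slice attempt does, and then converts the counts into a list sorted by occurrences. Note that slice objects became hashable in Python 3.12, so on 3.12 and later counts[:9] no longer raises; it quietly returns Counter's default count of 0 for a missing key instead.

```python
from collections import Counter

# Hypothetical word list standing in for the filtered description.
words = ["shoes", "shop", "shoes", "jumpers", "sell", "sell", "shop", "shoes"]
counts = Counter(words)

# A Counter is a dict subclass, not a list.
print(isinstance(counts, dict))  # True

# Slicing a dict is really a key lookup with a slice object as the key.
try:
    outcome = counts[:9]
except TypeError as exc:
    outcome = exc  # before Python 3.12: unhashable type: 'slice'
print(repr(outcome))  # on 3.12+ this is 0 (missing key) rather than an error

# Convert to a list of (word, count) pairs and sort by count, highest first.
pairs = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
top_ten = pairs[:10]
print(top_ten)  # [('shoes', 3), ('shop', 2), ('sell', 2), ('jumpers', 1)]
```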

  • answered 2017-11-12 20:03 Joe Iddon

    You can use the following one-liner:

    top_ten = sorted(c(description).items(), key=lambda p:p[1])[::-1][:10]
    

    Why?

    Well, you essentially have a list of words:

    description = ["cat", "fish", "cat", "cat", "dog", "dog"]
    

    Then you can get the count of each element with the c() function (the Counter class, imported as c in the question), so c(description) gives:

    Counter({'cat': 3, 'dog': 2, 'fish': 1})
    

    Next we need to sort this. Calling .items() gives (word, count) tuples, and key=lambda p:p[1] sorts them on their second element, the count. In our case this gives:

    [('fish', 1), ('dog', 2), ('cat', 3)]
    

    Then we reverse it with [::-1] so the most common word comes first, and take the first 10 elements with [:10]. That leaves us with:

    [('cat', 3), ('dog', 2), ('fish', 1)]
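    The steps above can be run one at a time to see each intermediate result. This sketch uses the same example list and the c alias from the question, and checks that the step-by-step version matches the one-liner.

```python
from collections import Counter as c

description = ["cat", "fish", "cat", "cat", "dog", "dog"]

counts = c(description)
print(counts)  # Counter({'cat': 3, 'dog': 2, 'fish': 1})

# Step 1: sort the (word, count) pairs by count, lowest first.
ascending = sorted(counts.items(), key=lambda p: p[1])
print(ascending)  # [('fish', 1), ('dog', 2), ('cat', 3)]

# Step 2: reverse so the most common word comes first, then keep at most 10.
top_ten = ascending[::-1][:10]
print(top_ten)  # [('cat', 3), ('dog', 2), ('fish', 1)]

# The one-liner from the answer produces the same result.
one_liner = sorted(c(description).items(), key=lambda p: p[1])[::-1][:10]
print(one_liner == top_ten)  # True
```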
    

    If you only want the words, take the first element of each tuple in the top_ten list:

    [i[0] for i in top_ten]
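    One detail worth knowing: sorted is stable, so reversing an ascending sort with [::-1] also reverses the order of words that share a count. Passing reverse=True to sorted instead keeps tied words in the order they were first seen. A small sketch with a made-up list containing a tie:

```python
from collections import Counter as c

words = ["b", "a", "b", "a", "c"]  # 'a' and 'b' both appear twice

# Ascending sort then [::-1]: tied words come out in reverse first-seen order.
by_slice = sorted(c(words).items(), key=lambda p: p[1])[::-1][:10]
# reverse=True keeps tied words in first-seen order.
by_flag = sorted(c(words).items(), key=lambda p: p[1], reverse=True)[:10]

print([i[0] for i in by_slice])  # ['a', 'b', 'c']
print([i[0] for i in by_flag])   # ['b', 'a', 'c']
```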
    

  • answered 2017-11-12 20:28 Joe Iddon

    This can be done with the Counter object's most_common method, which makes it really easy:

    top_ten = c(description).most_common(10)
    

    The documentation states:

    Return a list of the n most common elements and their counts from the most common to the least.
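    A quick sketch of what most_common returns, using the cat/dog/fish example list from the previous answer. Per the same documentation, leaving out n returns every element, and asking for more elements than exist simply returns all of them.

```python
from collections import Counter as c

description = ["cat", "fish", "cat", "cat", "dog", "dog"]
counts = c(description)

print(counts.most_common(2))   # [('cat', 3), ('dog', 2)]
print(counts.most_common(10))  # only 3 distinct words, so all 3 are returned
print(counts.most_common())    # n omitted: every element, most common first
```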

    Since it returns both the elements and their counts, and we only want the elements, we still need a list comprehension:

    top_ten = [i[0] for i in c(description).most_common(10)]
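    Putting it together, a sketch of the question's script using most_common. Two assumptions are not from the original: the stop word set is a small hand-picked stand-in for set(stopwords.words('english')), so the sketch runs without downloading NLTK's corpus (it is not NLTK's actual list), and surrounding punctuation is stripped so that "shoes," and "shoes" count as the same word.

```python
import string
from collections import Counter

# Hypothetical stand-in for set(stopwords.words('english')); NOT NLTK's list.
stop = {"a", "and", "but", "for", "if", "is", "of", "our",
        "some", "that", "this", "to", "we", "you"}

description = ("This is some place holder text for a shop that sells shoes, "
               "coats and jumpers.  We sell lots of shoes but never sell "
               "t-shirts.  Please come to our shop if you want some jumpers")

# Lower-case, split on whitespace, and strip leading/trailing punctuation
# (an addition to the original code) so "shoes," becomes "shoes".
words = [w.strip(string.punctuation) for w in description.lower().split()]
# Drop stop words and any token that was nothing but punctuation.
words = [w for w in words if w and w not in stop]

# most_common(10) gives (word, count) pairs; keep just the words.
top_ten = [word for word, count in Counter(words).most_common(10)]
print(top_ten)
```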