Replace Emoji String from Demojize (string between two :'s)

I would like to count words and characters in Instagram captions. I already have the data in a CSV.

I used demojize to convert emojis to a text format (ex: :jack-o-lantern:). I would like to replace the emoji strings so that I can create a new column that counts the words in the caption (excluding the emoji text).

So for example, if this were the value of the column titled 'caption' aka df['caption']:

"Comment your favorite below! :fallen_leaf::face_savoring_food::jack-o-lantern:" 

df['caption_with_replaced_emoji'] would look like:

"Comment your favorite below! ~~~"

I'm thinking it would be good to replace the emoji format with ~, but if there's a better way to do this, I'm all ears!

1 answer

  • answered 2020-11-25 07:02 Heo

    A regex handles this well, so there is no need to overcomplicate it.

    Try this:

    import re

    inp = """Comment your favorite below! :fallen_leaf::face_savoring_food::jack-o-lantern:"""

    # Remove each :emoji_name: token; the hyphen is escaped so it is a literal, not a range
    subInput = re.sub(r':[\w+\-]+:', '', inp)
    print(subInput)

    # Count the remaining alphanumeric words
    count = len(re.findall('[0-9A-Za-z]+', subInput))
    if count == 1:
        print(f'{count} word')
    else:
        print(f'{count} words')
    

    Output:

    Comment your favorite below! 
    4 words
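
The question asks for one `~` per emoji, while the snippet above removes the tokens altogether. Here is a minimal sketch of the placeholder variant, along with word and character counts. The helper names (`replace_emoji_tokens`, `count_words`, `count_chars`) are made up for illustration, and the pattern assumes demojize names contain only ASCII letters, digits, `_`, `+` and `-`. Names with other characters, such as accented letters, would need a wider character class, and text like `10:30:45` would be partly matched as well.

```python
import re

# One demojize-style token such as :jack-o-lantern:
# (assumption: names use only ASCII letters, digits, "_", "+" and "-").
EMOJI_TOKEN = re.compile(r":[A-Za-z0-9_+\-]+:")


def replace_emoji_tokens(caption, placeholder="~"):
    """Swap every :emoji_name: token for the placeholder."""
    return EMOJI_TOKEN.sub(placeholder, caption)


def count_words(caption):
    """Count alphanumeric words, ignoring emoji tokens."""
    text = EMOJI_TOKEN.sub(" ", caption)
    return len(re.findall(r"[0-9A-Za-z]+", text))


def count_chars(caption):
    """Count characters left after removing emoji tokens and outer whitespace."""
    return len(EMOJI_TOKEN.sub("", caption).strip())


caption = "Comment your favorite below! :fallen_leaf::face_savoring_food::jack-o-lantern:"
print(replace_emoji_tokens(caption))  # Comment your favorite below! ~~~
print(count_words(caption))           # 4
print(count_chars(caption))
```

With pandas, the same helpers could be applied per row, for example `df['caption_with_replaced_emoji'] = df['caption'].apply(replace_emoji_tokens)`, with the word count stored in a separate column whose name is up to you.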