Creating IF statement for column based on NaN

Here's a sample of my data.

  df[['caption', 'mentions']].sample(7)
    
    
      caption                                             mentions
    42  b'Alexa is helping people of all abilities do ...   NaN
    48  NaN                                                 NaN
    7   b'Introducing Amazon Pharmacy. :pill::clipboar...   NaN
    25  b"When it's day:victory_hand_selector:and the ...   charliesmallsthedood
    58  b'We look at all angles when it comes to safet...   NaN
    88  b'A night in with your favorite food + pup + e...   amazonfiretv,lissettecalv
    22  b'Get everyday essentials auto-delivered AND s...   NaN

I want to create a column that counts the number of mentions in a caption. For the above sample it would return (0,0,0,1,0,2,0).

Here is what I've tried so far:

mentions = df['mentions'].str.lower().str.split(',')

for value in df['mentions']:
    if value != 'nan':
        df['mention_counts'] = mentions.apply(len)
    else:
        df['mention_counts'] = 0

Help please!

1 answer

  • answered 2020-12-01 23:19 ShlomiF

    The easiest thing to do would be to write the logic out explicitly, like so:

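    import numpy as np

    def count_thing(row):
        if isinstance(row.mentions, str):
            # a string holds one or more comma-separated mentions
            return len(row.mentions.split(','))
        elif np.isnan(row.mentions):
            # a missing value (NaN) means no mentions
            return 0
        else:
            pass # returns None; not sure how you want to deal with this case...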
    

    and then use apply to get the required column:

    df['mention_counts'] = df.apply(count_thing, axis=1)
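
To check the approach end to end, here is a minimal, self-contained sketch. The frame is reconstructed from the rows in the question with placeholder caption text, and treating every non-string value as "no mentions" is an assumption, since the answer leaves that last case open.

```python
import numpy as np
import pandas as pd

# Rows modeled on the question's sample; the caption values are placeholders.
df = pd.DataFrame(
    {
        "caption": ["a", np.nan, "b", "c", "d", "e", "f"],
        "mentions": [
            np.nan,
            np.nan,
            np.nan,
            "charliesmallsthedood",
            np.nan,
            "amazonfiretv,lissettecalv",
            np.nan,
        ],
    },
    index=[42, 48, 7, 25, 58, 88, 22],
)


def count_thing(row):
    # A string holds one or more comma-separated mentions.
    if isinstance(row.mentions, str):
        return len(row.mentions.split(","))
    # Assumption: anything else (NaN, None, ...) counts as zero mentions.
    return 0


df["mention_counts"] = df.apply(count_thing, axis=1)
print(df["mention_counts"].tolist())
```

Only the rows with a string in mentions get a non-zero count, so the original for loop over the column is not needed.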
    

    On a side note, I don't see any reason to use lower, seeing as you're splitting on ',', which is unaffected by case.
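
As a possible alternative to the row-wise apply (not what the answer above uses), the count can also be computed with pandas string methods: the number of mentions is the number of commas plus one, and NaN rows are filled with 0 afterwards. The sketch below also checks the point about lower. The series values are hypothetical sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one missing value, one mention, two mentions.
mentions = pd.Series([np.nan, "charliesmallsthedood", "AmazonFireTV,lissettecalv"])

# Commas + 1 gives the mention count; NaN stays NaN until fillna(0).
counts = mentions.str.count(",").add(1).fillna(0).astype(int)

# Lowercasing first changes nothing, because ',' has no case.
counts_lower = mentions.str.lower().str.count(",").add(1).fillna(0).astype(int)

print(counts.tolist())
print(counts.equals(counts_lower))
```

Like the split-based version, this counts an empty string as one mention, so empty strings would need separate handling if they occur in the data.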