Pandas Drop rows that do not contains a list of strings

I will start by saying I am not an expert in pandas and I do apology if my question sounds so basic.

I have csv file that have 5 columns, as follow:

timestamp.   message.  name.   destinationuserna.   source username
time.         hello.   hello.    user1.                user1
time.         hello.   hello1.    user2.                user1
time.         hello.   hello2.    user3.                user1
time.         hello.   no.       user3                user1
time.         hello.   don't.    random.                user1

This is just a basic example, but what I would like to do, is to check the column name and drop all the rows where the column name does not contain hello,hello1 or hello2.

Can please any expert help me to understand how I can achieve this?

I tried with this line of code, but it didn't work at all:

f[f['name'].str.contains("hello1")]

1 answer

  • answered 2021-07-27 17:43 norie

    Using isin

    To check for multiple values in a column you can use Pandas isin.

    import pandas as pd
    
    df = pd.read_csv('test.csv')
    
    names_to_keep =  ['hello', 'hello1', 'hello2']
    
    df = df[df['name'].isin(names_to_keep)]
    
    print(df)
    
    Sample Output
      timestamp message    name destinationuserna sourceusername
    0      time   hello   hello             user1          user1
    1      time   hello  hello1             user2          user1
    2      time   hello  hello2             user3          user1
    
    

    Using query

    Another method would be to use Pandas query.

    df  = df.query(f'name in {names_to_keep}')
    

    Using startswith

    If you only want to keep rows that start with a particular value you could use startswith.

    df = df[df['name'].str.startswith('hello')]
    

How many English words
do you know?
Test your English vocabulary size, and measure
how many words do you know
Online Test
Powered by Examplum