Creating a second DataFrame that excludes rows matching two conditions in the first DataFrame

I have a main DataFrame and have identified some rows that I don't want. The queries below select those rows:

df.query("group == 'treatment' and landing_page != 'new_page'") 
df.query("landing_page == 'new_page' and group != 'treatment'")

Now I want a df2 containing the entire df EXCEPT the rows selected by the code above. I am having a hard time creating this df2. Any ideas?

My actual code:

df2 = df.query("group == 'treatment' and landing_page == 'new_page'") and df.query("group == 'control' and landing_page == 'old_page'")

I am receiving this error: ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

1 answer

  • answered 2018-12-05 20:28 coldspeed

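    Change query to eval so that each expression returns a boolean mask instead of a filtered DataFrame, then combine the masks and invert the result when indexing df.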

    m1 = df.eval("group == 'treatment' and landing_page != 'new_page'") 
    m2 = df.eval("landing_page == 'new_page' and group != 'treatment'")
    
    df_out = df[~(m1 | m2)]
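
For context, the original attempt fails because Python's `and` needs a single True/False from each operand, and a DataFrame refuses to provide one. Below is a minimal sketch of the mask approach on made-up data; the four sample rows are an assumption, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control"],
    "landing_page": ["new_page", "old_page", "old_page", "new_page"],
})

# `and` between two DataFrames calls bool() on the first one, which raises.
try:
    df.query("group == 'treatment'") and df.query("group == 'control'")
    raised = False
except ValueError:
    raised = True

# eval returns a boolean Series (one flag per row) rather than filtered rows.
m1 = df.eval("group == 'treatment' and landing_page != 'new_page'")
m2 = df.eval("landing_page == 'new_page' and group != 'treatment'")

# | combines the masks element-wise; ~ inverts the combined mask.
df2 = df[~(m1 | m2)]
print(df2)
```

With this sample, only the matched pairs (treatment with new_page, control with old_page) should remain in df2.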
    

    Or, a little more generically, for any number of conditions:

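    import numpy as np

    # Expressions describing the rows to exclude
    stmts = [
        "group == 'treatment' and landing_page != 'new_page'",
        "landing_page == 'new_page' and group != 'treatment'"
    ]

    # OR all the masks together, then invert to keep the remaining rows
    df_out = df[~np.logical_or.reduce([df.eval(stmt) for stmt in stmts])]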
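
If the list-based form is needed in several places, it could be wrapped in a small helper. The sketch below is one possible way to do that; the function name `drop_matching` and the sample rows are hypothetical, not part of pandas or the question.

```python
import numpy as np
import pandas as pd

def drop_matching(df, stmts):
    """Return the rows of df that match none of the expressions in stmts."""
    # Each eval yields a boolean Series; reduce ORs them into one boolean array.
    unwanted = np.logical_or.reduce([df.eval(stmt) for stmt in stmts])
    return df[~unwanted]

# Hypothetical sample rows; only the column names come from the question.
df = pd.DataFrame({
    "group": ["treatment", "treatment", "control", "control"],
    "landing_page": ["new_page", "old_page", "old_page", "new_page"],
})

stmts = [
    "group == 'treatment' and landing_page != 'new_page'",
    "landing_page == 'new_page' and group != 'treatment'",
]

df2 = drop_matching(df, stmts)
print(df2)
```

The helper assumes `stmts` contains at least one expression; an empty list is not handled here.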