pandas: drop duplicates when comparing two dataframes

I'm collecting some dataframes which I'm writing into my MySQL database. Sometimes the old and the new dataframe contain duplicate rows, which I want to drop. For example:

old df:

               timestamp    volume   price
id
211007692  1520969598625  0.181410  9044.9
211007688  1520969598364  0.100000  9045.0
211007687  1520969598340  0.050110  9045.0
211007673  1520969598122  0.005090  9046.1
211007667  1520969597783  0.083778  9046.1
211007666  1520969597782  0.010000  9046.1
211007665  1520969597781  0.010000  9046.1
211007664  1520969597780  0.010415  9046.1
211007663  1520969597779  0.012977  9046.1

new df:

               timestamp    volume   price
id
211007709  1520969599391  0.061845  9043.6
211007708  1520969599370  0.181066  9043.6
211007705  1520969599222  0.132000  9043.5
211007700  1520969599006  1.000000  9044.5
211007694  1520969598710  0.100000  9043.5
211007692  1520969598625  0.181410  9044.9
211007688  1520969598364  0.100000  9045.0
211007687  1520969598340  0.050110  9045.0

Is there an elegant way to drop all of these duplicates?
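
For reference, a trimmed, reproducible version of the two frames (values copied from the tables above, keeping only the three overlapping ids plus one extra row each; id is the index, as in the printed output):

    import pandas as pd

    old = pd.DataFrame(
        {'timestamp': [1520969598625, 1520969598364, 1520969598340, 1520969598122],
         'volume': [0.181410, 0.100000, 0.050110, 0.005090],
         'price': [9044.9, 9045.0, 9045.0, 9046.1]},
        index=pd.Index([211007692, 211007688, 211007687, 211007673], name='id'))

    new = pd.DataFrame(
        {'timestamp': [1520969599391, 1520969598625, 1520969598364, 1520969598340],
         'volume': [0.061845, 0.181410, 0.100000, 0.050110],
         'price': [9043.6, 9044.9, 9045.0, 9045.0]},
        index=pd.Index([211007709, 211007692, 211007688, 211007687], name='id'))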

1 answer

  • answered 2018-03-13 21:33 Wen

    One way, using duplicated on the concatenated frames:

    s = pd.concat([old, new], keys=['old', 'new'])

    # duplicated() compares columns only, so move the id level of the
    # index back into the columns, then flag every occurrence with keep=False
    s[s.reset_index(level=1).duplicated(keep=False).values]
    Out[492]: 
                           timestamp   volume   price
            id
        old 211007692  1520969598625  0.18141  9044.9
            211007688  1520969598364  0.10000  9045.0
            211007687  1520969598340  0.05011  9045.0
        new 211007692  1520969598625  0.18141  9044.9
            211007688  1520969598364  0.10000  9045.0
            211007687  1520969598340  0.05011  9045.0
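
    This shows the overlap. To actually drop it before writing to the
    database, a minimal sketch, assuming the id index uniquely identifies
    a row, is to keep only the ids of new that old does not already
    contain (the name fresh is just for illustration):

    # rows of `new` whose id is not already stored in `old`
    fresh = new[~new.index.isin(old.index)]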