Choose value in one column based on value subtraction in another column

I have a dataframe with 4 columns. I want to take the difference between the last two entries of col2 (last minus second to last, as in my code below) and check whether it is greater than 10. If it is, I would like to take the corresponding values of the last and second-to-last rows in the first column, replace the second-to-last value with NaN, and produce another dataframe as output. Is there any way I can do this in pandas?

col1  col2   col3   col4
 e      21      1    2
 m      20      1    2
 k      9       1    2
 j      20      1    2


col1         col3   col4
[j, 'NaN']    1      2

I am looking for a query-based approach, so that producing the output as a dataframe becomes easier when applying groupby or filtering.

This is the code I have tried so far, but it does not work:

last = df.iloc[-1]['col2']
second_to_last = df.iloc[-2]['col2']

difference = df.query("{ref} - {ref_1} > 10".format(ref=last, ref_1= second_to_last))

The error I am getting on line 3:

ValueError: multi-line expressions are only valid in the context of data

1 answer

  • answered 2018-11-08 06:37 jezrael

    You can use:

    import numpy as np

    #get last and previous index values
    last = df.index[-1]
    second_to_last = df.index[-2]
    #boolean mask - scalar
    m1 = df.loc[last, 'col2'] - df.loc[second_to_last, 'col2'] > 10
    #boolean mask - array
    m2 = (df.index.isin([last, second_to_last]))
    #chain together
    m = m1 & m2
    print (m)
    [False False  True  True]
    df1 = df[m]
    print (df1)
      col1  col2  col3  col4
    2    k     9     1     2
    3    j    20     1     2
    #get last row, remove unnecessary column
    df2 = df1.iloc[[-1]].drop('col2', axis=1)
    #convert value to list and add missing value
    #([x, np.nan] keeps multi-character strings intact, list(x) would split them)
    df2['col1'] = df2['col1'].apply(lambda x: [x, np.nan])
    print (df2)
           col1  col3  col4
    3  [j, nan]     1     2
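
The steps above can also be wrapped in a small function, which makes the "difference not greater than 10" case explicit instead of failing on an empty frame. This is only a minimal sketch: the name flag_last_pair and its parameters are hypothetical, not part of pandas, and it assumes the rows are already in the order you care about.

```python
import numpy as np
import pandas as pd


def flag_last_pair(df, value_col="col2", label_col="col1", threshold=10):
    """Hypothetical helper: return a one-row frame when the last value of
    value_col exceeds the previous one by more than threshold, otherwise
    an empty frame with the same output columns."""
    out_cols = [c for c in df.columns if c != value_col]
    if len(df) < 2:
        return df.iloc[0:0][out_cols]
    # plain scalar comparison on the last two values; no query() needed
    diff = df[value_col].iloc[-1] - df[value_col].iloc[-2]
    if not diff > threshold:
        return df.iloc[0:0][out_cols]
    # keep only the last row, without the value column
    out = df.iloc[[-1]][out_cols].copy()
    # pair the last label with NaN standing in for the second-to-last one
    out[label_col] = out[label_col].apply(lambda x: [x, np.nan])
    return out


# sample data from the question
df = pd.DataFrame({
    "col1": ["e", "m", "k", "j"],
    "col2": [21, 20, 9, 20],
    "col3": [1, 1, 1, 1],
    "col4": [2, 2, 2, 2],
})
result = flag_last_pair(df)
print(result)
```

Because the function takes a dataframe and returns one, it should also be usable per group, for example via df.groupby(...).apply(flag_last_pair), assuming a grouping column exists in your real data.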