Choose value in one column based on value subtraction in another column
I have a dataframe containing 4 columns. I want to take the difference between the last and the second-to-last entry of col2 (last minus second-to-last) and check whether it is greater than 10. If so, I would like to take the corresponding col1 values for those two rows, replace the second-to-last one with NaN, and return the result as a new dataframe. Is there a way to do this in pandas?
col1 col2 col3 col4
e 21 1 2
m 20 1 2
k 9 1 2
j 20 1 2
Output:
col1 col3 col4
[j, 'NaN'] 1 2
I am looking for an approach based on query, so that producing the output as a dataframe becomes easier when combined with groupby or filtering.
Here is the code I have tried so far, but it does not work:
last = df.iloc[-1]['col2']
second_to_last = df.iloc[-2]['col2']
difference = df.query("{ref} - {ref_1} > 10".format(ref=last, ref_1=second_to_last))
The error I get on line 3:
ValueError: multiline expressions are only valid in the context of data
1 answer

You can use:
import numpy as np

# get the last and second-to-last index labels
last = df.index[-1]
second_to_last = df.index[-2]

# boolean mask - scalar: is the difference greater than 10?
m1 = df.loc[last, 'col2'] - df.loc[second_to_last, 'col2'] > 10
# boolean mask - array: rows that are the last or second-to-last
m2 = df.index.isin([last, second_to_last])
# chain together
m = m1 & m2
print (m)
[False False  True  True]

# filter
df1 = df[m]
print (df1)
  col1  col2  col3  col4
2    k     9     1     2
3    j    20     1     2

# get last row, remove unnecessary column
df2 = df1.iloc[[1]].drop('col2', axis=1)
# wrap the value in a list and add the missing value
# ([x, np.nan] also works for strings longer than one character, unlike list(x))
df2['col1'] = df2['col1'].apply(lambda x: [x, np.nan])
print (df2)
       col1  col3  col4
3  [j, nan]     1     2
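
If it helps, the same idea can be wrapped in a small self-contained function. This is only a sketch under a few assumptions: the function name `last_pair_summary` and the `threshold` parameter are made up for illustration, the difference is taken as last minus second-to-last (as in the question's sample data), and an empty frame is returned when the condition is not met.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": ["e", "m", "k", "j"],
    "col2": [21, 20, 9, 20],
    "col3": [1, 1, 1, 1],
    "col4": [2, 2, 2, 2],
})


def last_pair_summary(df, threshold=10):
    """Hypothetical helper: return the last row (without col2) with col1
    replaced by [last_value, NaN] if last - second_to_last > threshold."""
    # Empty frame with the output columns, used when there is nothing to report
    empty = df.iloc[0:0].drop(columns="col2")
    if len(df) < 2:
        return empty

    # Difference between the last and second-to-last entries of col2
    diff = df["col2"].iloc[-1] - df["col2"].iloc[-2]
    if diff <= threshold:
        return empty

    # Keep only the last row and drop the column used for the comparison
    out = df.iloc[[-1]].drop(columns="col2")
    # Store a list in the single cell; building a Series keeps the list intact
    out["col1"] = pd.Series([[df["col1"].iloc[-1], np.nan]], index=out.index)
    return out


result = last_pair_summary(df)
print(result)
```

Returning an empty frame with the same columns, rather than raising, keeps the result easy to concatenate if the function is later applied per group.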