How to check if a value lies between two consecutive rows in a DataFrame or NumPy array?

I need to write code that checks whether a certain value lies between two consecutive rows, for example:

row < 50 < next row

meaning the value falls between a row and the row that follows it.

df = pd.DataFrame(np.random.randint(0,100,size=(10, 1)), columns=list('A'))

The output is:

    A
0  67
1  78
2  53
3  44
4  84
5   2
6  63
7  13
8  56
9  24

What I'd like to do is check whether a set value, say 50, lies between each pair of consecutive rows.

For example, we check whether 50 is between 67 and 78, and then between 78 and 53. The answer is no in both cases, so the result in column B would be 0.

If we then check whether 50 is between 53 and 44, we get 1 in column B, and we use cumsum() to count how many times the value 50 falls between consecutive rows in column A.

UPDATE: Suppose I also have a column C with only two categories, 1 and 2. How would I ensure that the check is performed within each category separately? In other words, how do I reset the check once the category changes?

The desired output is:

    A   B   C   count
0  67   0   1    0
1  78   0   1    0
2  53   0   1    0
3  44   1   2    0
4  84   2   1    0
5   2   3   2    0
6  63   4   1    0
7  13   5   2    0
8  56   6   1    0
9  24   7   1    1

Greatly appreciate your help.

2 answers

  • answered 2019-12-15 03:11 oppressionslayer

    This should work:

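    # True where 50 lies strictly between the previous row and the current row
    what = ((df.A < 50) | (50 > df.A.shift())) & ((df.A > 50) | (50 < df.A.shift()))
    
    # Running total of how many times 50 has been crossed so far
    df['count'] = what.astype(int).cumsum()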
    
        A  count
    0  67      0
    1  78      0
    2  53      0
    3  44      1
    4  84      2
    5   2      3
    6  63      4
    7  13      5
    8  56      6
    9  24      7
    
    

    or, with freshly generated random data:

    df = pd.DataFrame(np.random.randint(0,100,size=(10, 1)), columns=list('A'))
    what = ((df.A < 50) | (50 > df.A.shift())) & ((df.A > 50) | (50 < df.A.shift()))
    df['count'] = what.astype(int).cumsum()
    
        A  count
    0  45      0
    1  53      1
    2  44      2
    3  87      3
    4  47      4
    5  13      4
    6  20      4
    7  89      5
    8  81      5
    9  53      5
    

    Would your second output look like this?

    In [2306]: df
    Out[2306]:
        A  B  C  count
    0  67  0  1      0
    1  78  0  1      0
    2  53  0  1      0
    3  44  1  2      1
    4  84  2  1      0
    5   2  3  2      1
    6  63  4  1      0
    7  13  5  2      1
    8  56  6  1      0
    9  24  7  1      1
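
For the per-category check asked about in the update, one possible sketch (not taken from either answer) is to compare each row with the previous row of the same category by using `groupby('C')` with `shift()`, and then to accumulate the matches per category as well. The column values below are copied from the question; the names `value`, `prev`, and `crossed` are just illustrative. On this sample data it yields a count of 1 only in the last row, which is what the count column in the question's desired output shows.

```python
import pandas as pd

# Sample data copied from the question (columns A and C)
df = pd.DataFrame({
    'A': [67, 78, 53, 44, 84, 2, 63, 13, 56, 24],
    'C': [1, 1, 1, 2, 1, 2, 1, 2, 1, 1],
})
value = 50  # the value being tested

# Previous row *within the same category*; NaN for the first row of each group
prev = df.groupby('C')['A'].shift()

# True where `value` lies strictly between the previous and current row of the group.
# Comparisons against NaN are False, so the first row of each group never matches.
crossed = ((df['A'] < value) | (prev < value)) & ((df['A'] > value) | (prev > value))

# Running total per category, so the count restarts for each value of C
df['count'] = crossed.astype(int).groupby(df['C']).cumsum()
print(df)
```

Whether this is the intended meaning of "reset once the category changes" is an assumption; if the count should instead restart on every run of identical C values, the grouping key would need to change.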
    

  • answered 2019-12-15 03:15 Scott Boston

    Let's just subtract 50 from the series and check for a sign change:

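    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'A': [67, 78, 53, 44, 84, 2, 63, 13, 56, 24]})
    
    # Values above 50 become positive, values below 50 become negative
    s = df['A'] - 50
    # A nonzero diff of the sign means 50 was crossed between consecutive rows
    df['count'] = np.sign(s).diff().fillna(0).ne(0).cumsum()
    print(df)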
    

    Output:

        A  count
    0  67      0
    1  78      0
    2  53      0
    3  44      1
    4  84      2
    5   2      3
    6  63      4
    7  13      5
    8  56      6
    9  24      7
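
Since the title also asks about NumPy arrays, the same sign-change idea can be sketched without pandas. This is only an illustration under the assumption that the data is a 1-D integer array; the names `a`, `threshold`, `changed`, and `count` are made up for the example.

```python
import numpy as np

# Same values as column A in the question
a = np.array([67, 78, 53, 44, 84, 2, 63, 13, 56, 24])
threshold = 50

# +1 above the threshold, -1 below it, 0 when exactly equal
sign = np.sign(a - threshold)

# True where the sign differs from the previous element;
# the first element has no predecessor, so it is set to False
changed = np.concatenate(([False], sign[1:] != sign[:-1]))

# Running count of sign changes
count = np.cumsum(changed)
print(count)  # [0 0 0 1 2 3 4 5 6 7]
```

One caveat applies to any sign-based check: a value exactly equal to 50 has sign 0, so a sequence such as 53, 50, 44 registers two changes, whereas the strict-inequality mask from the first answer registers none for those rows.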