Complex mask for dataframe

I have a dataframe with a time series in one single column. The data looks like this chart


I would like to create a mask that is True whenever the data is equal to or lower than -0.20. It should also be True before reaching -0.20 while the data is negative, and after reaching -0.20 while the data is still negative. This version of the chart


is my manual attempt to show (in red) the values where the mask would be True. I started creating the mask, but I could only make it True while the data is less than -0.20: mask = (df['data'] < -0.2). I couldn't do any better. Does anybody know how to achieve my goal?

2 answers

  • answered 2022-01-24 17:59 Benjamin Rio


    Group by consecutive values of the same sign, and then check whether the minimum of each such group is below the defined threshold.


    First, we want to separate negative from positive values.

    negative_mask = (df['data']<0)

    We then can create classes (ordered with integers) for each consecutive positive or negative series. The class increases by one each time the data changes sign.

    consecutives = negative_mask.diff().ne(0).cumsum()
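
    To make the labelling step concrete, here is a minimal sketch on a small hand-made series (the values are hypothetical, chosen only for illustration). It uses a shift-based comparison, as in the other answer, which should yield the same labels as the diff()-based line above.

```python
import pandas as pd

# Hypothetical toy data, chosen only to illustrate the labelling step.
df = pd.DataFrame({"data": [0.1, -0.05, -0.25, -0.1, 0.2, -0.1, 0.05]})

negative_mask = df["data"] < 0
# A row starts a new run whenever its sign flag differs from the previous row's.
consecutives = (negative_mask != negative_mask.shift()).cumsum()

print(consecutives.tolist())  # [1, 2, 2, 2, 3, 4, 5]
```

    Rows 1 to 3 share label 2 because they form one uninterrupted negative run.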

    We then select only the data where the minimum of the group of consecutive elements is less than -0.2.

    df.groupby(consecutives).filter(lambda g: g['data'].min() < -0.2)
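
    The filter returns the selected rows rather than a mask. If a boolean mask aligned with the original frame is what you need, one possible sketch (not part of the original answer) is to broadcast each run's minimum back to its rows with transform. The toy data is hypothetical, and <= is used here on the assumption that "equal or lower than -0.20" should include -0.20 itself.

```python
import pandas as pd

# Hypothetical toy data, chosen only for illustration.
df = pd.DataFrame({"data": [0.1, -0.05, -0.25, -0.1, 0.2, -0.1, 0.05]})

negative_mask = df["data"] < 0
consecutives = (negative_mask != negative_mask.shift()).cumsum()

# Minimum of each run, repeated on every row belonging to that run.
run_min = df["data"].groupby(consecutives).transform("min")

# True for every row of a negative run that reaches -0.2 or lower.
mask = negative_mask & (run_min <= -0.2)

print(mask.tolist())  # [False, True, True, True, False, False, False]
```

    The second negative run (a single value of -0.1) stays False because it never reaches the threshold.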

    Example with random data

    We can try our example with random data:

    import numpy as np
    import pandas as pd
    data = np.random.randint(-300, 300, size=1000)/1000
    df = pd.DataFrame(data, columns=["data"])

    Applying the filter from above to this data keeps only the rows in qualifying negative runs (output abbreviated; values differ on each run because the data is random):


    2   -0.030
    3   -0.194
    4   -0.229
    5   -0.280
    6   -0.179
    ... ...
    991 -0.293
    995 -0.247
    996 -0.062
    997 -0.072
    999 -0.250
    363 rows × 1 columns

  • answered 2022-01-24 18:05 tlgs

    One approach could be to group segments that are entirely below zero, and then verify for each group whether there are any values below -0.2.


    See below for a full reproducible example script:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
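    # random walk as sample data
    df = pd.DataFrame(
        {"y": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(10 ** 5)])}
    )
    # True where the series is below zero
    below_zero = df["y"] < 0
    # label each run of consecutive same-sign values with its own integer
    regions = (below_zero != below_zero.shift()).cumsum()
    # here's your interesting DataFrame with the specified mask
    df_interesting = df.groupby(regions).filter(lambda s: s["y"].min() < -0.2)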
    # plot individual regions
    for i, grp in df.groupby(regions):
        if grp["y"].min() < -0.2:
            plt.plot(grp, color="tab:red", linewidth=5, alpha=0.6)
    plt.axhline(0, linestyle="--", color="tab:gray")
    plt.axhline(-0.2, linestyle="--", color="tab:gray")
    plt.show()
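
    Since the question asks for a mask rather than a filtered frame, a possible follow-up (a sketch, not part of the original script) is to derive the mask from the index of the filtered result. The seed and series length below are arbitrary assumptions, used only to keep the example repeatable and fast.

```python
import numpy as np
import pandas as pd

# Seeded random walk so the sketch is repeatable (seed and size are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame({"y": np.cumsum(rng.uniform(-0.01, 0.01, size=10_000))})

below_zero = df["y"] < 0
regions = (below_zero != below_zero.shift()).cumsum()

# Keep only regions whose minimum drops below -0.2, as in the script above.
df_interesting = df.groupby(regions).filter(lambda g: g["y"].min() < -0.2)

# Boolean mask aligned with df: True for rows that survived the filter.
mask = pd.Series(df.index.isin(df_interesting.index), index=df.index)
```

    Here df[mask] selects the same rows as df_interesting, and mask can be combined with other boolean conditions.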
