Complex mask for dataframe
I have a dataframe with a time series in a single column. The data looks like this chart: [chart]
I would like to create a mask that is TRUE whenever the data is equal to or lower than -0.20. It should also be TRUE before reaching -0.20 while the data is negative, and after reaching -0.20 while the data is still negative. This version of the chart is my manual attempt to show (in red) the values where the mask would be TRUE: [chart]
I started creating the mask, but I could only make it TRUE while the data is less than -0.20:
mask = (df['data'] < -0.2)
I couldn't do any better. Does anybody know how to achieve my goal?
2 answers

Idea
Group consecutive values of the same sign, then check whether the minimum of each such group is less than the defined threshold.
Implementation
First, we want to separate negative from positive values.
negative_mask = (df['data']<0)
We then can create classes (ordered with integers) for each consecutive positive or negative series. The class increases by one each time the data changes sign.
consecutives = negative_mask.diff().ne(0).cumsum()
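To see what these classes look like, here is a minimal sketch on a made-up series (the values are hypothetical, chosen only so the sign changes are easy to follow). It uses a `shift` comparison, which should produce the same labels as the `diff().ne(0)` form above.

```python
import pandas as pd

# Hypothetical toy series, not the asker's data.
s = pd.Series([0.1, -0.05, -0.3, -0.1, 0.2, 0.15, -0.1])

negative_mask = s < 0

# Compare each flag with the previous one. The first row is compared
# against NaN, so it always starts a new run.
consecutives = (negative_mask != negative_mask.shift()).cumsum()

# One row per value: the data, its sign flag, and its run label.
print(pd.DataFrame({"data": s, "negative": negative_mask, "run": consecutives}))
```

Each run of same-sign values gets its own integer label, which is what makes the later `groupby` possible.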
We then keep only the rows that belong to a group of consecutive elements whose minimum is less than -0.2.
df.groupby(consecutives).filter(lambda g: g['data'].min() < -0.2)
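Since the question asks for a boolean mask rather than a filtered frame, one possible variation is to broadcast each group's minimum back to its rows with `transform`. This is only a sketch: the toy data, the `threshold` name and the value -0.2 are assumptions, and `<=` is used because the question says "equal or lower than".

```python
import pandas as pd

# Hypothetical toy data; the real frame would come from the asker.
df = pd.DataFrame({"data": [0.1, -0.05, -0.3, -0.1, 0.2, -0.1, 0.05]})
threshold = -0.2  # assumed cutoff

negative_mask = df["data"] < 0
consecutives = (negative_mask != negative_mask.shift()).cumsum()

# Broadcast each run's minimum back onto every row of that run.
run_min = df["data"].groupby(consecutives).transform("min")

# True for every row of a run that touches the threshold at least once.
mask = run_min <= threshold

print(df.assign(mask=mask))
```

The mask has the same index as `df`, so `df[mask]` selects the flagged rows, and a positive run can never be flagged because its minimum is not negative.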
Example with random data
We can try our example with random data:
import numpy as np
import pandas as pd

np.random.seed(42)
# integers in [-300, 300) scaled to the range [-0.3, 0.3)
data = np.random.randint(-300, 300, size=1000) / 1000
df = pd.DataFrame(data, columns=["data"])
Output
      data
2   -0.030
3   -0.194
4   -0.229
5   -0.280
6   -0.179
..     ...
991 -0.293
995 -0.247
996 -0.062
997 -0.072
999 -0.250

363 rows × 1 columns

One approach could be to group segments that are entirely below zero, and then for each group check whether it contains any values below -0.2. See below for a full reproducible example script:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(167)
# random walk with steps drawn uniformly from [-0.01, 0.01)
df = pd.DataFrame(
    {"y": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(10 ** 5)])}
)
plt.plot(df)

# label each run of consecutive same-sign values with its own integer
below_zero = df["y"] < 0
regions = (below_zero != below_zero.shift()).cumsum()

# here's your interesting DataFrame with the specified mask
df_interesting = df.groupby(regions).filter(lambda g: g["y"].min() < -0.2)

# plot individual regions
for i, grp in df.groupby(regions):
    if grp["y"].min() < -0.2:
        plt.plot(grp, color="tab:red", linewidth=5, alpha=0.6)

plt.axhline(0, linestyle="--", color="tab:gray")
plt.axhline(-0.2, linestyle="--", color="tab:gray")
plt.show()
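If a boolean mask is needed instead of the filtered frame, one option is to recover it from the index that `filter` preserves. The sketch below uses a small made-up frame rather than the random walk, assumes a cutoff of -0.2, and only works when the index labels are unique.

```python
import pandas as pd

# Hypothetical toy data with a default (unique) integer index.
df = pd.DataFrame({"y": [0.1, -0.05, -0.3, -0.1, 0.2, -0.1, 0.05]})

below_zero = df["y"] < 0
regions = (below_zero != below_zero.shift()).cumsum()
df_interesting = df.groupby(regions).filter(lambda g: g["y"].min() < -0.2)

# filter() keeps the original index labels, so membership in that index
# marks the rows that belong to a flagged region.
mask = pd.Series(df.index.isin(df_interesting.index), index=df.index)

print(df.assign(mask=mask))
```

With a non-unique index, `isin` would also flag rows that merely share a label with a flagged row, so resetting the index first would be safer in that case.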