Add beginning of next month per id partition

I'm trying to add a new column called rev_month to my pandas DataFrame. Within each id, each row's value should be one month later than the previous row's. The first_date column has dtype datetime64[ns].

This is the input:

id    first_date             revenue_month_number   
1     2020-12-30 15:14:49    1                      
2     2021-03-01 01:36:23    1                     
2     2021-03-01 01:36:23    2
2     2021-03-01 01:36:23    3                      
3     2021-03-02 19:13:56    1                      
3     2021-03-02 19:13:56    2                     
3     2021-03-02 19:13:56    3                       
3     2021-03-02 19:13:56    4                      
3     2021-03-02 19:13:56    5                       

For each id, rev_month should be the first day of the month of first_date when revenue_month_number == 1, and advance by one month (again to the first day of that month) for each subsequent revenue_month_number.

Desired output:

id    first_date             revenue_month_number    rev_month
1     2020-12-30 15:14:49    1                       2020-12-01
2     2021-03-01 01:36:23    1                       2021-03-01
2     2021-03-01 01:36:23    2                       2021-04-01
2     2021-03-01 01:36:23    3                       2021-05-01
3     2021-03-02 19:13:56    1                       2021-03-01
3     2021-03-02 19:13:56    2                       2021-04-01
3     2021-03-02 19:13:56    3                       2021-05-01
3     2021-03-02 19:13:56    4                       2021-06-01
3     2021-03-02 19:13:56    5                       2021-07-01

I have tried multiple things but I can't seem to make this work.

Would appreciate if anyone has suggestions!

The dataframe can be reproduced with:

import pandas as pd

data = {'id': [1, 2, 2, 2, 3, 3, 3, 3, 3],
        'first_date': ['2020-12-30 15:14:49', '2021-03-01 01:36:23', '2021-03-01 01:36:23',
                       '2021-03-01 01:36:23', '2021-03-02 19:13:56', '2021-03-02 19:13:56',
                       '2021-03-02 19:13:56', '2021-03-02 19:13:56', '2021-03-02 19:13:56'],
        'revenue_month_number': [1, 1, 2, 3, 1, 2, 3, 4, 5]}
df = pd.DataFrame.from_dict(data)
# Parse the strings so first_date has dtype datetime64[ns]
df['first_date'] = pd.to_datetime(df['first_date'])

1 answer

  • answered 2021-05-14 14:05 BENY

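    Let's try offsets:

    # Snap first_date to midnight on the 1st of its month first, so the time of day is dropped
    # and a first_date that falls on a month end does not roll into the following month.
    df['new'] = df.apply(lambda x: x['first_date'].normalize().replace(day=1) + pd.offsets.MonthEnd(x['revenue_month_number']) + pd.offsets.MonthBegin(-1), axis=1)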
    df
    Out[43]: 
       id first_date  revenue_month_number        new
    0   1 2020-12-30                     1 2020-12-01
    1   2 2021-03-01                     1 2021-03-01
    2   2 2021-03-01                     2 2021-04-01
    3   2 2021-03-01                     3 2021-05-01
    4   3 2021-03-02                     1 2021-03-01
    5   3 2021-03-02                     2 2021-04-01
    6   3 2021-03-02                     3 2021-05-01
    7   3 2021-03-02                     4 2021-06-01
    8   3 2021-03-02                     5 2021-07-01
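
A row-wise apply can get slow on large frames, so here is a vectorized sketch of the same idea. It is not from the answer above: it swaps the pandas offsets for NumPy month-resolution arithmetic (datetime64[M] and timedelta64[M]). It assumes first_date is timezone-naive and that revenue_month_number already counts months within each id starting at 1, as in the sample data, so no groupby is needed.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 2, 2, 3, 3, 3, 3, 3],
    'first_date': pd.to_datetime([
        '2020-12-30 15:14:49', '2021-03-01 01:36:23', '2021-03-01 01:36:23',
        '2021-03-01 01:36:23', '2021-03-02 19:13:56', '2021-03-02 19:13:56',
        '2021-03-02 19:13:56', '2021-03-02 19:13:56', '2021-03-02 19:13:56']),
    'revenue_month_number': [1, 1, 2, 3, 1, 2, 3, 4, 5],
})

# Truncate every timestamp to its calendar month (day and time are discarded).
month_start = df['first_date'].to_numpy().astype('datetime64[M]')

# revenue_month_number == 1 means "no shift", so subtract 1 before
# turning the counter into a number of whole months.
shift = (df['revenue_month_number'].to_numpy() - 1).astype('timedelta64[M]')

# Add in month units, then cast back to nanoseconds, which lands on
# midnight of the first day of each resulting month.
df['rev_month'] = (month_start + shift).astype('datetime64[ns]')

print(df)
```

Because the addition happens in whole-month units, a first_date on the last day of a month or with a time of day attached needs no special handling.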