How to subtract time when there is a date change in pandas?

I have the following dataframe in pandas:

    start_date        start_time        end_time
    2018-01-01        23:55:00          00:05:00
    2018-01-02        00:05:00          00:10:00
    2018-01-03        23:59:00          00:05:00

I want to calculate the time difference between start_time and end_time. But for the 1st and 3rd observations, end_time falls on the next day.
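To see why those rows are a problem, a small sketch that rebuilds the three sample rows (the DataFrame construction is an assumption, since the question only shows the printed table) and subtracts the time columns directly. Both columns are read as times of the same day, so a row that crosses midnight gives a negative timedelta, which pandas shows as "-1 days +HH:MM:SS":

```python
import pandas as pd

# Rebuild the sample rows from the question; times are plain 'HH:MM:SS' strings
df = pd.DataFrame({
    "start_date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "start_time": ["23:55:00", "00:05:00", "23:59:00"],
    "end_time": ["00:05:00", "00:10:00", "00:05:00"],
})

# Naive subtraction ignores the date change, so midnight-crossing rows go negative
naive = pd.to_timedelta(df["end_time"]) - pd.to_timedelta(df["start_time"])

print(naive.dt.days.tolist())  # [-1, 0, -1]
print(naive.iloc[0])           # -1 days +00:10:00
```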

How can I do it in pandas?

Currently, wherever end_time is less than start_time, I create an extra column called end_date that holds start_date incremented by one day, and then I subtract the times.
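A minimal sketch of that end_date approach, assuming the columns hold the strings shown in the table; the variable names start, wrapped and end are illustrative only. It builds full timestamps from date plus time of day and moves the end date forward by one day where the clock wrapped past midnight:

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "start_time": ["23:55:00", "00:05:00", "23:59:00"],
    "end_time": ["00:05:00", "00:10:00", "00:05:00"],
})

start_date = pd.to_datetime(df["start_date"])
start_tod = pd.to_timedelta(df["start_time"])  # time of day as a timedelta
end_tod = pd.to_timedelta(df["end_time"])

# Full start timestamp = date + time of day
start = start_date + start_tod

# end_date = start_date, plus one day where end_time is earlier than start_time
wrapped = end_tod < start_tod
df["end_date"] = start_date + pd.to_timedelta(wrapped.astype(int), unit="D")
end = df["end_date"] + end_tod

df["diff"] = end - start
print(df[["start_date", "end_date", "diff"]])
```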

Is there any other way to do it?


  • answered 2018-11-08 08:40 jezrael

    Solution working with timedeltas: if the difference has days equal to -1 (the interval crossed midnight), add one day:

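    import pandas as pd

    # parse the 'HH:MM:SS' strings into timedeltas (time since midnight)
    df['start_time'] = pd.to_timedelta(df['start_time'])
    df['end_time'] = pd.to_timedelta(df['end_time'])

    # rows that cross midnight give a negative difference, e.g. -1 days +00:10:00
    d = df['end_time'] - df['start_time']
    # for those rows, add one day to get the real elapsed time
    df['diff'] = d.mask(d.dt.days == -1, d + pd.Timedelta(1, unit='d'))
    print(df)

       start_date start_time end_time     diff
    0  2018-01-01   23:55:00 00:05:00 00:10:00
    1  2018-01-02   00:05:00 00:10:00 00:05:00
    2  2018-01-03   23:59:00 00:05:00 00:06:00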
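Since the question asks for other ways, one more option that is not part of this answer: take the raw difference modulo one day. Floor-modulo maps a negative value such as -23:50:00 to 00:10:00 and leaves non-negative differences under a day unchanged. This is a sketch assuming a reasonably recent pandas (where a timedelta Series supports % with a Timedelta) and, like the answer, that no interval lasts 24 hours or more:

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "start_time": ["23:55:00", "00:05:00", "23:59:00"],
    "end_time": ["00:05:00", "00:10:00", "00:05:00"],
})

start = pd.to_timedelta(df["start_time"])
end = pd.to_timedelta(df["end_time"])

# Wrap negative differences into the range [0, 1 day)
df["diff"] = (end - start) % pd.Timedelta(days=1)
print(df)
```

This avoids an explicit condition, but it silently folds any interval of a day or longer, so it only suits data where that cannot happen.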
    

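    Another solution, using numpy.where (it assumes start_time and end_time were already converted with pd.to_timedelta as above):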

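    import numpy as np

    s = df['end_time'] - df['start_time']
    # end_time earlier than start_time means the interval crossed midnight: add one day
    df['diff'] = np.where(df['end_time'] < df['start_time'],
                          s + pd.Timedelta(1, unit='d'),
                          s)
    print(df)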
    
       start_date start_time end_time     diff
    0  2018-01-01   23:55:00 00:05:00 00:10:00
    1  2018-01-02   00:05:00 00:10:00 00:05:00
    2  2018-01-03   23:59:00 00:05:00 00:06:00