How to apply a function to a list of timestamps to create a pandas Series?

OK, first the part of the code that works: I have a function that, given a timestamp and a period (minute, hour, month...), returns the period duration as a timedelta. For minute, hour and day, it directly calls the pandas Timedelta function. For month, it is a bit 'smarter': it checks which month the timestamp falls in and returns the number of days in that month. Only the month case is shown below.

import pandas as pd

def as_timedelta(ref_ts: pd.Timestamp = None):
    """
    Return the duration of a time period.
    For a month, obtaining its duration requires a reference timestamp to identify
    how many days have to be accounted for in the month.
    """

    # An input timestamp has to be given.
    # The given timestamp is assumed to be at the beginning of the time period for which a timedelta is requested.
    # Because of a possible timezone offset, the timestamp is at most 12 hours before or after
    # the beginning of the month in UTC.
    # To determine the month of interest, we look for the closest month start.
    # For example, if the reference timestamp is 31 January, 6:00 PM, the duration of February is returned.

    # Get month starts
    current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1)
    next_month = current_month + pd.DateOffset(months=1)
    nex_next_month = current_month + pd.DateOffset(months=2)
    # Get month of interest
    dist_to_next = next_month - ref_ts
    dist_to_prev = ref_ts - current_month
    # Return the duration of the month of interest, from its start to the start of the following month
    td_13 = pd.Timedelta(13, 'h')
    if dist_to_next < td_13:
        return nex_next_month - next_month
    elif dist_to_prev < td_13:
        return next_month - current_month
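A quick reproduction of the single-value behaviour, as a sketch: the function body is copied from above so the snippet runs on its own, and the two input timestamps are made-up examples. A tz-naive timestamp gives the expected month length, while a tz-aware one (which is what pd.to_datetime(..., utc=True) produces) makes the subtraction fail, because current_month is built without a timezone.

```python
import pandas as pd


def as_timedelta(ref_ts: pd.Timestamp = None):
    """Duration of the month whose start is closest to ref_ts (copied from above)."""
    current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1)
    next_month = current_month + pd.DateOffset(months=1)
    nex_next_month = current_month + pd.DateOffset(months=2)
    dist_to_next = next_month - ref_ts
    dist_to_prev = ref_ts - current_month
    td_13 = pd.Timedelta(13, "h")
    if dist_to_next < td_13:
        return nex_next_month - next_month
    elif dist_to_prev < td_13:
        return next_month - current_month


# tz-naive input: 31 January, 6:00 PM is within 13 hours of 1 February,
# so the duration of February 2019 is returned.
naive_result = as_timedelta(pd.Timestamp("2019-01-31 18:00"))
print(naive_result)  # 28 days 00:00:00

# tz-aware input: current_month is tz-naive, so "next_month - ref_ts" mixes
# naive and aware timestamps and pandas raises a TypeError.
aware_error = None
try:
    as_timedelta(pd.Timestamp("2019-01-31 23:00", tz="UTC"))
except TypeError as exc:
    aware_error = exc
print(aware_error)
```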

Given a list of timestamps, I would like to apply this function to each timestamp. When I try with the following line of code, however, I get an AttributeError. To illustrate the problem, here is an example:

ts_list_1M = [
          "Thu Feb 01 2019 00:00:00 GMT+0100",
          "Thu Mar 01 2019 00:00:00 GMT+0100",
          "Sun Apr 01 2019 00:00:00 GMT+0200"]
op_list_1M = [7134.0, 7134.34, 7135.03]
GC_1M = pd.DataFrame(list(zip(ts_list_1M, op_list_1M)), columns =['date', 'open'])
GC_1M['date'] = pd.to_datetime(GC_1M['date'], utc=True)
GC_1M.rename(columns={'date': 'Timestamp'}, inplace=True)
GC_1M.set_index('Timestamp', inplace = True, verify_integrity = True)
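With utc=True, every local timestamp is converted to UTC, so a month start in local time (UTC+1 or UTC+2) lands a few hours before the UTC month boundary, which is the case the 13-hour window in as_timedelta is meant to catch. A small sketch, assuming the same instants are written as ISO 8601 strings with explicit offsets rather than in the GMT+0100 form above:

```python
import pandas as pd

# Same local instants as ts_list_1M, written as ISO 8601 strings (an assumption
# made here so the parsing does not depend on the "GMT+0100" format).
dates = pd.to_datetime(
    ["2019-02-01 00:00:00+01:00",
     "2019-03-01 00:00:00+01:00",
     "2019-04-01 00:00:00+02:00"],
    utc=True,
)
print(dates)
# Each local month start lands 1 or 2 hours before the UTC month start:
# 2019-01-31 23:00, 2019-02-28 23:00 and 2019-03-31 22:00 (UTC).
```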

The famous line of code:

GC_1M.reset_index().apply(as_timedelta,axis=1).values

And the error message I get:

File "<ipython-input-49-ff9556f2ec44>", line 18, in as_timedelta
current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1)

File "C:\Users\pierre.juillard\Documents\Programs\Anaconda\lib\site-packages\pandas\core\generic.py", line 5179, in __getattr__
return object.__getattribute__(self, name)

AttributeError: ("'Series' object has no attribute 'year'", 'occurred at index 0')

When I test the function on a single value, it works, but applying it this way fails. Any advice on how to achieve this?

Thanks in advance for your help! Best,

1 answer

  • answered 2020-02-19 13:36 BStadlbauer

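    DataFrame.apply with axis=1 calls your function once per row and passes the whole row as a Series, which is why ref_ts.year raises the AttributeError. When you want to apply your function to the 'date' series only, you could do the following: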

    GC_1M['date'].apply(as_timedelta)
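To see why the row-wise call fails and the column call does not, here is a sketch on a reduced, made-up two-row frame that records what the function receives in each case:

```python
import pandas as pd

# Reduced two-row frame (assumed data), with the timestamps in a regular column.
frame = pd.DataFrame(
    {
        "Timestamp": pd.to_datetime(["2019-01-31 23:00", "2019-02-28 23:00"], utc=True),
        "open": [7134.0, 7134.34],
    }
)

# DataFrame.apply with axis=1: the function receives each row as a Series.
row_types = frame.apply(lambda row: type(row).__name__, axis=1).tolist()
print(row_types)  # ['Series', 'Series']

# Series.apply on one column: the function receives one Timestamp per call.
elem_types = frame["Timestamp"].apply(lambda ts: type(ts).__name__).tolist()
print(elem_types)  # ['Timestamp', 'Timestamp']

# A row-wise call still works if the function picks the column out of the row.
years = frame.apply(lambda row: row["Timestamp"].year, axis=1).tolist()
print(years)  # [2019, 2019]
```

The last call suggests a workaround that keeps the row-wise apply: wrap the function in a lambda that passes row['Timestamp'] rather than the whole row. This should only work once as_timedelta itself handles timezone-aware input.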
    

    This requires 'date' to hold datetime values. If it still holds the raw strings, convert it first (you could also do this upon creation). Passing utc=True, as in the question, converts the different offsets (+0100 and +0200) to a single timezone:

    GC_1M['date'] = pd.to_datetime(GC_1M['date'], utc=True)
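In the question's code, though, the 'date' column is renamed to Timestamp and set as the index, so there is no 'date' column left to select. A sketch of the equivalent per-element call on the index, using month_length, a hypothetical stand-in for as_timedelta that only reads an attribute a single Timestamp has:

```python
import pandas as pd

# Assumed data: UTC timestamps in the index, as in the question's GC_1M frame.
idx = pd.to_datetime(["2019-01-31 23:00", "2019-02-28 23:00", "2019-03-31 22:00"], utc=True)
frame = pd.DataFrame({"open": [7134.0, 7134.34, 7135.03]}, index=idx)
frame.index.name = "Timestamp"


def month_length(ts: pd.Timestamp) -> int:
    """Hypothetical stand-in for as_timedelta: reads an attribute that only
    a single Timestamp (not a row Series) has."""
    return ts.days_in_month


# Index.map passes one Timestamp per call, just like Series.apply on a column.
lengths = frame.index.map(month_length)
print(lengths.tolist())  # [31, 28, 31]
```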
    

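    Finally, your as_timedelta function cannot deal with timezone-aware inputs: current_month is built without a timezone, and pandas raises a TypeError when subtracting a tz-aware timestamp from a tz-naive one. I added a comment to the line that needs fixing: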

    def as_timedelta(ref_ts: pd.Timestamp = None):
        """
        Return the duration of a time period.
        For a month, obtaining its duration requires a reference timestamp to identify
        how many days have to be accounted for in the month.
        """
    
        # An input timestamp has to be given.
        # The given timestamp is assumed to be at the beginning of the time period for which a timedelta is requested.
        # Because of a possible timezone offset, the timestamp is at most 12 hours before or after
        # the beginning of the month in UTC.
        # To determine the month of interest, we look for the closest month start.
        # For example, if the reference timestamp is 31 January, 6:00 PM, the duration of February is returned.
    
        # Get month starts
        current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1, tzinfo=ref_ts.tzinfo)  # Make current_month timezone aware
        next_month = current_month + pd.DateOffset(months=1)
        nex_next_month = current_month + pd.DateOffset(months=2)
        # Get month of interest
        dist_to_next = next_month - ref_ts
        dist_to_prev = ref_ts - current_month
        # Return the duration of the month of interest, from its start to the start of the following month
        td_13 = pd.Timedelta(13, 'h')
        if dist_to_next < td_13:
            return nex_next_month - next_month
        elif dist_to_prev < td_13:
            return next_month - current_month
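Putting the pieces together, a runnable sketch of the whole pipeline with the timezone-aware version of as_timedelta. Two assumptions: the dates are written as ISO 8601 strings with explicit offsets instead of the GMT+0100 form, and, matching the question's frame, the timestamps are taken from the index rather than from a 'date' column.

```python
import pandas as pd


def as_timedelta(ref_ts: pd.Timestamp) -> pd.Timedelta:
    """Duration of the month whose start is closest to ref_ts (within 13 hours)."""
    # Month start in the same timezone as ref_ts, so the subtractions below
    # compare two timezone-aware timestamps.
    current_month = pd.Timestamp(year=ref_ts.year, month=ref_ts.month, day=1,
                                 tzinfo=ref_ts.tzinfo)
    next_month = current_month + pd.DateOffset(months=1)
    nex_next_month = current_month + pd.DateOffset(months=2)
    dist_to_next = next_month - ref_ts
    dist_to_prev = ref_ts - current_month
    td_13 = pd.Timedelta(13, "h")
    if dist_to_next < td_13:
        return nex_next_month - next_month
    elif dist_to_prev < td_13:
        return next_month - current_month
    # Otherwise no month start is within 13 hours and None is returned,
    # as in the original function.


ts_list_1M = ["2019-02-01 00:00:00+01:00",
              "2019-03-01 00:00:00+01:00",
              "2019-04-01 00:00:00+02:00"]
op_list_1M = [7134.0, 7134.34, 7135.03]
GC_1M = pd.DataFrame({"Timestamp": pd.to_datetime(ts_list_1M, utc=True),
                      "open": op_list_1M})
GC_1M.set_index("Timestamp", inplace=True, verify_integrity=True)

# One Timestamp per call: February, March and April 2019.
durations = GC_1M.index.to_series().apply(as_timedelta)
print([td.days for td in durations])  # [28, 31, 30]
```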