How to extract component from a pandas datetime column and assign them

The following code for getting the week number and year works:

import pandas as pd

df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='w', periods=10), columns=['date'])

df['weekNo'] = df['date'].dt.isocalendar().week
df['year'] = df['date'].dt.year

        date  weekNo  year
0 2021-12-05      48  2021
1 2021-12-12      49  2021
2 2021-12-19      50  2021
3 2021-12-26      51  2021
4 2022-01-02      52  2022
5 2022-01-09       1  2022
6 2022-01-16       2  2022
7 2022-01-23       3  2022
8 2022-01-30       4  2022
9 2022-02-06       5  2022

but,

df['weekYear'] = "%d/%d" % (df['date'].dt.isocalendar().week, df['date'].dt.year)

Gives the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_26440/999845293.py in <module>
----> 1 df['weekYear'] = "%d/%d" % (df['date'].dt.isocalendar().week, df['date'].dt.year)

TypeError: %d format: a number is required, not Series

I am accessing the week and year in a way that accesses the series of values, as shown by the first code snippet. Why doesn't that work when I want a formatted string? How do I re-write the code in snippet 2, to make it work? I don't want to make intermediate columns.

4 answers

  • answered 2021-11-29 03:47 Muhammad Hassan

    You can just use:

    df['weekYear'] = df['date'].dt.isocalendar().week.astype(str) + '/' +  df['date'].dt.year.astype(str)
    

    Or using pandas.Series.str.cat

    df['weekYear'] = df['date'].dt.isocalendar().week.astype(str).str.cat(df['date'].dt.year.astype(str), sep='/')
    

    Or using list comprehension

    df['weekYear'] = [f"{week}/{year}" for week, year in zip(df['date'].dt.isocalendar().week, df['date'].dt.year)]
    

    Timing for 1M rows

    df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='h', periods=1000000), columns=['date'])
    
    %%timeit
    df['date'].dt.isocalendar().week.astype(str) + '/' +  df['date'].dt.year.astype(str)
    
    [out]: 886 ms ± 9.28 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    %%timeit
    df['date'].dt.isocalendar().week.astype(str).str.cat(df['date'].dt.year.astype(str), sep='/')
    
    [out]: 965 ms ± 8.56 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    
    %%timeit
    [f"{week}/{year}" for week, year in zip(df['date'].dt.isocalendar().week, df['date'].dt.year)]
    
    [out]: 587 ms ± 7.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

  • answered 2021-11-29 03:53 jwzinserl

    If you want to use the formatting, can use map to get that map or apply the formatting to every road, the .dt is not needed since you will be working with date itself, not Series of dates. Also isocalendar() returns a tuple where second element is the week number:

    df["date"] = pd.to_datetime(df["date"]) 
    df['weekYear'] =  df['date'].map(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
    

    Timing for 1M rows

    df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='h', periods=1000000), columns=['date'])
    
    %%timeit
    df['date'].map(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
    
    [out]: 2.03 s ± 4.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

  • answered 2021-11-29 03:53 BENY

    • Why doesn't that work when I want a formatted string? The error is clear, because '%d' expects a single decimal value, not a pandas.Series
    • Providing there is a format code for the value to be extracted, dt.strftime can be used.
      • This requires the 'date' column to be a datetime dtype, which can be done with pd.to_datetime. The column in the following example is already the correct dtype.
      • '%V': ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4.
      • '%Y': Year with century as a decimal number.
    import pandas as pd
    
    # sample data
    df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='w', periods=10), columns=['date'])
    
    # add week number and year
    df['weekYear'] = df.date.dt.strftime('%V/%Y')
    
    # display(df)
            date weekYear
    0 2021-12-05  48/2021
    1 2021-12-12  49/2021
    2 2021-12-19  50/2021
    3 2021-12-26  51/2021
    4 2022-01-02  52/2022
    5 2022-01-09  01/2022
    6 2022-01-16  02/2022
    7 2022-01-23  03/2022
    8 2022-01-30  04/2022
    9 2022-02-06  05/2022
    

    Timing for 1M rows

    df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='h', periods=1000000), columns=['date'])
    
    %%timeit
    df.date.dt.strftime('%V/%Y')
    
    [out]: 3.74 s ± 19.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

  • answered 2021-11-29 20:25 Trenton McKinney

    • There are clearly a number of ways this can be solved, so a timing comparison is the best way to determine which is the "best" answer.
    • Here's a single implementation for anyone to run a timing analysis in Jupyter of all the current answers.
    • See this answer to modify the code to create a timing analysis plot with a varying number of rows.
    • See IPython: %timeit for the option descriptions.
    import pandas as pd
    
    # sample data with 60M rows
    df = pd.DataFrame(data=pd.date_range('2021-11-29', freq='s', periods=60000000), columns=['date'])
    
    # functions
    def test1(d):
        return d.date.dt.strftime('%V/%Y')
    
    
    def test2(d):
        return d['date'].dt.isocalendar().week.astype(str) + '/' +  d['date'].dt.year.astype(str)
    
    
    def test3(d):
        return d['date'].dt.isocalendar().week.astype(str).str.cat(d['date'].dt.year.astype(str), sep='/')
    
    
    def test4(d):
        return [f"{week}/{year}" for week, year in zip(d['date'].dt.isocalendar().week, d['date'].dt.year)]
    
    
    def test5(d):
        return d['date'].map(lambda x: "%d/%d" % (x.isocalendar()[1], x.year))
    
    t1 = %timeit -r2 -n1 -q -o test1(df)
    t2 = %timeit -r2 -n1 -q -o test2(df)
    t3 = %timeit -r2 -n1 -q -o test3(df)
    t4 = %timeit -r2 -n1 -q -o test4(df)
    t5 = %timeit -r2 -n1 -q -o test5(df)
    
    print(f'test1 result: {t1}')
    print(f'test2 result: {t2}')
    print(f'test3 result: {t3}')
    print(f'test4 result: {t4}')
    print(f'test5 result: {t5}')
    
    test1 result: 3min 45s ± 653 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
    test2 result: 53.4 s ± 459 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
    test3 result: 59.7 s ± 164 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
    test4 result: 35.5 s ± 409 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
    test5 result: 2min 2s ± 29.1 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
    

How many English words
do you know?
Test your English vocabulary size, and measure
how many words do you know
Online Test
Powered by Examplum