Pandas: Finding average values of dataframe by hour & month

Assume I have a df:

timestamp             value1     value2
01-01-2010 00:00:00       10          5
30-01-2019 00:00:00        5          1
01-02-2015 12:00:00        1          0
25-02-2007 05:00:00       10         10
01-02-2015 05:00:00       10          1

I would like to plot a time series of the mean of columns 'value1' and 'value2', grouped only by the hour and month of each timestamp (ignoring the day and year). The desired df and graph might look something like this:

hour-month     value1   value2
00-01             7.5        3
05-02              10      5.5
12-02               1        0

Time series chart

I'm new to Python; please advise

1 answer

  • answered 2020-10-16 04:41 jezrael

    First convert the column to datetimes with to_datetime, then group by the hour-month strings ('%H-%m') produced by Series.dt.strftime and aggregate with mean, and finally plot with DataFrame.plot:

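    import pandas as pd

    df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
    
    # select the value columns so the datetime column itself is not averaged
    df1 = df.groupby(df['timestamp'].dt.strftime('%H-%m'))[['value1', 'value2']].mean()
    
    print(df1)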
               value1  value2
    timestamp                
    00-01         7.5     3.0
    05-02        10.0     5.5
    12-02         1.0     0.0
    
    df1.plot()
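
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same strftime grouping. The explicit format string and the index name 'hour-month' are assumptions added for illustration; they are not part of the answer as posted.

```python
import pandas as pd

# Sample data copied from the question (timestamps are day-first).
df = pd.DataFrame({
    "timestamp": [
        "01-01-2010 00:00:00",
        "30-01-2019 00:00:00",
        "01-02-2015 12:00:00",
        "25-02-2007 05:00:00",
        "01-02-2015 05:00:00",
    ],
    "value1": [10, 5, 1, 10, 10],
    "value2": [5, 1, 0, 10, 1],
})

# An explicit format avoids any day/month ambiguity (assumed layout: DD-MM-YYYY HH:MM:SS).
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# Build the "hour-month" key, e.g. "05-02" for 05:00 in February.
key = df["timestamp"].dt.strftime("%H-%m").rename("hour-month")

# Average only the value columns within each hour-month group.
df1 = df.groupby(key)[["value1", "value2"]].mean()
print(df1)
```

Calling df1.plot() on the result draws one line per value column, with the hour-month strings as categorical x positions.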
    

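    EDIT: To plot against a real datetime axis, replace the year and day of every timestamp with fixed values so that only the month and hour differ, then reshape to long format for plotly: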

    df['timestamp'] = pd.to_datetime(df['timestamp'], dayfirst=True)
    
    # select the value columns so the datetime column itself is not averaged
    df1 = df.groupby(df['timestamp'].map(lambda x: x.replace(year=2020, day=1)))[['value1', 'value2']].mean()
    
    print(df1)
                         value1  value2
    timestamp                          
    2020-01-01 00:00:00     7.5     3.0
    2020-02-01 05:00:00    10.0     5.5
    2020-02-01 12:00:00     1.0     0.0
    
    df2 = df1.rename_axis('col', axis=1).stack().reset_index(name='vals')
    print(df2)
                timestamp     col  vals
    0 2020-01-01 00:00:00  value1   7.5
    1 2020-01-01 00:00:00  value2   3.0
    2 2020-02-01 05:00:00  value1  10.0
    3 2020-02-01 05:00:00  value2   5.5
    4 2020-02-01 12:00:00  value1   1.0
    5 2020-02-01 12:00:00  value2   0.0
    

    import plotly.express as px
    
    #https://plotly.com/python/line-charts/
    fig = px.line(df2, x="timestamp", y="vals", color='col')
    #https://plotly.com/python/time-series/
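    # on a date axis dtick must be a number of milliseconds or a string such as "M1";
    # it is omitted here so plotly chooses the tick spacing automatically
    fig.update_xaxes(
        tickformat="%H\n%m")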
    fig.show()
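
The data preparation for the plotly chart can also be checked on its own, without plotting. The sketch below rebuilds the sample frame from the question, pins every timestamp to a placeholder year and day, and reshapes with melt instead of stack. The use of melt, the final sort, and the names normalized, means and long_df are assumptions made for illustration; 2020 is only a dummy year, as in the answer.

```python
import pandas as pd

# Sample data copied from the question (timestamps are day-first).
df = pd.DataFrame({
    "timestamp": [
        "01-01-2010 00:00:00",
        "30-01-2019 00:00:00",
        "01-02-2015 12:00:00",
        "25-02-2007 05:00:00",
        "01-02-2015 05:00:00",
    ],
    "value1": [10, 5, 1, 10, 10],
    "value2": [5, 1, 0, 10, 1],
})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%d-%m-%Y %H:%M:%S")

# Pin every row to the same placeholder year and day so only month and hour vary.
# day=1 exists in every month, so the replacement is always a valid date.
normalized = df["timestamp"].map(lambda t: t.replace(year=2020, day=1))

# Mean of the value columns per normalized timestamp.
means = df.groupby(normalized)[["value1", "value2"]].mean()

# Long format: one row per (timestamp, column) pair, as expected by px.line.
long_df = (
    means.reset_index()
    .melt(id_vars="timestamp", var_name="col", value_name="vals")
    .sort_values(["timestamp", "col"], ignore_index=True)
)
print(long_df)
```

The resulting long_df has the same columns (timestamp, col, vals) as df2 above, so it could be passed to px.line in the same way.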