Home > Net >  Aggregate Timestamp fields according column present in dataframe using pandas
Aggregate Timestamp fields according column present in dataframe using pandas

Time:01-24

I am trying to aggregate the timestamp field to 1 hour interval with another column that contains multiple fields.

Dataset-

df=pd.read_excel("<....>.xlsx")
df.head(10)
    timestamp                   device              value
1   2021-01-31 01:02:33 00:00   transPRD-client2    0.27
2   2021-01-31 01:04:34 00:00   transPRD-data3      2.41
5   2021-01-31 01:07:38 00:00   transPRD-client2    0.31
6   2021-01-31 01:09:39 00:00   transPRD-data3      2.20
9   2021-01-31 01:12:38 00:00   transPRD-client2    0.33
10  2021-01-31 01:14:39 00:00   transPRD-data3      1.70
13  2021-01-31 01:17:38 00:00   transPRD-client2    0.23
14  2021-01-31 01:19:39 00:00   transPRD-data3      1.89
17  2021-01-31 01:22:41 00:00   transPRD-client2    0.18
18  2021-01-31 01:24:41 00:00   transPRD-data3      1.62
21  2021-01-31 01:27:41 00:00   transPRD-client     0.22

I tried the below code, the timestamp is getting aggregated with 5 mins interval but the only value is present in the "device" column. Data is not getting aggregated on another device.

df1=df.set_index('timestamp').resample("1h").max().reset_index()
df1
timestamp                        device         value
0   2021-01-31 01:00:00 00:00   transPRD-data3  2.90
1   2021-01-31 02:00:00 00:00   transPRD-data3  2.83
2   2021-01-31 03:00:00 00:00   transPRD-data3  3.64
3   2021-01-31 04:00:00 00:00   transPRD-data3  2.06
4   2021-01-31 05:00:00 00:00   transPRD-data3  1.84

CodePudding user response:

You can add groupby for aggregate with resample:

df1=df.set_index('timestamp').groupby('device')['value'].resample("1h").max().reset_index()
print (df1)
             device                 timestamp  value
0   transPRD-client 2021-01-31 01:00:00 00:00   0.22
1  transPRD-client2 2021-01-31 01:00:00 00:00   0.33
2    transPRD-data3 2021-01-31 01:00:00 00:00   2.41
  •  Tags:  
  • Related