I am trying to aggregate the timestamp field to 1 hour interval with another column that contains multiple fields.
Dataset-
df=pd.read_excel("<....>.xlsx")
df.head(10)
timestamp device value
1 2021-01-31 01:02:33 00:00 transPRD-client2 0.27
2 2021-01-31 01:04:34 00:00 transPRD-data3 2.41
5 2021-01-31 01:07:38 00:00 transPRD-client2 0.31
6 2021-01-31 01:09:39 00:00 transPRD-data3 2.20
9 2021-01-31 01:12:38 00:00 transPRD-client2 0.33
10 2021-01-31 01:14:39 00:00 transPRD-data3 1.70
13 2021-01-31 01:17:38 00:00 transPRD-client2 0.23
14 2021-01-31 01:19:39 00:00 transPRD-data3 1.89
17 2021-01-31 01:22:41 00:00 transPRD-client2 0.18
18 2021-01-31 01:24:41 00:00 transPRD-data3 1.62
21 2021-01-31 01:27:41 00:00 transPRD-client 0.22
I tried the below code, the timestamp is getting aggregated with 5 mins interval but the only value is present in the "device" column. Data is not getting aggregated on another device.
df1=df.set_index('timestamp').resample("1h").max().reset_index()
df1
timestamp device value
0 2021-01-31 01:00:00 00:00 transPRD-data3 2.90
1 2021-01-31 02:00:00 00:00 transPRD-data3 2.83
2 2021-01-31 03:00:00 00:00 transPRD-data3 3.64
3 2021-01-31 04:00:00 00:00 transPRD-data3 2.06
4 2021-01-31 05:00:00 00:00 transPRD-data3 1.84
CodePudding user response:
You can add groupby for aggregate with resample:
df1=df.set_index('timestamp').groupby('device')['value'].resample("1h").max().reset_index()
print (df1)
device timestamp value
0 transPRD-client 2021-01-31 01:00:00 00:00 0.22
1 transPRD-client2 2021-01-31 01:00:00 00:00 0.33
2 transPRD-data3 2021-01-31 01:00:00 00:00 2.41
