I have a df like this:
import pandas as pd
df = pd.DataFrame({'a': ['2019-09-01 17:00:00', '2019-09-01 17:15:00', '2019-09-01 17:30:00', '2019-09-01 17:45:00', '2019-09-01 18:00:00', '2019-09-01 18:15:00', '2019-09-01 18:30:00', '2019-09-01 18:45:00'],
                   'b': [432.6, 427.56, 424.2, 433.44, 450.24, 447.72, 452.76, 453.6]})
I want to write a loop that calculates the mean of column b for every block of 4 rows, like this. When i = 0 (first block):
mean0 = df.loc[0:3,'b'].mean()
When i = 1:
mean1 = df.loc[4:7,'b'].mean()
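The block-by-block loop described above can be written with positional slicing. This is a minimal sketch that steps through row positions with a stride of 4 and uses iloc on column b; the name block_means is illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'b': [432.6, 427.56, 424.2, 433.44, 450.24, 447.72, 452.76, 453.6]})

block_means = []
# Visit row positions 0, 4, 8, ... and average each slice of up to 4 rows
for start in range(0, len(df), 4):
    block_means.append(float(df['b'].iloc[start:start + 4].mean()))

print(block_means)
```

Because the slice end is clipped to the length of the column, a final block with fewer than 4 rows is averaged over the rows that exist instead of raising an error.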
And so on. I've tried something like this:
for i in df['b']:
    mean[i] = (df[i,'b'] + df.loc[(i+1),'b'] + df.loc[(i+2),'b'] + df.loc[(i+3),'b']).mean()
But I always get an error message, KeyError: 655.7077670000001, or NaN values. Thanks for the help.
Answer:
This solution is efficient because it is vectorized:
mean=list(df.groupby(df.index//4)['b'].mean())
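To see why this works, it helps to look at the grouping key. With the default RangeIndex, integer-dividing the index by 4 labels rows 0-3 as group 0, rows 4-7 as group 1, and so on; groupby then averages the rows that share a label. A small sketch using only column b:

```python
import pandas as pd

df = pd.DataFrame({'b': [432.6, 427.56, 424.2, 433.44, 450.24, 447.72, 452.76, 453.6]})

# Integer division of the default RangeIndex gives one label per block of 4 rows
labels = df.index // 4
print(labels.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]

# Average 'b' within each label, then turn the resulting Series into a list
means = df.groupby(labels)['b'].mean().tolist()
print(means)
```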
If you want to keep exploring your own loop-based method, here is a working version:
n = df.shape[0] // 4
mean = [0] * n
for i in range(n):
    mean[i] = (df.loc[i*4, 'b'] + df.loc[i*4 + 1, 'b'] + df.loc[i*4 + 2, 'b'] + df.loc[i*4 + 3, 'b']) / 4
Output:
[429.45000000000005, 451.08000000000004]
Your code raised an error because .loc was missing: df[i,'b'] should be df.loc[i*4,'b'].
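The KeyError in the question also comes from the loop header: iterating directly over a Series yields its values, not its index labels, so i ends up being a number such as 432.6 rather than a row position. A short demonstration with the b values from the question (the name first_value is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'b': [432.6, 427.56, 424.2, 433.44, 450.24, 447.72, 452.76, 453.6]})

# Iterating over a Series yields its values...
first_value = next(iter(df['b']))
print(first_value)  # 432.6

# ...while the row labels live in the index
print(df.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]

# Using a value as if it were a row label fails
try:
    df.loc[first_value, 'b']
except KeyError as exc:
    print('KeyError:', exc)
```

This is why the loop version above iterates over range(n) and computes the positions i*4 through i*4 + 3 explicitly.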
Answer:
Try this:
>>> df.groupby(df.index // 4).mean(numeric_only=True)
b
0 429.45
1 451.08
Or, to attach each block's mean to every row in that block:
>>> df['mean'] = df.groupby(df.index // 4)['b'].transform('mean')
>>> df
a b mean
0 2019-09-01 17:00:00 432.60 429.45
1 2019-09-01 17:15:00 427.56 429.45
2 2019-09-01 17:30:00 424.20 429.45
3 2019-09-01 17:45:00 433.44 429.45
4 2019-09-01 18:00:00 450.24 451.08
5 2019-09-01 18:15:00 447.72 451.08
6 2019-09-01 18:30:00 452.76 451.08
7 2019-09-01 18:45:00 453.60 451.08
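Both df.index // 4 tricks assume the default RangeIndex (0, 1, 2, ...). If the DataFrame has been filtered or re-indexed, for example with timestamps as the index, building the block key from row positions with NumPy avoids that assumption. A sketch, where the timestamp index is only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'b': [432.6, 427.56, 424.2, 433.44, 450.24, 447.72, 452.76, 453.6]},
    # Illustrative non-integer index: 15-minute timestamps
    index=pd.date_range('2019-09-01 17:00', periods=8, freq='15min'),
)

# Block labels come from row positions, not from index labels
blocks = np.arange(len(df)) // 4

# transform broadcasts each block's mean back onto that block's rows
df['mean'] = df.groupby(blocks)['b'].transform('mean')
print(df)
```

If the row count is not a multiple of 4, the last block simply contains the leftover rows.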
Answer:
Perhaps you want to group every 4 values because you are really trying to resample your DataFrame to hourly intervals (the timestamps are 15 minutes apart, so 4 rows make one hour).
Try:
out = df.groupby(pd.to_datetime(df['a']).dt.floor('h'))['b'].mean().reset_index()
print(out)
# Output
a b
0 2019-09-01 17:00:00 429.45
1 2019-09-01 18:00:00 451.08
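Since the timestamps are evenly spaced, the same hourly averages can also be obtained with resample, pandas' dedicated method for time-based binning. A sketch, assuming column a holds the timestamps as strings, as in the question:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['2019-09-01 17:00:00', '2019-09-01 17:15:00', '2019-09-01 17:30:00',
          '2019-09-01 17:45:00', '2019-09-01 18:00:00', '2019-09-01 18:15:00',
          '2019-09-01 18:30:00', '2019-09-01 18:45:00'],
    'b': [432.6, 427.56, 424.2, 433.44, 450.24, 447.72, 452.76, 453.6],
})

# Parse the strings to datetimes and use them as the index
ts = df.assign(a=pd.to_datetime(df['a'])).set_index('a')

# Bucket rows into 1-hour bins and average 'b' in each bin
out = ts['b'].resample('h').mean().reset_index()
print(out)
```

One difference from the groupby/floor version: resample emits every hourly bin between the first and last timestamp, so an hour with no rows appears as NaN instead of being skipped.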
