How to print one variable's values together in a loop?
I have a dataframe with some columns, and I loop through it based on the classification value, like this:
import pandas as pd
from sklearn.metrics import mean_squared_error
import numpy as np
data = pd.read_csv(r'C:\\example..csv')
classif = list(set(data['clasification']))
# print(classif)
# [1, 2, 3, 4]
for classif in classif:
    df_r = data[data['clasification'] == classif]
    Y = df_r['A']
    X = df_r[['B', 'C']]
    # i do some regression here
    print(np.mean(Y))
    print(np.max(Y))
When printing, I get all the values sequentially by classification value. So 14.580000000000002 and 43.6 are the mean and max for rows whose classification value is 1; 17.490909090909092 and 45.3 are the mean and max for rows whose classification value is 2, and so on, like this:
14.580000000000002
43.6
17.490909090909092
45.3
29.599999999999998
67.9
14.766666666666666
29.3
But is it possible to print the values grouped by statistic (all means first, then all maxima) instead of grouped by classification? The result would look something like this:
print(np.mean(Y))
print(np.max(Y))
14.580000000000002
17.490909090909092
29.599999999999998
14.766666666666666
43.6
45.3
67.9
29.3
Here is a sample of the dataframe used above:
Out[281]:
    id  clasification     A     B      C
0    1              1   5.4   7.4   59.6
1    2              2  44.2  49.9  244.0
2    3              3   5.5   8.8   42.4
3    4              1  10.5  14.9   82.6
4    5              1  13.6  19.8   93.7
5    6              1  12.9  18.2  103.4
6    7              1   7.4  10.5   50.9
7    8              2   7.4  10.9   54.2
8    9              2   8.2  11.7   55.8
9   10              2  10.0  13.5   55.8
10  11              2   6.0   8.2   29.3
11  12              2  45.3  63.9  392.7
12  13              2   9.5   9.4   53.7
13  14              2  23.9  32.9  226.6
14  15              3  46.7  63.9  406.2
15  16              3   7.8   8.6   44.4
16  17              3  35.8  49.9  343.6
17  18              3  67.9  87.5  609.9
18  19              2  14.8  20.6  120.3
CodePudding user response:
That is not possible without restructuring your code: as long as the prints stay inside the loop, each iteration prints a mean followed by a maximum. One way to get what you want is to save the values in lists and then print them at the end, as follows:
import pandas as pd
from sklearn.metrics import mean_squared_error
import numpy as np
data = pd.read_csv(r'C:\\example..csv')
classif = list(set(data['clasification']))
# print(classif)
# [1, 2, 3, 4]
means = []
maxes = []
# use a separate loop variable so the list `classif` is not overwritten
for c in classif:
    df_r = data[data['clasification'] == c]
    Y = df_r['A']
    X = df_r[['B', 'C']]
    # i do some regression here
    means.append(np.mean(Y))
    maxes.append(np.max(Y))
# print after the loop: all means first, then all maxima
print(means)
print(maxes)
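To get one value per line, as in the desired output in the question, the lists can be unpacked into print with a newline separator. This is only a minimal sketch: it builds the dataframe inline from the sample rows in the question instead of reading the CSV, and it sorts the group labels so the order is predictable (both are my assumptions, not part of the original code).

```python
import pandas as pd

# Columns 'clasification' and 'A' copied from the sample dataframe in the question.
data = pd.DataFrame({
    'clasification': [1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2],
    'A': [5.4, 44.2, 5.5, 10.5, 13.6, 12.9, 7.4, 7.4, 8.2, 10.0,
          6.0, 45.3, 9.5, 23.9, 46.7, 7.8, 35.8, 67.9, 14.8],
})

means = []
maxes = []
# sorted() keeps the group order deterministic (a set has no guaranteed order)
for c in sorted(data['clasification'].unique()):
    Y = data.loc[data['clasification'] == c, 'A']
    means.append(Y.mean())
    maxes.append(Y.max())

# One value per line: every mean first, then every maximum.
print(*means, sep='\n')
print(*maxes, sep='\n')
```

The values printed here come from the sample rows only, so they will not necessarily match the numbers shown in the question.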
CodePudding user response:
Sounds like you are trying to do a groupby and compute some statistics. You can try the following (df is your dataframe, called data in the question):
df.groupby('clasification').agg(['mean', 'max'])
which prints
                      id               A                B                 C
                    mean  max       mean   max       mean   max        mean    max
clasification
1               4.600000    7   9.960000  13.6  14.160000  19.8   78.040000  103.4
2              10.888889   19  18.811111  45.3  24.555556  63.9  136.933333  392.7
3              13.800000   18  32.740000  67.9  43.740000  87.5  289.300000  609.9
The agg function gives you quite a lot of control over which statistics to print for which columns.
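As a rough illustration of that control, pandas named aggregation lets you pick a different statistic per column. The sketch below assumes the sample rows from the question; the output column names A_mean and B_max are made up for the example.

```python
import pandas as pd

# Columns copied from the sample dataframe in the question.
df = pd.DataFrame({
    'clasification': [1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2],
    'A': [5.4, 44.2, 5.5, 10.5, 13.6, 12.9, 7.4, 7.4, 8.2, 10.0,
          6.0, 45.3, 9.5, 23.9, 46.7, 7.8, 35.8, 67.9, 14.8],
    'B': [7.4, 49.9, 8.8, 14.9, 19.8, 18.2, 10.5, 10.9, 11.7, 13.5,
          8.2, 63.9, 9.4, 32.9, 63.9, 8.6, 49.9, 87.5, 20.6],
})

# Named aggregation: new_column=(source_column, statistic).
# A_mean and B_max are illustrative names, not anything pandas requires.
stats = df.groupby('clasification').agg(
    A_mean=('A', 'mean'),
    B_max=('B', 'max'),
)
print(stats)
```

Each keyword becomes one flat column in the result, which avoids the two-level column header produced by passing a list of statistics.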
You can even try
df.groupby('clasification').apply(lambda g: g.describe())
which will give you a bunch of aggregated stats per group.
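If only column A matters, as in the question, the same idea can be narrowed down. This is a sketch under the assumption that the sample rows from the question are the data; it calls describe() on the grouped column and then prints the mean column followed by the max column, one value per line.

```python
import pandas as pd

# Columns 'clasification' and 'A' copied from the sample dataframe in the question.
df = pd.DataFrame({
    'clasification': [1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2],
    'A': [5.4, 44.2, 5.5, 10.5, 13.6, 12.9, 7.4, 7.4, 8.2, 10.0,
          6.0, 45.3, 9.5, 23.9, 46.7, 7.8, 35.8, 67.9, 14.8],
})

# describe() on a grouped column returns one row per group with
# count, mean, std, min, quartiles and max as columns.
summary = df.groupby('clasification')['A'].describe()

# All means first, then all maxima, one value per line.
print(*summary['mean'], sep='\n')
print(*summary['max'], sep='\n')
```

Because the statistics end up as columns of one dataframe, printing them "by statistic" is just a matter of selecting a column, with no explicit loop over the groups.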
