Print one variable's values together in a loop with Python

How can I print one variable's values together in a loop?

I have a dataframe with some columns, and I loop through it based on the classification value, like this:

import pandas as pd
from sklearn.metrics import mean_squared_error
import numpy as np

data = pd.read_csv(r'C:\\example..csv')

classif = list(set(data['clasification']))
# print(classif)
# [1, 2, 3, 4]

for classif in classif:
    df_r = data[data['clasification'] == classif]

    Y = df_r['A']
    X = df_r[['B', 'C']]

    # i do some regression here

    print(np.mean(Y))
    print(np.max(Y))

When printing, I get the values sequentially for each classification value. So 14.580000000000002 and 43.6 are the mean and max for rows whose classification value is 1; 17.490909090909092 and 45.3 are the mean and max for rows whose classification value is 2, and so on, like this:

14.580000000000002
43.6
17.490909090909092
45.3
29.599999999999998
67.9
14.766666666666666
29.3

But is it possible to print the values grouped by statistic instead of by classification group? The result would look something like this:

print(np.mean(Y))
print(np.max(Y))
14.580000000000002
17.490909090909092
29.599999999999998
14.766666666666666
43.6
45.3
67.9
29.3

Here is an example of the dataframe used above:

Out[281]: 
    id  clasification     A     B      C
0    1              1   5.4   7.4   59.6
1    2              2  44.2  49.9  244.0
2    3              3   5.5   8.8   42.4
3    4              1  10.5  14.9   82.6
4    5              1  13.6  19.8   93.7
5    6              1  12.9  18.2  103.4
6    7              1   7.4  10.5   50.9
7    8              2   7.4  10.9   54.2
8    9              2   8.2  11.7   55.8
9   10              2  10.0  13.5   55.8
10  11              2   6.0   8.2   29.3
11  12              2  45.3  63.9  392.7
12  13              2   9.5   9.4   53.7
13  14              2  23.9  32.9  226.6
14  15              3  46.7  63.9  406.2
15  16              3   7.8   8.6   44.4
16  17              3  35.8  49.9  343.6
17  18              3  67.9  87.5  609.9
18  19              2  14.8  20.6  120.3

CodePudding user response：

That is not possible without restructuring your code: as long as the prints stay inside the loop, each iteration prints a mean and then a maximum. One way to get what you want is to save the values in lists and then print them at the end, as follows:

import pandas as pd
from sklearn.metrics import mean_squared_error
import numpy as np

data = pd.read_csv(r'C:\\example..csv')

classif = list(set(data['clasification']))
# print(classif)
# [1, 2, 3, 4]
means = []
maxes = []
for c in classif:  # separate loop variable so the list is not overwritten
    df_r = data[data['clasification'] == c]

    Y = df_r['A']
    X = df_r[['B', 'C']]

    # i do some regression here

    means.append(np.mean(Y))
    maxes.append(np.max(Y))

print(means)
print(maxes)
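
For reference, here is a minimal self-contained sketch of the same idea that runs without the CSV file. It assumes only the 'clasification' and 'A' columns, copied from the sample dataframe in the question, and the names `group` and `y` are illustrative. It prints one value per line, all means first and then all maxima:

```python
import pandas as pd

# Columns 'clasification' and 'A' copied from the sample dataframe in the question.
data = pd.DataFrame({
    "clasification": [1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2],
    "A": [5.4, 44.2, 5.5, 10.5, 13.6, 12.9, 7.4, 7.4, 8.2, 10.0,
          6.0, 45.3, 9.5, 23.9, 46.7, 7.8, 35.8, 67.9, 14.8],
})

means = []
maxes = []
# sorted() makes the group order deterministic (a plain set does not guarantee it).
for group in sorted(data["clasification"].unique()):
    y = data.loc[data["clasification"] == group, "A"]
    means.append(y.mean())
    maxes.append(y.max())

# Print every mean first, then every maximum, one value per line.
for value in means + maxes:
    print(value)
```

Because the sample contains only three classification groups, this prints three means followed by three maxima.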

CodePudding user response：

Sounds like you are trying to do a groupby and compute some statistics. You can try:

df.groupby('clasification').agg(['mean', 'max'])

prints

                      id               A                B                 C
                    mean  max       mean   max       mean   max        mean    max
clasification
1               4.600000    7   9.960000  13.6  14.160000  19.8   78.040000  103.4
2              10.888889   19  18.811111  45.3  24.555556  63.9  136.933333  392.7
3              13.800000   18  32.740000  67.9  43.740000  87.5  289.300000  609.9

The agg function gives you quite a lot of control over which statistics to print for which columns.
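
As a hedged illustration of that control, the sketch below restricts the aggregation to column A using pandas named aggregation (available since pandas 0.25). The output column names `mean_A` and `max_A` are made up for this example, and the data is the 'clasification' and 'A' columns from the question's sample:

```python
import pandas as pd

# Columns 'clasification' and 'A' copied from the sample dataframe in the question.
df = pd.DataFrame({
    "clasification": [1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2],
    "A": [5.4, 44.2, 5.5, 10.5, 13.6, 12.9, 7.4, 7.4, 8.2, 10.0,
          6.0, 45.3, 9.5, 23.9, 46.7, 7.8, 35.8, 67.9, 14.8],
})

# Named aggregation: each keyword is an output column, each tuple is (input column, function).
summary = df.groupby("clasification").agg(
    mean_A=("A", "mean"),
    max_A=("A", "max"),
)

# One value per line: all group means first, then all group maxima.
for value in summary["mean_A"].tolist() + summary["max_A"].tolist():
    print(value)
```

Reading the result column by column gives the layout asked for in the question without an explicit loop over the groups.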

You can even try:

df.groupby('clasification').apply(lambda g:g.describe())

That will give you a set of aggregated statistics per group.
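
A small sketch of what that looks like when limited to column A, assuming the same sample columns as in the question. It calls `describe()` directly on the grouped column, which is a shorter route to the same kind of per-group summary (one row per group) than the `apply` form above:

```python
import pandas as pd

# Columns 'clasification' and 'A' copied from the sample dataframe in the question.
df = pd.DataFrame({
    "clasification": [1, 2, 3, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 2],
    "A": [5.4, 44.2, 5.5, 10.5, 13.6, 12.9, 7.4, 7.4, 8.2, 10.0,
          6.0, 45.3, 9.5, 23.9, 46.7, 7.8, 35.8, 67.9, 14.8],
})

# One row per group; columns are count, mean, std, min, 25%, 50%, 75%, max.
stats = df.groupby("clasification")["A"].describe()

# Keep only the two statistics the question asks about.
print(stats[["mean", "max"]])
```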