Construct a pandas dataframe with columns equal to the mean of the each columns of various pandas da-CodePudding

I have 21 pandas data frames, all of them looking like this:

|country |   reference   | price_1|price_2 |price 3| price_4 |
|------------------------------------------------------------
|A       |1              | 0      |   5    | 13    | 4       |
|A       |2              | 3      |   8    | 4     | 5       |
|A       |3              | 0      |   4    | 6     | 9       |
|A       |4              | 1      |  12    | 11    | 12      |  
|A       |5              |45      |   9    | 3     | 13      |
|A       |6              |3       |   6    | 5     | 16      |
|A       |7              |34      |   7    | 2     | 17      |

Here I use A as the name of the country, and the dataframe will be name dataA. I want to get the mean of each column, price_1, price_2 and price_3, so I did this

datosA1 = dataA.iloc[:,2:5]
A_mean_loss = dataA1.mean(axis=0)

So, I get the dataframe A_mean_loss:

|price_1|12.28|
|price_2|7.3  |
|price_3|6.29 |
|price_4|10.85|

What I need is to do the same in a efficiently way for the 21 panda dataframes and to get a new dataframe that looks like this:

|price id| A   |B   |C   |D   |E   |.......|(Country21)|
---------------------------------------------------------
|1       |12.28|x   |x   |x   |x   |.......|x          |
|2       |7.3  |x   |x   |x   |x   |.......|x          |          
|3       |6.29 |x   |x   |x   |x   |.......|x          |
|4       |10.85|x   |x   |x   |x   |x      |x          |

I'm actually really rookie in Python and programing, but I think that this could be solved generalizing what I did with a function with a for loop over the list of data frames data_countries=[dataA,dataB,dataC,dataD,...,dataU]

I would really apreciate any help!

CodePudding user response：

If you use groupby to do the summing, you can just concatenate the resulting series:

import pandas as pd
import numpy as np

data = [
('A',1, 0,   5, 13, 4),
('A',2, 3,   8, 4, 5),
('A',3, 0,   4, 6, 9),
('A',4, 1,  12, 11, 12),  
('A',5,45,   9, 3, 13),
('A',6,3,   6, 5, 16),
('A',7,34,   7, 2, 17)
]

data = pd.DataFrame( data, columns=['country','reference','price_1','price_2','price_3','price_4'])
print(data)
a1 = data.groupby(['country']).mean()
a2 = data.groupby(['country']).mean()
a3 = data.groupby(['country']).mean()

df4 = pd.concat([a1,a2,a3])
print(df4)

Output:

  country  reference  price_1  price_2  price_3  price_4
0       A          1        0        5       13        4
1       A          2        3        8        4        5
2       A          3        0        4        6        9
3       A          4        1       12       11       12
4       A          5       45        9        3       13
5       A          6        3        6        5       16
6       A          7       34        7        2       17
         reference    price_1   price_2   price_3    price_4
country                                                     
A              4.0  12.285714  7.285714  6.285714  10.857143
A              4.0  12.285714  7.285714  6.285714  10.857143
A              4.0  12.285714  7.285714  6.285714  10.857143

You would want to drop the useless reference column here.

CodePudding user response：

I don't have datasets to test it but i think it work.

import os
import pandas as pd

def country_mean_loss(dir):
    #not sure if your files are csv files.
    data = pd.read_csv(dir)
    country = data.columns[0]
    data.drop(columns=['country','reference'], inplace=True)
    mean_loss = data.mean(axis=0).values
    return country, mean_loss

mean_loss_df = pd.DataFrame()

path = 'your folder path'
#this will list all the files inside your folder.
files = os.listdir(path)

for f in files:
    country, mean_loss = country_mean_loss(path f)
    mean_loss_df[country] = mean_loss