How can I fill NaN in a Pandas DataFrame with the median of each category?

I have a Pandas DataFrame like this:

Age	Gender	City
10000	Male	Tokyo
15000	Male	Tokyo
20000	Male	Tokyo
12000	Female	Madrid
14000	Female	Madrid
16000	Female	Madrid
15000	Female	Rome
NaN	Female	Rome
NaN	Male	Tokyo
NaN	Female	Rome

For the last 3 rows, I'd like to impute the median based on gender and city. For example, for the Female rows in Rome that have NaN, the value would be 15000, because the only Female in Rome with a known value has 15000.

For the Male row in Tokyo with NaN, it would be 15000, because that is the median of the Male rows in Tokyo.

I know I can fill with the median of the whole column using df['Age'] = df['Age'].fillna(median), but I want to calculate it using the other categorical columns too.

Maybe something like this?

df['Age'] = df['Age'].fillna(df[['Age','Gender','City']].groupby(by=['Gender','City']).median())

How can I do this?

I appreciate your help.

CodePudding user response：

Try groupby with transform. Also, please make sure the missing values are real NaN and not the string 'Nan'.

df['Age'] = df['Age'].fillna(df.groupby(['Gender','City'])['Age'].transform('median'))
df
Out[628]: 
       Age  Gender    City
0  10000.0    Male   Tokyo
1  15000.0    Male   Tokyo
2  20000.0    Male   Tokyo
3  12000.0  Female  Madrid
4  14000.0  Female  Madrid
5  16000.0  Female  Madrid
6  15000.0  Female    Rome
7  15000.0  Female    Rome
8  15000.0    Male   Tokyo
9  15000.0  Female    Rome
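
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same groupby/transform approach. The frame is rebuilt by hand from the question's table, and it assumes the missing ages arrived as the string 'Nan', which is why the column is first coerced with pd.to_numeric(errors='coerce'). If your column already holds real NaN values, that step is harmless.

```python
import pandas as pd

# Rebuild the question's table; the 'Nan' strings are an assumption
# about how the missing ages might have been loaded.
df = pd.DataFrame({
    'Age': [10000, 15000, 20000, 12000, 14000, 16000, 15000, 'Nan', 'Nan', 'Nan'],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female',
               'Female', 'Female', 'Female', 'Male', 'Female'],
    'City': ['Tokyo', 'Tokyo', 'Tokyo', 'Madrid', 'Madrid',
             'Madrid', 'Rome', 'Rome', 'Tokyo', 'Rome'],
})

# Turn anything non-numeric (such as the string 'Nan') into a real NaN.
df['Age'] = pd.to_numeric(df['Age'], errors='coerce')

# For every row, the median Age of that row's (Gender, City) group.
# transform returns a Series aligned to df's index, so it can be
# passed straight to fillna.
group_median = df.groupby(['Gender', 'City'])['Age'].transform('median')

# Fill only the missing ages; existing values are left untouched.
df['Age'] = df['Age'].fillna(group_median)
print(df)
```

Assigning the result back to df['Age'] avoids relying on inplace=True through attribute access, which newer pandas versions warn about.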

CodePudding user response：

Another way:

# transform keeps the original index, so the result can be assigned straight back
df['Age'] = df.groupby(['Gender', 'City'])['Age'].transform(lambda x: x.fillna(x.median()))
print(df)

# Output
       Age  Gender    City
0  10000.0    Male   Tokyo
1  15000.0    Male   Tokyo
2  20000.0    Male   Tokyo
3  12000.0  Female  Madrid
4  14000.0  Female  Madrid
5  16000.0  Female  Madrid
6  15000.0  Female    Rome
7  15000.0  Female    Rome
8  15000.0    Male   Tokyo
9  15000.0  Female    Rome
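
One edge case the snippets above do not cover: if every Age in a (Gender, City) group is NaN, the group median is itself NaN and those rows stay unfilled. A possible fallback, sketched below, is to fill whatever remains with the overall column median. The data here is made up for illustration (the Male/Rome row is not in the question's table), and whether an overall median is an acceptable fallback depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the Male/Rome group has no observed Age at all.
df = pd.DataFrame({
    'Age': [10000.0, 20000.0, 30000.0, 12000.0, np.nan, np.nan],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Male'],
    'City': ['Tokyo', 'Tokyo', 'Tokyo', 'Rome', 'Rome', 'Rome'],
})

# Median of the observed ages across the whole column (NaN is skipped).
overall_median = df['Age'].median()

# Per-row group median; NaN for groups with no observed Age.
group_median = df.groupby(['Gender', 'City'])['Age'].transform('median')

# First fill from the group, then fall back to the overall median.
df['Age'] = df['Age'].fillna(group_median).fillna(overall_median)
print(df)
```

In this sketch the Female/Rome row is filled from its group, while the Male/Rome row falls back to the column-wide median.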