I want to use pandas to rename the Hospital value when rows that share the same Hospital have different values in the GeneralRepresentation column. When rows with the same Hospital also have the same GeneralRepresentation, the Hospital should not be renamed. And for hospitals without a GeneralRepresentation, the hospital name should stay the same.
The effect I want is shown below:
However, I want the hospital name to remain unchanged when the hospital has no GeneralRepresentation (the result shown in the second picture). How do I modify this code to meet that requirement?
Answer:
The problem is the missing values: factorize assigns -1 to NaN, so adding 1 gives 0 for the last two rows. My solution replaces NaN with an empty string before the groupby to prevent this:
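import numpy as np

# treat a missing GeneralRepresentation as '' so factorize and nunique see a real value
g = df.fillna({'GeneralRepresentation': ''}).groupby('Hospital')['GeneralRepresentation']
# 1-based code of each distinct GeneralRepresentation within its Hospital
s1 = g.transform(lambda x: x.factorize()[0] + 1).astype(str)
# number of distinct GeneralRepresentation values per Hospital
s2 = g.transform('nunique')
# keep the name when there is only one distinct value, otherwise append the suffix
df['Hospital'] = np.where(s2 == 1, df['Hospital'], df['Hospital'] + '_' + s1)
print(df)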
  Hospital GeneralRepresentation
0        a                     a
1      b_1                     b
2      b_2                     c
3      c_1                     d
4      c_2                     e
5        d                   NaN
6        t                   NaN
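The fillna step matters for both s1 and s2: factorize codes NaN as -1 (so "+ 1" gives a 0 suffix), and nunique skips NaN (so a hospital whose only value is missing counts 0 distinct values and fails the s2 == 1 test). A small check, using a DataFrame reconstructed from the printed result above; the original input is not shown, so the un-suffixed Hospital names are an assumption:

```python
import numpy as np
import pandas as pd

# Assumed input: the printed result with the _1/_2 suffixes stripped
df = pd.DataFrame({
    'Hospital': ['a', 'b', 'b', 'c', 'c', 'd', 't'],
    'GeneralRepresentation': ['a', 'b', 'c', 'd', 'e', np.nan, np.nan],
})
raw = df.groupby('Hospital')['GeneralRepresentation']
filled = df.fillna({'GeneralRepresentation': ''}).groupby('Hospital')['GeneralRepresentation']

# factorize() codes NaN as -1, so "+ 1" yields a 0 suffix for d and t
raw_codes = raw.transform(lambda x: x.factorize()[0] + 1)
filled_codes = filled.transform(lambda x: x.factorize()[0] + 1)

# nunique() skips NaN, so d and t count 0 distinct values without fillna
raw_counts = raw.transform('nunique')
filled_counts = filled.transform('nunique')
```

With the empty-string fill, d and t get a count of 1, so np.where keeps their names.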
Answer:
Use np.select(condlist, choicelist, default), i.e. a list of conditions, a matching list of choices, and a fallback value:
import numpy as np
gr = df['GeneralRepresentation']
a = ~gr.str.contains(r'\w', na=False)  # no GeneralRepresentation (NaN or empty): keep the name
b = df.groupby('Hospital')['GeneralRepresentation'].transform('nunique') > 1  # several distinct values
n = (df.groupby('Hospital').cumcount() + 1).astype(str)  # 1, 2, ... within each Hospital
df['Hospital'] = np.select([a, b], [df['Hospital'], df['Hospital'] + '_' + n], df['Hospital'])
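np.select evaluates the conditions in order: for each element it takes the choice paired with the first True condition and falls back to default only when none is True. A toy illustration with made-up values:

```python
import numpy as np

x = np.array([1, 5, 10])

# x < 3 is tested first, so 1 -> 'small' even though 1 < 8 is also True;
# 10 matches neither condition and gets the default
out = np.select([x < 3, x < 8], ['small', 'medium'], default='large')
```

So in np.select([a, b], ...), a row for which both a and b are True always gets the choice paired with a.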



