I want to use pandas to rename values in the Hospital column. When rows that share the same Hospital value have different values in the GeneralRepresentation column, Hospital should be renamed. When rows that share the same Hospital value also share the same GeneralRepresentation value, Hospital should be left unchanged.
The effect I want is shown below:

CodePudding user response:
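You just need to change the logic: group by Hospital and number the unique GeneralRepresentation values within each group.
import numpy as np
g = df.groupby('Hospital')['GeneralRepresentation']
# number each distinct GeneralRepresentation within its Hospital group, starting at 1
s1 = g.transform(lambda x: x.factorize()[0] + 1).astype(str)
# count of distinct GeneralRepresentation values per Hospital
s2 = g.transform('nunique')
# keep the name when the group has a single distinct value, otherwise append the number
df['Hospital'] = np.where(s2 == 1, df['Hospital'], df['Hospital'] + '_' + s1)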
df
  Hospital GeneralRepresentation
0        a                     a
1      b_1                     b
2      b_2                     c
3      c_1                     d
4      c_2                     e
5        d                     f
6        d                     f
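For anyone who wants to run this end to end, here is a minimal self-contained sketch of the factorize approach. The input frame is an assumption, reconstructed from the output shown in this answer, and rename_hospitals is a hypothetical helper name, not something from the original post.

```python
import numpy as np
import pandas as pd

# Assumed input, reconstructed from the output shown in the answer.
df = pd.DataFrame({
    "Hospital": ["a", "b", "b", "c", "c", "d", "d"],
    "GeneralRepresentation": ["a", "b", "c", "d", "e", "f", "f"],
})


def rename_hospitals(frame):
    """Hypothetical helper: suffix Hospital when its group holds more
    than one distinct GeneralRepresentation value."""
    out = frame.copy()
    g = out.groupby("Hospital")["GeneralRepresentation"]
    # 1-based label for each distinct GeneralRepresentation within a Hospital
    label = g.transform(lambda x: x.factorize()[0] + 1).astype(str)
    # number of distinct GeneralRepresentation values per Hospital
    n_unique = g.transform("nunique")
    out["Hospital"] = np.where(
        n_unique == 1,
        out["Hospital"],
        out["Hospital"] + "_" + label,
    )
    return out


result = rename_hospitals(df)
print(result)
```

Because the label comes from factorize rather than from the row position, repeated GeneralRepresentation values inside one Hospital get the same suffix.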
CodePudding user response:
Use duplicated to build a boolean mask, then pass it to np.where(condition, value if true, value if false). A row should be renamed when its Hospital value is repeated but its (Hospital, GeneralRepresentation) pair is not. cumcount generates a running count within each Hospital group which, once converted to strings, can be concatenated to the original name.
mask = df['Hospital'].duplicated(keep=False) & ~df.duplicated(['Hospital', 'GeneralRepresentation'], keep=False)
df['Hospital'] = np.where(mask, df['Hospital'] + '_' + (df.groupby('Hospital').cumcount() + 1).astype(str), df['Hospital'])
CodePudding user response:
You can use:
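# True for rows that are not repeated anywhere else in df
dup = ~df.duplicated(keep=False)
# 1-based position of each row within its Hospital group
g_count = df.groupby("Hospital").cumcount() + 1
# number of rows per Hospital
count = df.groupby("Hospital")['GeneralRepresentation'].transform('count')
df['Hospital'] = np.where(dup & (count > 1), df['Hospital'] + '_' + g_count.astype(str), df['Hospital'])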
OUTPUT
  Hospital GeneralRepresentation
0      UMC                     a
1    MGH_1                     b
2    MGH_2                     j
3    NMH_1                     o
4    NMH_2                     a
5      MSH                     d
6      MSH                     d
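A runnable sketch of the duplicated/cumcount approach follows. The input frame is an assumption inferred from the OUTPUT table, and the variable names are illustrative. One deliberate difference from the snippet above: duplicated is restricted to the two relevant columns, so any extra columns in a real frame would not affect the check.

```python
import numpy as np
import pandas as pd

# Assumed input, inferred from the OUTPUT table.
df = pd.DataFrame({
    "Hospital": ["UMC", "MGH", "MGH", "NMH", "NMH", "MSH", "MSH"],
    "GeneralRepresentation": ["a", "b", "j", "o", "a", "d", "d"],
})

# True where the (Hospital, GeneralRepresentation) pair occurs only once
unique_pair = ~df.duplicated(["Hospital", "GeneralRepresentation"], keep=False)
# 1-based position of each row within its Hospital group
position = df.groupby("Hospital").cumcount() + 1
# number of rows per Hospital
group_size = df.groupby("Hospital")["GeneralRepresentation"].transform("count")

df["Hospital"] = np.where(
    unique_pair & (group_size > 1),
    df["Hospital"] + "_" + position.astype(str),
    df["Hospital"],
)
print(df)
```

Note that the suffix here is the row position inside the group, so a Hospital that mixes repeated and distinct GeneralRepresentation values may get non-consecutive suffixes.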

