Given a dataframe, I want to set the isActive column to True only for a duplicated value, and append '_duplicate' to that row's Name.
df =
Name  isActive  LoginDate
John  False     2021
John  False     2022
Fred  False     2020
Desired output is:
df =
Name            isActive  LoginDate
John_duplicate  True      2021
John            False     2022
Fred            False     2020
So far I have been able to add a number to each duplicate, but I want to skip the row with the most recent login date, add the text to the oldest one, and change the boolean value:
df.LoginDate = df.groupby('LoginDate').LoginDate.apply(lambda n: n + (np.arange(len(n)) + 1).astype(str))
Any suggestion?
CodePudding user response:
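Use Series.duplicated, negated, to mark the first occurrence of each Name, and combine it with duplicated(keep=False), which marks every Name that occurs more than once. The resulting mask selects the first row of each duplicated Name; assign it to isActive and use it to append the substring to Name: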
m = ~df['Name'].duplicated() & df['Name'].duplicated(keep=False)
df['isActive'] = m
df.loc[m, 'Name'] += '_duplicate'
print(df)
             Name  isActive  LoginDate
0  John_duplicate      True       2021
1            John     False       2022
2            Fred     False       2020
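Note that the mask above picks the first row of each duplicated Name in row order, which matches the oldest LoginDate only when the rows are already sorted by date. If that cannot be assumed, a minimal sketch of one alternative is to look up the oldest row per Name with groupby and idxmin. The sample data below is reordered on purpose to show the difference, and the approach assumes the index labels are unique:

```python
import pandas as pd

# Same data as in the question, but deliberately NOT sorted by LoginDate.
df = pd.DataFrame({
    "Name": ["John", "John", "Fred"],
    "isActive": [False, False, False],
    "LoginDate": [2022, 2021, 2020],
})

# Index label of the row with the smallest (oldest) LoginDate for each Name.
oldest = df.groupby("Name")["LoginDate"].idxmin()

# Keep only rows that are the oldest of their group AND whose Name
# occurs more than once, so unique names such as Fred are left alone.
m = df["Name"].duplicated(keep=False) & df.index.isin(oldest)

df["isActive"] = m
df.loc[m, "Name"] += "_duplicate"
print(df)
```

Sorting first with df.sort_values('LoginDate') and then applying the original mask should give the same rows, at the cost of reordering the frame.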
