I have this example dataset:
CPU_Sub_Series  RAM  Screen_Size  Resolution  Price
Intel i5        8    15.6         1920x1080   699
Intel i5        8    15.6         1920x1080   569
Intel i5        8    15.6         1920x1080   789
Ryzen 5         16   16.0         2560x1600   999
Ryzen 5         32   16.0         2560x1600   1299
I want to find rows that are duplicates in every column except Price, drop them, and keep only the row with the lowest Price in each group of duplicates.
The expected output is:
CPU_Sub_Series  RAM  Screen_Size  Resolution  Price
Intel i5        8    15.6         1920x1080   569
Ryzen 5         16   16.0         2560x1600   999
Ryzen 5         32   16.0         2560x1600   1299
Should I sort it by price first, e.g. with df.sort_values('Price')? And what comes after that?
CodePudding user response:
Group by every column except Price and take the minimum of each group:
df.groupby(["CPU_Sub_Series","RAM","Screen_Size","Resolution"], as_index=False).min()
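For reference, here is a minimal runnable sketch of the groupby approach on the sample data from the question. The names `key_cols` and `cheapest` are just illustrative, and selecting `["Price"]` explicitly is an assumption made here so that only that column is aggregated.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CPU_Sub_Series": ["Intel i5", "Intel i5", "Intel i5", "Ryzen 5", "Ryzen 5"],
    "RAM": [8, 8, 8, 16, 32],
    "Screen_Size": [15.6, 15.6, 15.6, 16.0, 16.0],
    "Resolution": ["1920x1080", "1920x1080", "1920x1080", "2560x1600", "2560x1600"],
    "Price": [699, 569, 789, 999, 1299],
})

# Every column except Price identifies a "duplicate" group (illustrative name).
key_cols = ["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"]

# One row per unique key combination, carrying the minimum Price of that group.
cheapest = df.groupby(key_cols, as_index=False)["Price"].min()
print(cheapest)
```

Note that groupby sorts the result by the key columns by default, so the row order may differ from the original frame.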
CodePudding user response:
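Sort by Price first; keep='first' keeps the first row of each duplicate group in the current order, so without the sort it would not necessarily be the cheapest one:
df.sort_values('Price').drop_duplicates(subset=['CPU_Sub_Series','RAM','Screen_Size','Resolution'],keep='first')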
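A self-contained sketch of the drop_duplicates route on the question's sample data, with the frame sorted by Price beforehand so that the first row of each group is the cheapest. The names `key_cols` and `deduped` are illustrative, and the final `reset_index(drop=True)` is an optional extra added here to renumber the rows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "CPU_Sub_Series": ["Intel i5", "Intel i5", "Intel i5", "Ryzen 5", "Ryzen 5"],
    "RAM": [8, 8, 8, 16, 32],
    "Screen_Size": [15.6, 15.6, 15.6, 16.0, 16.0],
    "Resolution": ["1920x1080", "1920x1080", "1920x1080", "2560x1600", "2560x1600"],
    "Price": [699, 569, 789, 999, 1299],
})

# Columns that define a duplicate (everything except Price).
key_cols = ["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"]

deduped = (
    df.sort_values("Price")                          # cheapest rows come first
      .drop_duplicates(subset=key_cols, keep="first")  # keep the first (cheapest) row per group
      .reset_index(drop=True)                        # optional: renumber rows 0..n-1
)
print(deduped)
```

Unlike a groupby followed by min(), this keeps each surviving row whole, which matters if the real frame has further columns beyond the ones shown.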
