Drop duplicates while ignoring a specific column and keep the lowest value

Time:01-25

I have this example dataset:

CPU_Sub_Series  RAM     Screen_Size   Resolution   Price
Intel i5         8      15.6          1920x1080    699
Intel i5         8      15.6          1920x1080    569
Intel i5         8      15.6          1920x1080    789
Ryzen 5          16     16.0          2560x1600    999
Ryzen 5          32     16.0          2560x1600    1299

I want to find the rows that are duplicates in every column except Price, drop them, and keep only the row with the lowest Price in each group of duplicates.
So the expected output looks like this:

CPU_Sub_Series  RAM     Screen_Size   Resolution   Price
Intel i5         8      15.6          1920x1080    569
Ryzen 5          16     16.0          2560x1600    999
Ryzen 5          32     16.0          2560x1600    1299

Should I sort by price first, for example with df.sort_values('Price')? And what is the next step after that?

CodePudding user response:

df.groupby(["CPU_Sub_Series","RAM","Screen_Size","Resolution"], as_index=False).min()
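Here is a minimal sketch of the groupby approach applied to the sample data from the question. The names key_cols and cheapest are only illustrative. Selecting ["Price"] before calling min() is an optional extra: it makes explicit which column is aggregated, which matters if the real data has more non-key columns than the example shows.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "CPU_Sub_Series": ["Intel i5", "Intel i5", "Intel i5", "Ryzen 5", "Ryzen 5"],
    "RAM": [8, 8, 8, 16, 32],
    "Screen_Size": [15.6, 15.6, 15.6, 16.0, 16.0],
    "Resolution": ["1920x1080", "1920x1080", "1920x1080", "2560x1600", "2560x1600"],
    "Price": [699, 569, 789, 999, 1299],
})

# Columns that define a duplicate (everything except Price)
key_cols = ["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"]

# One row per unique combination of key columns, with the minimum Price
cheapest = df.groupby(key_cols, as_index=False)["Price"].min()
print(cheapest)
```

With only one non-key column, as in the question, this should give the same rows as calling .min() on the whole grouped frame.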

CodePudding user response:

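Sort by Price first, so that keep='first' retains the cheapest row of each group:

df.sort_values('Price').drop_duplicates(subset=['CPU_Sub_Series','RAM','Screen_Size','Resolution'], keep='first')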
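drop_duplicates keeps the first row it encounters in each group, so the row order decides which price survives. The sketch below uses the sample data from the question to compare the unsorted call with a sort-then-drop version. The variable names are illustrative, and the final reset_index is optional tidying.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "CPU_Sub_Series": ["Intel i5", "Intel i5", "Intel i5", "Ryzen 5", "Ryzen 5"],
    "RAM": [8, 8, 8, 16, 32],
    "Screen_Size": [15.6, 15.6, 15.6, 16.0, 16.0],
    "Resolution": ["1920x1080", "1920x1080", "1920x1080", "2560x1600", "2560x1600"],
    "Price": [699, 569, 789, 999, 1299],
})

key_cols = ["CPU_Sub_Series", "RAM", "Screen_Size", "Resolution"]

# Without sorting, the first Intel i5 row in the original order is kept
unsorted_result = df.drop_duplicates(subset=key_cols, keep="first")

# Sorting by Price first puts the cheapest row of each group on top
result = (
    df.sort_values("Price")
      .drop_duplicates(subset=key_cols, keep="first")
      .reset_index(drop=True)
)
print(result)
```

Unlike a groupby aggregation, this keeps each surviving row intact, which can be handy if the real data has extra columns that should travel with the cheapest entry.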