Home > Net >  Pandas Create a new Column which takes the Most Frequent item Description given Item Codes
Pandas Create a new Column which takes the Most Frequent item Description given Item Codes

Time:02-04

I have a dataframe that looks something like this:

Group UPC Description
246 1234568 Chips BBQ
158 7532168 Cereal Honey
246 9876532 Chips Ketchup
665 8523687 Strawberry Jam
246 1234568 Chips BBQ
158 5553215 Cereal Chocolate

I want to replace the descriptions of the items with the most frequent description based on the group # or the first instance if there is a tie.

So in the example above: Chips Ketchup (1 instance) is replaced with Chips BBQ (2 instances) And Cereal Chocolate is replaced with Cereal Honey (First Instance).

Desired output would be:

Group UPC Description
246 1234568 Chips BBQ
158 7532168 Cereal Honey
246 9876532 Chips BBQ
665 8523687 Strawberry Jam
246 1234568 Chips BBQ
158 5553215 Cereal Honey

If this is too complicated I can settle for replacing with simply the first instance without taking frequency into consideration at all.

Thanks in advance

CodePudding user response:

You can use

df['Description'] = df.groupby('Group')['Description'].transform(lambda s: s.value_counts().index[0])

It seems like Series.value_counts (unlike Series.mode, which I also tried) orders elements that occur the same number of times by their first occurence. This behavior is not documented so I'm not sure you can rely on it.

  •  Tags:  
  • Related