I have a dataframe that looks something like this:
| Group | UPC | Description |
|---|---|---|
| 246 | 1234568 | Chips BBQ |
| 158 | 7532168 | Cereal Honey |
| 246 | 9876532 | Chips Ketchup |
| 665 | 8523687 | Strawberry Jam |
| 246 | 1234568 | Chips BBQ |
| 158 | 5553215 | Cereal Chocolate |
I want to replace the descriptions of the items with the most frequent description based on the group # or the first instance if there is a tie.
So in the example above: Chips Ketchup (1 instance) is replaced with Chips BBQ (2 instances) And Cereal Chocolate is replaced with Cereal Honey (First Instance).
Desired output would be:
| Group | UPC | Description |
|---|---|---|
| 246 | 1234568 | Chips BBQ |
| 158 | 7532168 | Cereal Honey |
| 246 | 9876532 | Chips BBQ |
| 665 | 8523687 | Strawberry Jam |
| 246 | 1234568 | Chips BBQ |
| 158 | 5553215 | Cereal Honey |
If this is too complicated I can settle for replacing with simply the first instance without taking frequency into consideration at all.
Thanks in advance
CodePudding user response:
You can use
df['Description'] = df.groupby('Group')['Description'].transform(lambda s: s.value_counts().index[0])
It seems like Series.value_counts (unlike Series.mode, which I also tried) orders elements that occur the same number of times by their first occurence. This behavior is not documented so I'm not sure you can rely on it.
