How to groupby().transform() to find mode in dataframe?-CodePudding

I have a dataframe like:

lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]

df = pd.DataFrame(lst, columns =["Class", "Grade"])

I need to get the mode (majority vote) of "Grade" in each "Class". If it's a tie vote, assign "x".

Below is what I expect to get:

Class	Grade	Majority_vote
High	A	A
High	A	A
High	B	A
Medium	A	x
Medium	B	x
Medium	C	x

This is my code:

df['majority_vote'] = df.groupby(['Class'])['Grade'].transform(lambda x: x.mode()[0])

I think the code will return 'nan' if it's a tie vote. Then, I will change 'nan' to 'x' later.

However, what I get is below:

Class	Grade	Majority_vote
High	A	A
High	A	A
High	B	A
Medium	A	A
Medium	B	A
Medium	C	A

At class "Medium", the code returns the 1st element ("A") instead of 'nan'.

Any other method is appreciated. Could you please help me? Thank you in advance.

CodePudding user response：

The issue with using x.mode()[0] is that pd.Series(['A', 'B', 'C']).mode() evaluates to ['A', 'B', 'C']. Meanwhile, pd.Series(['A', 'A', 'B']).mode() evaluates to ['A'].

Here is a function that will return the mode (if there is only one) and "x" if there is a tie (i.e., multiple modes).

import pandas as pd
lst = [["High", "A"], ["High", "A"], ["High", "B"],["Medium", "A"], ["Medium", "B"], ["Medium", "C"]]
df = pd.DataFrame(lst, columns=["Class", "Grade"])

def get_mode_or_x(series):
    mode = series.mode()
    if mode.size == 1:
        return mode[0]
    return "x"

df.loc[:, "majority_vote"] = df.groupby("Class")["Grade"].transform(get_mode_or_x)

index	Class	Grade	majority_vote
0	High	A	A
1	High	A	A
2	High	B	A
3	Medium	A	x
4	Medium	B	x
5	Medium	C	x