I'm trying to write a function that assigns 0 to the more frequent string and 1 to the less frequent string. The idea is that it should take any column containing two distinct strings and assign 0 and 1 based on the value counts. How can I do that?
import pandas as pd
data = {'status':["Default", "Non-Default", "Non-Default", "Non-Default", "Default", "Non-Default"]}
df = pd.DataFrame(data)
df
status
0 Default
1 Non-Default
2 Non-Default
3 Non-Default
4 Default
5 Non-Default
df.value_counts()
status
Non-Default 4
Default 2
dtype: int64
CodePudding user response:
You can use:
df['binary'] = df['status'].ne(df['status'].mode().iloc[0]).astype(int)
mode() returns the most frequent value(s), and iloc[0] picks the first one in case of a tie. ne() then flags every value that is NOT this string as True, and astype(int) converts True to 1 and False to 0. The most frequent string therefore becomes 0 and the other one becomes 1.
output:
status binary
0 Default 1
1 Non-Default 0
2 Non-Default 0
3 Non-Default 0
4 Default 1
5 Non-Default 0
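Since the question asks for a function that works on any such column, the one-liner above could be wrapped as follows. This is only a minimal sketch: the name encode_by_frequency is made up for illustration, and it assumes the column holds exactly two distinct values.

```python
import pandas as pd


def encode_by_frequency(series: pd.Series) -> pd.Series:
    """Return 0 for the most frequent value and 1 for every other value.

    Hypothetical helper, not part of pandas. Assumes a column with two
    distinct values; on a tie, the first value returned by mode() gets 0.
    """
    most_frequent = series.mode().iloc[0]
    # ne() is True where the value differs from the most frequent one
    return series.ne(most_frequent).astype(int)


df = pd.DataFrame(
    {"status": ["Default", "Non-Default", "Non-Default",
                "Non-Default", "Default", "Non-Default"]}
)
df["binary"] = encode_by_frequency(df["status"])
print(df)
```

Note that missing values (NaN) are not equal to the most frequent string, so with this sketch they would also end up as 1; drop or fill them first if that is not what you want.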
