I have a list of columns whose values are all strings. I need to one-hot encode them with pd.get_dummies().
I want to keep the original name of those columns along with the value.
So let's say I have a column named Street, and its values are Paved and Not Paved.
After running get_dummies(), I would like the two resulting columns to be named Street_Paved and Street_Not_Paved. Is this possible? In other words, each new column name should follow the format {i}_{value}, where i is the original column name, as in the usual for i in cols loop.
My code is:
cols = ['Street', 'Alley', 'CentralAir', 'Utilities', 'LandSlope', 'PoolQC']
pd.get_dummies(df, columns = cols, prefix = '', prefix_sep = '')
CodePudding user response:
If you remove the prefix = '' and prefix_sep = '' parameters, get_dummies uses each column name as the default prefix, joined to the value with the default separator _:
import pandas as pd

df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']
# With no prefix given, each new column is named <column>_<value>
df = pd.get_dummies(df, columns=cols)
print(df)
   Street_Not Paved  Street_Paved  Alley_a  Alley_c
0                 0             1        1        0
1                 0             1        0        1
2                 1             0        0        1
3                 1             0        1        0
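To make the connection to the prefix parameter explicit, here is a minimal sketch (not part of the original answer) showing that the default naming matches passing prefix as a dict that maps each column to its own name. It also shows why prefix = '' with prefix_sep = '' drops the column name: the new names are built as prefix + separator + value. The dtype=int argument is added here only so the dummies are 0/1 as in the output above, because pandas 2.0 and later return booleans by default.

```python
import pandas as pd

df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']

# Default naming: the column name is the prefix, '_' is the separator.
default = pd.get_dummies(df, columns=cols, dtype=int)

# The same result spelled out with a {column: prefix} dict.
explicit = pd.get_dummies(df, columns=cols,
                          prefix={c: c for c in cols},
                          prefix_sep='_', dtype=int)

# Empty prefix and separator: only the values remain in the names.
stripped = pd.get_dummies(df, columns=cols, prefix='', prefix_sep='', dtype=int)

print(list(explicit.columns))
# ['Street_Not Paved', 'Street_Paved', 'Alley_a', 'Alley_c']
print(list(stripped.columns))
# ['Not Paved', 'Paved', 'a', 'c']
```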
If you also need to replace all spaces with _, add rename:
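# Rebuild df first: the previous snippet overwrote it with the encoded frame
df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca')})
cols = ['Street', 'Alley']
df = pd.get_dummies(df, columns=cols).rename(columns=lambda x: x.replace(' ', '_'))
print(df)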
   Street_Not_Paved  Street_Paved  Alley_a  Alley_c
0                 0             1        1        0
1                 0             1        0        1
2                 1             0        0        1
3                 1             0        1        0
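If this is needed in more than one place, the two steps can be wrapped in a small function. The sketch below is only an illustration: the helper name dummies_with_clean_names and the extra LotArea column are hypothetical, and dtype=int is again an assumption made to get 0/1 values on pandas 2.0 and later. Note that it replaces spaces in every column name of the result, including columns that were not encoded.

```python
import pandas as pd

def dummies_with_clean_names(df, cols, sep='_'):
    """Hypothetical helper: one-hot encode `cols`, then replace spaces in all column names."""
    out = pd.get_dummies(df, columns=cols, prefix_sep=sep, dtype=int)
    # Plain string replacement (no regex) on the column index.
    out.columns = out.columns.str.replace(' ', sep, regex=False)
    return out

df = pd.DataFrame({'Street': ['Paved', 'Paved', 'Not Paved', 'Not Paved'],
                   'Alley': list('acca'),
                   'LotArea': [8450, 9600, 11250, 9550]})  # made-up values

encoded = dummies_with_clean_names(df, ['Street', 'Alley'])
print(list(encoded.columns))
# ['LotArea', 'Street_Not_Paved', 'Street_Paved', 'Alley_a', 'Alley_c']
```

Columns that are not listed in cols, such as LotArea here, pass through unchanged and appear before the dummy columns.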
