I have a df as such:
import pandas as pd

data = [{'Employees at store': 18, 'store': 'Mikes&Carls P@rlor', 'hair cut inch': '$3'},
        {'Employees at store': 5, 'store': 'Over-Top', 'hair cut inch': '$9'}]
df = pd.DataFrame(data)
df
and I have
df=df.apply(lambda x: x.astype(str).str.lower().str.replace(' ','_')
if isinstance(x, object)
else x)
working for replacing spaces with underscores. I know that you can chain replacements, per "How to replace multiple substrings of a string?".
I also found that chaining seems to match only the exact string, not a substring of it, having tried:
df = df.apply(lambda x: x.astype(str).str.lower()
              .str.replace(' ', '_')
              .str.replace('&', 'and')
              .str.replace('@', 'a') if isinstance(x, object) else x)
I think I have to use re.sub and do something like re.sub('[^a-zA-Z0-9_ \n\.]', '', my_str),
but I can't figure out how to build it into my apply(lambda ...) function.
Answer:
You can pass a callable to str.replace. Put the replacements in a dictionary and use its get method to look up each match:
maps = {' ': '_', '&': 'and', '@': 'a'}
df['store'].str.replace('[ &@]', lambda m: maps.get(m.group(), ''), regex=True)
output:
0 MikesandCarls_Parlor
1 Over-Top
Name: store, dtype: object
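As a variation on the snippet above, the pattern can be built from the dictionary keys so that the regex and the mapping cannot drift apart. This is only a sketch: the names `pattern` and `cleaned` are illustrative and not from the original answer, and `re.escape` is used on the assumption that some keys may be regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame([
    {'Employees at store': 18, 'store': 'Mikes&Carls P@rlor', 'hair cut inch': '$3'},
    {'Employees at store': 5, 'store': 'Over-Top', 'hair cut inch': '$9'},
])

maps = {' ': '_', '&': 'and', '@': 'a'}

# Build an alternation from the keys; re.escape guards against regex metacharacters.
# (pattern is a hypothetical helper name, not part of the original answer.)
pattern = '|'.join(re.escape(key) for key in maps)

# The callable receives a match object and looks the matched text up in maps.
# Every match is a key of maps by construction, so plain indexing is safe here.
cleaned = df['store'].str.replace(pattern, lambda m: maps[m.group()], regex=True)
print(cleaned.tolist())
```

Because the pattern is an alternation rather than a character class, the same approach should also work for multi-character keys.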
applying on all (string) columns
cols = df.select_dtypes('object').columns
maps = {' ': '_', '&': 'and', '@': 'a', '$': '€'}
df[cols] = df[cols].apply(lambda col: col.str.replace('[ &@$]', lambda m: maps.get(m.group(), ''), regex=True))
output:
   Employees at store                 store hair cut inch
0                  18  MikesandCarls_Parlor            €3
1                   5              Over-Top            €9
replacement per column
cols = df.select_dtypes('object').columns
maps = {'store': {' ': '_', '&': 'and', '@': 'a', '$': '€'},
'hair cut inch': {'$': '€'}
}
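# look up the per-column mapping by column name (col itself is a Series and cannot be a dict key)
df[cols] = df[cols].apply(lambda col: col.str.replace('[ &@$]',
                                                      lambda m: maps.get(col.name, {}).get(m.group(), ''),
                                                      regex=True))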
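A self-contained sketch of the per-column idea, with the mapping looked up by `col.name`. The helper `replace_chars` is hypothetical, and the column list is taken from the keys of `maps` rather than from `select_dtypes`. One deliberate difference, stated as an assumption: characters that match the pattern but are missing from a column's mapping are left unchanged here (fallback to `m.group()`), whereas the `''` default above deletes them.

```python
import pandas as pd

df = pd.DataFrame([
    {'Employees at store': 18, 'store': 'Mikes&Carls P@rlor', 'hair cut inch': '$3'},
    {'Employees at store': 5, 'store': 'Over-Top', 'hair cut inch': '$9'},
])

# One replacement table per column name.
maps = {
    'store': {' ': '_', '&': 'and', '@': 'a'},
    'hair cut inch': {'$': '€'},
}

def replace_chars(col):
    """Apply the mapping registered for this column (hypothetical helper)."""
    col_map = maps.get(col.name, {})
    # Fall back to the matched text itself so unmapped characters are kept.
    return col.str.replace('[ &@$]',
                           lambda m: col_map.get(m.group(), m.group()),
                           regex=True)

# Only touch the columns that have a mapping; numeric columns stay as they are.
cols = list(maps)
df[cols] = df[cols].apply(replace_chars)
print(df)
```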
