Using regex sub on a df.column with apply in pandas df

I have a df as such:

import pandas as pd

data = [{'Employees at store': 18, 'store': 'Mikes&Carls P@rlor', 'hair cut inch': '$3'},
        {'Employees at store': 5, 'store': 'Over-Top', 'hair cut inch': '$9'}]

df = pd.DataFrame(data)
df

and have

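# Test the column's dtype: isinstance(x, object) is True for every column,
# so it would also turn numeric columns into strings.
df = df.apply(lambda x: x.astype(str).str.lower().str.replace(' ', '_')
                        if pd.api.types.is_string_dtype(x)
                        else x)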

working for replacing spaces with underscores. I know that you can chain these, per "How to replace multiple substrings of a string?".

I also believe that this replaces only an exact match of the whole string, not a part of it, having tried:

# Test the column's dtype: isinstance(x, object) is True for every column.
df = df.apply(lambda x: x.astype(str).str.lower()
                         .str.replace(' ', '_')
                         .str.replace('&', 'and')
                         .str.replace('@', 'a')
              if pd.api.types.is_string_dtype(x) else x)

I think I have to use re.sub and do something like re.sub('[^a-zA-Z0-9_ \n\.]', '', my_str), but I can't figure out how to build it into my apply(lambda ...) function.

CodePudding user response：

You can pass a callable to str.replace. Use a dictionary with the list of replacements and use the get method:

maps = {' ': '_', '&': 'and', '@': 'a'}
df['store'].str.replace('[ &@]', lambda m: maps.get(m.group(), ''), regex=True)

output:

0    MikesandCarls_Parlor
1                Over-Top
Name: store, dtype: object
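
Since the question mentions re.sub: the same callable-replacement idea also works with the standard re module on a plain string. A minimal sketch, assuming you want the lowercasing from the question as well; the helper name clean and the choice to build the pattern from the dictionary keys are illustrative, not part of the answer above:

```python
import re

maps = {' ': '_', '&': 'and', '@': 'a'}

# Build the pattern from the dictionary keys so the two cannot drift apart.
# re.escape makes keys that are regex metacharacters match literally.
pattern = re.compile('|'.join(re.escape(key) for key in maps))

def clean(text):
    # Hypothetical helper: lowercase first, then swap every matched
    # substring for its entry in maps.
    return pattern.sub(lambda m: maps[m.group()], text.lower())

print(clean('Mikes&Carls P@rlor'))
```

Because every possible match is a key of maps, a plain maps[m.group()] lookup is enough here and no default value is needed.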

applying on all (string) columns

cols = df.select_dtypes('object').columns

maps = {' ': '_', '&': 'and', '@': 'a', '$': '€'}
df[cols] = df[cols].apply(lambda col: col.str.replace('[ &@$]', lambda m: maps.get(m.group(), ''), regex=True))

output:

   Employees at store                 store hair cut inch
0                  18  MikesandCarls_Parlor            €3
1                   5              Over-Top            €9
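
To tie this back to the question, here is a self-contained sketch that also lowercases the values and derives the pattern from the dictionary keys. Two things in it are assumptions rather than part of the answer: the columns are picked with pandas.api.types.is_string_dtype instead of select_dtypes('object'), so the sketch does not rely on how a given pandas version labels string columns, and the '$' replacement is left out to keep the example small.

```python
import re

import pandas as pd

data = [{'Employees at store': 18, 'store': 'Mikes&Carls P@rlor', 'hair cut inch': '$3'},
        {'Employees at store': 5, 'store': 'Over-Top', 'hair cut inch': '$9'}]
df = pd.DataFrame(data)

maps = {' ': '_', '&': 'and', '@': 'a'}
# Alternation of the escaped keys; equivalent to '[ &@]' for these keys.
pattern = '|'.join(re.escape(key) for key in maps)

# Keep only the columns that hold strings; the integer column is skipped.
cols = [c for c in df.columns if pd.api.types.is_string_dtype(df[c])]

df[cols] = df[cols].apply(
    lambda col: col.str.lower().str.replace(
        pattern, lambda m: maps[m.group()], regex=True))

print(df)
```

The numeric column keeps its integer values because it is never passed through the string methods.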

replacement per column

cols = df.select_dtypes('object').columns

maps = {'store': {' ': '_', '&': 'and', '@': 'a', '$': '€'},
        'hair cut inch': {'$': '€'}
       }

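# Look the mapping up by column name (col.name); col itself is a Series
# and cannot be used as a dictionary key.
# Matched characters with no entry for that column are replaced by ''.
df[cols] = df[cols].apply(lambda col: col.str.replace('[ &@$]',
                          lambda m: maps.get(col.name, {}).get(m.group(), ''),
                          regex=True))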
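
A variant of the per-column idea, as a sketch under my own assumptions: each column gets a pattern built only from its own mapping, so characters that have no entry for that column are left alone instead of being replaced by an empty string. The function name replace_in_column is hypothetical, and the example DataFrame is reduced to the two string columns.

```python
import re

import pandas as pd

df = pd.DataFrame({'store': ['Mikes&Carls P@rlor', 'Over-Top'],
                   'hair cut inch': ['$3', '$9']})

maps = {'store': {' ': '_', '&': 'and', '@': 'a'},
        'hair cut inch': {'$': '€'}}

def replace_in_column(col):
    # Hypothetical helper: find this column's mapping by its name.
    mapping = maps.get(col.name)
    if not mapping:
        return col  # no replacements configured for this column
    # Only the keys of this column's mapping can match.
    pattern = '|'.join(re.escape(key) for key in mapping)
    return col.str.replace(pattern, lambda m: mapping[m.group()], regex=True)

df = df.apply(replace_in_column)
print(df)
```

With this layout, adding a replacement for one column cannot affect any other column.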