python/pandas - substring replacement using regular expression-CodePudding

I have inherited an old code file that has following code. It seems the last line of the code below is removing all the open ( and close ) parentheses, and - character from the phone number field. Question: But why it is using regex='\(' in .replace(regex='\(',value='') part of that last line. Some other online examples (such as here and here) I have seen don't seem to use regex keyword in their replacement function. What regex='\(' is doing in the replace function here?

import sqlalchemy as sq
import pandas as pd
import re

pw = dbutils.secrets.get(scope='SomeScope',key='sql')
engine = sq.create_engine('mssql pymssql://SERVICE.Databricks.NONPUBLICETL:' pw 'MyAzureSQL.database.windows.net:1433/TEST', isolation_level="AUTOCOMMIT")

 pandas_df = pd.read_sql('select * from SQLTable1', con=engine)

 pandas_df['MOBILE_PHONE'].replace(regex='\(',value='').replace(regex='\)',value='').replace(regex='\-',value='').str.strip()

CodePudding user response：

You can use the regex keyword as a boolean to tell .replace() whether to interpret the string as regex or as the regular expression itself

CodePudding user response：

The signature of the replace function has changed.

Replace your last line by:

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].replace('[()-]', '', regex=True).str.strip()
print(df)

# Output
     MOBILE_PHONE MOBILE_PHONE2
0  (425) 555-1234   425 5551234

Replace ( or ) or - by ''

CodePudding user response：

Coding precision depends on experience, logic and mastery of syntax. Its like mastery of normal language. The answer you adapted achieves exactly what the code below does

df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace('[^\d]','',regex=True)

Explanation

\d is regex for digits

[^] is regex for everything except

[^\d] everything except digits

So, using the pandas API, I replace everything except digits in the string with nothing

Outcome

    MOBILE_PHONE  MOBILE_PHONE2
0  (425) 555-1234    4255551234