I have inherited an old code file that has following code. It seems the last line of the code below is removing all the open ( and close ) parentheses, and - character from the phone number field. Question: But why it is using regex='\(' in .replace(regex='\(',value='') part of that last line. Some other online examples (such as here and here) I have seen don't seem to use regex keyword in their replacement function. What regex='\(' is doing in the replace function here?
import sqlalchemy as sq
import pandas as pd
import re
pw = dbutils.secrets.get(scope='SomeScope',key='sql')
engine = sq.create_engine('mssql pymssql://SERVICE.Databricks.NONPUBLICETL:' pw 'MyAzureSQL.database.windows.net:1433/TEST', isolation_level="AUTOCOMMIT")
pandas_df = pd.read_sql('select * from SQLTable1', con=engine)
pandas_df['MOBILE_PHONE'].replace(regex='\(',value='').replace(regex='\)',value='').replace(regex='\-',value='').str.strip()
CodePudding user response:
You can use the regex keyword as a boolean to tell .replace() whether to interpret the string as regex or as the regular expression itself
CodePudding user response:
The signature of the replace function has changed.
Replace your last line by:
df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].replace('[()-]', '', regex=True).str.strip()
print(df)
# Output
MOBILE_PHONE MOBILE_PHONE2
0 (425) 555-1234 425 5551234
Replace ( or ) or - by ''
CodePudding user response:
Coding precision depends on experience, logic and mastery of syntax. Its like mastery of normal language. The answer you adapted achieves exactly what the code below does
df['MOBILE_PHONE2'] = df['MOBILE_PHONE'].str.replace('[^\d]','',regex=True)
Explanation
\d is regex for digits
[^] is regex for everything except
[^\d] everything except digits
So, using the pandas API, I replace everything except digits in the string with nothing
Outcome
MOBILE_PHONE MOBILE_PHONE2
0 (425) 555-1234 4255551234
