I can't find a solution online. I know this should be easy, but I can't figure out what is wrong with my regex.
Here is my code:
import pandas as pd

df = pd.DataFrame({'Company phone number': [' 1-541-296-2271', ' 1-542-296-2271', ' 1-543-296-2271'],
                   'Contact phone number': ['15112962271', None, '15312962271'],
                   'num_specimen_seen': [10, 2, 3]},
                  index=['falcon', 'dog', 'cat'])
df['Contact phone number'] = df['Contact phone number'].str.replace('^\d{11}$', r'\ 1-\d{3}-\d{3}-\d{4}')
Desired output of df['Contact phone number']:
falcon 1-511-296-2271
dog None
cat 1-531-296-2271
It is always 11 digits with no spaces or special characters. Thanks!
CodePudding user response:
The replacement string is not a pattern, so \d{3} has no meaning there. Capture the digit groups in the pattern and refer to them with backreferences. You can use
df['Contact phone number'] = df['Contact phone number'].str.replace(r'^(\d)(\d{3})(\d{3})(\d+)$', r' \1-\2-\3-\4', regex=True)
Details:
^ - start of string
(\d) - Group 1 (\1): a digit
(\d{3}) - Group 2 (\2): three digits
(\d{3}) - Group 3 (\3): three digits
(\d+) - Group 4 (\4): one or more digits (use \d{4} if you need to match exactly four digits)
$ - end of string.
Output:
>>> df['Contact phone number']
falcon     1-511-296-2271
dog                  None
cat        1-531-296-2271
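To see the capture-group and backreference mechanics in isolation, here is a minimal sketch using only the standard re module on a plain list. The helper name format_phone and the constant PHONE_RE are hypothetical, and the last group is pinned to \d{4} on the assumption that every number has exactly 11 digits, as the question states.

```python
import re

# Hypothetical names for illustration; the last group is fixed at four digits
# (assumption: every input is exactly 11 digits, as stated in the question).
PHONE_RE = re.compile(r'^(\d)(\d{3})(\d{3})(\d{4})$')


def format_phone(value):
    """Format an 11-digit string as ' 1-511-296-2271'; pass None through."""
    if value is None:
        return None
    # \1..\4 in the replacement refer to the four captured groups.
    # Strings that do not match the pattern are returned unchanged.
    return PHONE_RE.sub(r' \1-\2-\3-\4', value)


numbers = ['15112962271', None, '15312962271']
formatted = [format_phone(n) for n in numbers]
print(formatted)
```

The pattern and the replacement string use the same syntax as the Series.str.replace call with regex=True.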
CodePudding user response:
You can use .str.extract, convert each row of results to a list, and then use .str.join (and, of course, concatenate a space at the beginning):
df['Contact phone number'] = ' ' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{4})').apply(list, axis=1).str.join('-')
Output:
>>> df
       Company phone number Contact phone number  num_specimen_seen
falcon       1-541-296-2271       1-511-296-2271                 10
dog          1-542-296-2271                  NaN                  2
cat          1-543-296-2271       1-531-296-2271                  3
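The extract-then-join idea can also be sketched without pandas, which may help when checking the grouping on a single value. The names GROUPS_RE and join_groups below are hypothetical. Unlike .str.extract, which searches anywhere in the string, this sketch uses re.fullmatch as a stricter choice, so anything other than exactly 11 digits yields None.

```python
import re

# Hypothetical names for illustration only.
GROUPS_RE = re.compile(r'(\d)(\d{3})(\d{3})(\d{4})')


def join_groups(value):
    """Split an 11-digit string into 1-3-3-4 groups and join them with '-'."""
    if value is None:
        return None
    match = GROUPS_RE.fullmatch(value)
    if match is None:
        # Assumption: non-matching input is treated as missing.
        return None
    # match.groups() is a tuple of the four captured pieces.
    return ' ' + '-'.join(match.groups())


result = [join_groups(v) for v in ['15112962271', None, '15312962271']]
print(result)
```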
