Input
Column
0 2 mm
1 3 kg
2 4 m
3 name
4 2 mm
5 3 mph
6 full
7 left
I need to remove the units from this column while leaving the text-only rows unchanged. I tried:
df["Column"] = df["Column"].replace("\D", "", regex = True)
It gives me the wrong output: because \D matches every non-digit character, rows such as name, full, and left end up empty.
Expected Output:
Column
0 2
1 3
2 4
3 name
4 2
5 3
6 full
7 left
CodePudding user response:
You can use
df["Column"] = df["Column"].str.replace(r'(\d)\s*[a-zA-Z]+$', r'\1', regex=True)
See the regex demo. Regex details:
(\d) - Group 1 (the \1 numbered backreference in the replacement pattern refers to this group's value): any digit
\s* - zero or more whitespace characters
[a-zA-Z]+ - one or more ASCII letters
$ - end of string.
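To see what the pattern does in isolation, here is a minimal sketch using Python's standard re module on the sample values from the question. The helper name strip_unit is hypothetical, and the pattern assumes a + quantifier after the letter class, matching the "one or more ASCII letters" description.

```python
import re

# Assumed pattern: a digit, optional whitespace, then one or more
# ASCII letters at the end of the string.
UNIT_SUFFIX = re.compile(r"(\d)\s*[a-zA-Z]+$")

def strip_unit(value: str) -> str:
    """Drop a trailing unit, keeping the digit captured in group 1."""
    return UNIT_SUFFIX.sub(r"\1", value)

values = ["2 mm", "3 kg", "4 m", "name", "2 mm", "3 mph", "full", "left"]
cleaned = [strip_unit(v) for v in values]
print(cleaned)
```

Rows without a digit never match the pattern, so name, full, and left pass through unchanged.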
CodePudding user response:
You can still use your replace approach: strip the non-digits first, then keep the stripped value only where something is left.
s = df.Column.replace('[^0-9]+','',regex=True)
df.Column = df.Column.mask(s!='',s)
Out[27]:
0 2
1 3
2 4
3 name
4 2
5 3
6 full
7 left
Name: Column, dtype: object
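The two steps of this answer can be sketched without pandas to make the logic explicit: strip every non-digit character, then fall back to the original value when nothing is left. The function name digits_or_original is hypothetical, and the pattern is assumed to be [^0-9]+.

```python
import re

def digits_or_original(value: str) -> str:
    # Step 1: remove every run of non-digit characters.
    digits = re.sub(r"[^0-9]+", "", value)
    # Step 2: mirror mask(s != '', s): use the digits if any remain,
    # otherwise keep the original value.
    return digits if digits != "" else value

values = ["2 mm", "3 kg", "4 m", "name", "2 mm", "3 mph", "full", "left"]
result = [digits_or_original(v) for v in values]
print(result)
```

Note that this removes non-digits anywhere in the string, not only a trailing unit, so a value such as a1b2 would become 12.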
CodePudding user response:
You can use str.extract: if the row begins with a number (^\d+), capture it; otherwise (|) keep the entire row (.*).
df['Column'] = df['Column'].str.extract(r'(^\d+|.*)')
print(df)
# Output
Column
0 2
1 3
2 4
3 name
4 2
5 3
6 full
7 left
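The alternation can also be checked on its own with Python's re module. This is only a sketch: first_number_or_all is a hypothetical name, and re.search is used here to mirror how str.extract takes the first match of the pattern.

```python
import re

# Assumed pattern: leading digits if present, otherwise the whole string.
PATTERN = re.compile(r"(^\d+|.*)")

def first_number_or_all(value: str) -> str:
    # .* can match an empty string, so search always finds a match.
    return PATTERN.search(value).group(1)

values = ["2 mm", "3 kg", "4 m", "name", "2 mm", "3 mph", "full", "left"]
extracted = [first_number_or_all(v) for v in values]
print(extracted)
```

Because the first alternative is tried before the second, a row that starts with digits yields only those digits, and every other row is returned whole.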
