I want to filter a DataFrame column so that it keeps only the numeric values (digit codes).
| main_column |
|---|
| HKA1774348 |
| null |
| 774970331205 |
| 160-27601033 |
| SGSIN/62/898805 |
| null |
| LOCAL |
| 217-29062806 |
| null |
| 176-07027893 |
| 724-22100374 |
| 297-00371663 |
| 217-11580074 |
I want to obtain this column:
| main_column |
|---|
| 774970331205 |
| 160-27601033 |
| 217-29062806 |
| 176-07027893 |
| 724-22100374 |
| 297-00371663 |
| 217-11580074 |
CodePudding user response:
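You can use rlike with a regexp that matches only digits and hyphens. The `+` quantifier requires one or more such characters, and the `^` and `$` anchors make the pattern cover the whole value:
df.where(df['main_column'].rlike('^[0-9-]+$')).show()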
Output:
+------------+
| main_column|
+------------+
|774970331205|
|160-27601033|
|217-29062806|
|176-07027893|
|724-22100374|
|297-00371663|
|217-11580074|
+------------+
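To check the pattern without starting a Spark session, here is a minimal sketch in plain Python that applies the same character class to the sample values from the question. It assumes Python's `re` module behaves like Spark's Java regex engine for this simple pattern, which should hold for a character class like this one; the names `values`, `is_digit_code`, and `kept` are made up for illustration.

```python
import re

# Sample values copied from the question; None stands in for null.
values = [
    "HKA1774348", None, "774970331205", "160-27601033",
    "SGSIN/62/898805", None, "LOCAL", "217-29062806", None,
    "176-07027893", "724-22100374", "297-00371663", "217-11580074",
]

# Same character class as the rlike pattern: one or more digits or hyphens.
DIGITS_AND_HYPHENS = re.compile(r"[0-9-]+")


def is_digit_code(value):
    """Return True if value consists only of digits and hyphens."""
    # None (null) is rejected, just as where() drops rows whose condition is null.
    # fullmatch plays the role of the ^ and $ anchors in the rlike pattern.
    return value is not None and DIGITS_AND_HYPHENS.fullmatch(value) is not None


kept = [v for v in values if is_digit_code(v)]
print(kept)
```

Note that the pattern also accepts strings made only of hyphens, such as `-`. If that matters for your data, a stricter pattern like `^[0-9]+(-[0-9]+)*$` would require digits on both sides of every hyphen.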
