How to strip multiple characters in dataframe at once

Time:01-15

My data contains some foreign characters and other Unicode characters, and I'm trying to get rid of them to clean the data. For example, the current string values look like the Before column, and the results should look like the After column.

Before                  After
Students Num #          Student Num
无差异()\nLocation       Location
/\nCity                 City
异\nPercent              Percent

I've tried the following code and of course it only eliminates "\n".

df['After'] = df['Before'].str.replace(r'[^\x00-\x7F]+', '', regex=True).str.strip('\n')

I tried to add other strings like '()\n' in the str.strip argument but it didn't work. How do I modify my code to get rid of all the weird unicode strings?

Thanks.

CodePudding user response:

You might find that just stripping out non-alphanumeric characters achieves what you want:

df['After'] = df['Before'].str.replace(r'[^A-Za-z0-9\s\\]+', '', regex=True).str.strip()
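Here is a minimal, self-contained sketch of that idea, run on the sample rows from the question. It assumes the "\n" in the data is a real newline character rather than a literal backslash followed by "n", and it passes regex=True explicitly because recent pandas versions treat the pattern as a literal string by default. The DataFrame is rebuilt by hand from the question's table for illustration only.

```python
import pandas as pd

# Sample rows rebuilt from the question's table (assumption: "\n" is a real newline).
df = pd.DataFrame({
    "Before": ["Students Num #", "无差异()\nLocation", "/\nCity", "异\nPercent"]
})

# Remove every run of characters that is not an ASCII letter, digit,
# whitespace, or backslash, then trim leftover whitespace (including
# newlines) from both ends of each value.
df["After"] = (
    df["Before"]
    .str.replace(r"[^A-Za-z0-9\s\\]+", "", regex=True)
    .str.strip()
)

print(df["After"].tolist())
```

Note that str.strip only trims the ends of each value, so a newline left in the middle of a string would survive. If that matters for your data, you could replace inner whitespace separately. Also, this keeps letters as they are, so the first row comes out as "Students Num".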