How to detect word boundaries in a regex for Arabic words

I am trying to remove every word that contains non-Arabic characters, so words like ذهb or word should be removed entirely.

I have managed to remove the non-Arabic characters themselves with the regex below:

re.sub(r'([^،-٩]+)', ' ', 'ذهb')

But how would I remove the whole word? Preceding the regex with \b doesn't seem to work.
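A small sketch of what is going on, assuming Python 3 and str (Unicode) patterns with the built-in re module. The character class only replaces individual characters, and \b does not help because in a Unicode pattern ذ, ه and b are all word characters, so there is no boundary inside ذهb for the non-Arabic run to start at.

```python
import re

word = "ذهb"

# The class replaces each run of characters outside the range ،-٩
# (U+060C to U+0669), so only the Latin "b" is touched, not the whole word.
char_level = re.sub(r'([^،-٩]+)', ' ', word)

# With str patterns \w is Unicode-aware: ذ, ه and b are all word characters,
# so \b only matches at the very start and the very end of "ذهb".
boundaries = [m.start() for m in re.finditer(r'\b', word)]

# Prefixing \b forces the non-Arabic run to begin at a word boundary,
# which never happens inside this word, so nothing is replaced.
with_boundary = re.sub(r'\b([^،-٩]+)', ' ', word)

print(repr(char_level))     # 'ذه '
print(boundaries)           # [0, 3]
print(repr(with_boundary))  # 'ذهb'
```

So the fix is not just to anchor the class with \b, but to match the entire word (from boundary to boundary) whenever it contains at least one non-Arabic letter.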

Answer:

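You might want to try string.ascii_letters. Note that this filter works character by character: it strips the ASCII letters out of a word such as ذهb (leaving ذه) rather than removing the whole word.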

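import string

def remove_ascii_letters(text):
    # Illustrative wrapper (name is hypothetical): drop ASCII letters a-z/A-Z
    # one character at a time, then trim surrounding whitespace
    return "".join(char for char in text if char not in string.ascii_letters).strip()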
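The ascii_letters filter above works one character at a time, so ذهb becomes ذه instead of being dropped. A minimal word-level sketch, assuming words are separated by whitespace (the helper name drop_ascii_words is hypothetical), keeps only the words that contain no ASCII letter at all:

```python
import string

# Set of a-z and A-Z for fast membership tests
ASCII_LETTERS = set(string.ascii_letters)


def drop_ascii_words(text):
    """Hypothetical helper: drop every whitespace-separated word containing an ASCII letter."""
    # str.split() with no argument splits on any run of whitespace
    kept = [w for w in text.split() if not ASCII_LETTERS.intersection(w)]
    return " ".join(kept)


print(drop_ascii_words("ذهب ذهb word كتاب"))  # ذهب كتاب
```

Two caveats of this design: it only catches ASCII letters (a word in, say, Cyrillic would survive), and rejoining with a single space normalizes the original whitespace.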

Answer:

You can use

re.sub(r'\s*\b[\u0621-\u064A]*[^\W\d_\u0621-\u064A][^\W\d_]*\b', '', text)

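The \s*\b[\u0621-\u064A]*[^\W\d_\u0621-\u064A][^\W\d_]*\b pattern matches:

- \s* - zero or more whitespace characters
- \b - a word boundary
- [\u0621-\u064A]* - zero or more Arabic letters
- [^\W\d_\u0621-\u064A] - one letter that is not an Arabic letter (a word character that is not a digit, not _ and not in U+0621-U+064A)
- [^\W\d_]* - zero or more letters of any script
- \b - a word boundary

In other words, it matches a whole word, together with the whitespace before it, as soon as that word contains at least one non-Arabic letter.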
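A short usage sketch, assuming Python 3 and str input. The helper name remove_non_arabic_words and the final .strip() are additions for illustration: the strip() removes the leading space left behind when the first word of the text is the one being dropped.

```python
import re

# Pattern from the answer: a word containing at least one letter outside
# U+0621-U+064A, together with any whitespace in front of it
NON_ARABIC_WORD = re.compile(r'\s*\b[\u0621-\u064A]*[^\W\d_\u0621-\u064A][^\W\d_]*\b')


def remove_non_arabic_words(text):
    # Hypothetical helper; strip() cleans up the space left when the first word is removed
    return NON_ARABIC_WORD.sub('', text).strip()


print(remove_non_arabic_words("ذهب ذهb word كتاب"))  # ذهب كتاب
print(remove_non_arabic_words("word ذهب"))           # ذهب
```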