I have tags (only ASCII chars inside brackets) of the following structure: [Root.GetSomething], instead, some contributors ended up submitting contributions with Cyrillic chars that look similar to Latin ones, e.g. [Rооt.GеtSоmеthіng].
I need to locate, and then replace those inconsistencies with the matching ASCII characters inside the brackets.
I tried \[([АаІіВСсЕеРТтОоКкХхМ] )\]; (\[)([^\x00-\x7F] )(\]), and some variations of the range but those searches don't see any matches. I seem to be missing something important in the regex execution logic.
CodePudding user response:
You can use a regex matching any "interesting" Cyrillic char in between [ letters or . ] and a conditional replacement pattern:
Find What: (?:\G(?!\A)|\[)[a-zA-Z.]*\K(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М))(?=[[:alpha:].]*])
Replace With: (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M)
Make sure Match Case option is ON. See a
string:
Details:
(?:\G(?!\A)|\[)- end of the previous successful match or a[char[a-zA-Z.]*- zero or more.or ASCII letters\K- match reset operator that discards the currently matched text from the overall match memory buffer(?:(А)|(а)|(І)|(і)|(В)|(С)|(с)|(Е)|(е)|(Р)|(Т)|(т)|(О)|(о)|(К)|(к)|(Х)|(х)|(М))- a non-capturing group containing 19 alternatives each of which is put into a separate capturing group(?=[[:alpha:].]*])- a positive lookahead that requires zero or more letters or.and then a]char immediately to the right of the current location.
The (?1A:?2a:?3I:?4i:?5B:?6C:?7c:?8E:?9e:?{10}P:?{11}T:?{12}t:?{13}O:?{14}o:?{15}K:?{16}k:?{17}X:?{18}x:?{19}M) replacement pattern replaces А with A (\u0410) if Group 1 matched, а (\u0430) with a if Group 2 matched, etc.

