I'm using a regular expression to extract country data in BigQuery, but I can't work out how to get the text I want. These are the example records I'm using:
| country |
|---|
| China Anhui Univ Chinese Med, Affiliated Hosp 1, Expt Ctr Clin Res, Sci Res Dept, 117 Meishan Rd, Hefei 230031, Anhui, 12, Peoples R China |
| Meluna Res, Geldermalsen, Netherlands; [Wiegant, Frederik Anton Clemens] Univ Utrecht, Utrecht, Netherlands |
I want to extract the text after the last comma in each record (Peoples R China and Netherlands here), so I tried a negative lookahead:
(, )(?!.*\b\1\b)((\w*\s?){3})
But it seems BigQuery doesn't support lookahead expressions, since it only supports RE2 syntax. Is there any way to extract the country name without using lookahead?
CodePudding user response:
You can use
,\s*([^,]*)$
See the regex demo. The pattern matches:

- `,` - a comma
- `\s*` - zero or more whitespace characters
- `([^,]*)` - capturing group 1: zero or more characters other than a comma
- `$` - end of string.
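As a quick sanity check outside BigQuery, here is a minimal Python sketch that runs the same pattern against the two sample records from the question. Python's `re` module is not RE2, but this pattern uses only syntax both engines share; the helper name `last_comma_field` is made up for illustration. In BigQuery itself, the pattern would normally be passed to `REGEXP_EXTRACT`, which returns the capturing group when the pattern has one.

```python
import re

# Pattern from the answer: a comma, optional whitespace, then everything
# up to the end of the string that is not a comma (captured in group 1).
LAST_FIELD = re.compile(r",\s*([^,]*)$")


def last_comma_field(text):
    """Return the text after the final comma, or None if there is no comma."""
    match = LAST_FIELD.search(text)
    return match.group(1) if match else None


# Sample records copied from the question.
records = [
    "China Anhui Univ Chinese Med, Affiliated Hosp 1, Expt Ctr Clin Res, "
    "Sci Res Dept, 117 Meishan Rd, Hefei 230031, Anhui, 12, Peoples R China",
    "Meluna Res, Geldermalsen, Netherlands; [Wiegant, Frederik Anton Clemens] "
    "Univ Utrecht, Utrecht, Netherlands",
]

results = [last_comma_field(record) for record in records]
print(results)
```

Because `[^,]*` cannot cross a comma and `$` anchors the match at the end, only the final comma-separated field can satisfy the pattern, so no lookahead is needed.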
