I am trying to extract a person's name between different characters. For example, the cells contains this information
PATIENT: 2029985 - COLLINS, JUNIOR .
PATIENT: 1235231-02 - JERRY JR, PATRICK .
PATIENT: 986435--EXP-- - JULIUS, DANIEL .
PATIENT: 2021118-02 - DRED-HARRY, KEVIN .
My goal is to use one REGEXTRACT formula to get the following:
COLLINS, JUNIOR
JERRY JR, PATRICK
JULIUS, DANIEL
LOVE ALSTON, BRENDA
So far, I have come up with the formula:
=ARRAYFORMULA(REGEXEXTRACT(B3:B, "-(.*)\."))
Where B3 contains the first information
Using that formula, I get:
COLLINS, JUNIOR
02 - JERRY JR, PATRICK
02 - LOVE-ALSTON, BRENDA
-EXP-- - JULIUS, DANIEL
02 - DRED-HARRY, KEVIN
I managed to get the first name down but how do I go about extracting the rest.
CodePudding user response:
1st solution: With your shown samples, please try following regex.
^PATIENT:.*-\s ([^.]*?)\s*\.
OR try following Google-sheet forumla:
=ARRAYFORMULA(REGEXEXTRACT(B3:B, "^PATIENT:.*-\s ([^.]*?)\s*\."))
Explanation: Checking if line/value starts from PATIENT followed by : till -(using greedy mechanism here), which is followed by spaces(1 or more occurrences). Then creating one and only capturing group which contains everything just before .(dot) in it making it non-greedy, closing capturing group which is followed by spaces(0 or more occurrences) followed by a literal dot.
2nd solution: Using lazy match approach in regex, please try following regex.
.*?\s-\s([^.]*?)\s*\.
Google-sheet formula will be as follows:
=ARRAYFORMULA(REGEXEXTRACT(B3:B, ".*?\s-\s([^.]*?)\s*\."))
CodePudding user response:
You can use
=ARRAYFORMULA(REGEXEXTRACT(B3:B, "\s-\s ([^.]*?)\s*\."))
See the regex demo. Details:
\s-\s- a whitespace,-, one or more whitespaces([^.]*?)- Group 1: zero or more chars other than a.as few as possible\s*- zero or more whitespaces\.- a.char.
