I want to return the text to the left of department of, school of to return pomona college rutgers college
I am trying to get it to work where I can pass in multile options. ie split on "school of" or "department of" I have tried the | OR operator in the line before_words_or but I am doing something wrong.
regexp_extract(institute_name,r'^(.*)department of|school of',1,1)
If there is no "school of" or "department of" in the institute_name it should return the institute_name
with t1 as
(
select 'pomona college department of chemistry' institute_name,
union all select 'rutgers college school of engineering chemical engineering'
)
select
regexp_extract(institute_name,r'^(.*)department of',1,1) before_words1,
regexp_extract(institute_name,r'^(.*)school of',1,1) before_words2,
regexp_extract(institute_name,r'^(.*)department of|school of',1,1) before_words_or
from t1;
Actual result
| before_words1 | before_words2 | before_words_or |
|---|---|---|
| pomona college | pomona college | |
| rutgers college |
Desired result
| before_words1 | before_words2 | before_words_or |
|---|---|---|
| pomona college | pomona college department of chemistry' institute_name | pomona college |
| rutgers college school of engineering chemical engineering | rutgers college | rutgers college |
CodePudding user response:
You can use
r'^(.*?)\s*(?:department|school)\s of'
Or, with a word boundary to make sure of is matched as a whole word:
r'^(.*?)\s*(?:department|school)\s of\b'
See the regex demo.
Details:
^- start of string(.*?)- Capturing group: any zero or more chars other than line break chars as few as possible\s*- zero or more whitespaces(?:department|school)- a non-capturing group that matches eitherdepartmentor (|)school\s- one or more whitespacesof- anofword\b- a word boundary.
