Home > OS >  How To Extract Uppercase First Letter Multiple Words Per Cells Only Ideally Ignoring First Words of
How To Extract Uppercase First Letter Multiple Words Per Cells Only Ideally Ignoring First Words of

Time:01-06

I'm trying to extract all words with Uppercase initial letter from a text, with the REGEXEXTRACT formula in google sheets.

Ideally the first word of sentences should be ignored and only all subsequent words with first Uppercase letter should be extracted.

Other Close Questions and Formulas:

I've found those other two questions and answers:

Accepted answer From: How to extract multiple names with capital
letters in Google Sheets?)

Sample Sheet:

Here's my testing enter image description here

See the enter image description here

Formula in B1:

=INDEX(IF(A1:A<>"",SPLIT(REGEXREPLACE(A1:A,"(?:(?:^|[.?!] )\s*\S |\b([A-ZÖ][a-zö] (?:-[A-ZÖ][a-zö] )*)\b|. ?)","$1|"),"|",1),""))

The pattern: (?:(?:^|[.?!] )\s*\S |\b([A-ZÖ][a-zö] (?:-[A-ZÖ][a-zö] )*)\b|. ?) means:

  • (?: - Open non-capture group to allow for alternations:
    • (?:^|[.?!] )\s*\S - A nested non-capture group to allow for the start-line anchor or 1 literal dots or question/exclamation marks, followed by 0 whitespace chars and 1 non-whitespace chars;
    • | - Or;
    • \b([A-ZÖ][a-zö] (?:-[A-ZÖ][a-zö] )*)\b - A 1st capture-group to catch camel-case strings (with optional hyphen) between word-boundaries;
    • | - Or;
    • . ? - Any 1 characters (Lazy);
    • ) - Close non-capture group.

The idea is here to use REGEXREPLACE() to substitute any match with the backreference to the 1st capture group and a pipe-symbol (or any symbol for that matter that won't be in your input) and use SPLIT() to get all words seperated. Note that it is important to use the 3rd parameter of the function to ignore empty strings.

INDEX() will trigger the array-functionality and spill the results. I used an nested IF() statement to check for empty cells to skip.

  •  Tags:  
  • Related