I'm trying to generate 3 new calculated fields in Google Data Studio / Looker Studio using the REGEXP_EXTRACT function.
Here is the sample data that I have on a Google sheet:
| Sample data |
|---|
| Serviços de Impressão > Impressoras > Falha na impressão > Troca de Tonner |
| Aplicativos e Softwares > Avaliação de Aplicativos e Software > Pacote Office |
| Computadores e Periféricos > Manutenção > Teclado / Mouse > Aquisição de equipamento |
| Acessos > Certificado Digital |
Each > represents a division and ideally what I would like to do is extract the first three fields and disregard the rest, something like this:
| Calculated field 1 | Calculated field 2 | Calculated field 3 |
|---|---|---|
| Serviços de Impressão | Impressoras | Falha na impressão |
| Aplicativos e Softwares | Avaliação de Aplicativos e Software | Pacote Office |
| Computadores e Periféricos | Manutenção | Teclado / Mouse |
| Acessos | Certificado Digital | null |
I managed to generate a code to extract the first calculated field using
REGEXP_EXTRACT(Sample data,'^(. ?)>')
but in the second I didn't know how to do it, since I can always have one or more separators > as in the example of the last line.
How can I formulate the codes for calculated fields 2 and 3 please?
CodePudding user response:
The second field can be extracted with
> (.*?)(?: > |$)
The third field - if any - can be obtained with
> .*? > (.*?)(?: > |$)
The (?: > |$) non-capturing group matches either space > space, or end of string.
CodePudding user response:
you can try:
REGEXP_EXTRACT(test1,'^(?:[^>] >){0}([^>] )')
the above formula is for the first field. change the Zero in parentheses to 1 for second field and 2 for third field.

