Given the table of items below and a list of predefined patterns (modern, rustic, contemporary, classic, vintage) how can I create another table which contains regex matches organized by source for each item (source_1, source_2 etc),
the struct for each match is in key value format, i.e. <pattern STRING , source ARRAY<STRING>> , and each row would contain an array of these structs, i.e. ARRAY <<pattern STRING , source ARRAY <STRING>>>
items table :
with items_for_sale AS (
select 1 as item_id, 'modern chair' as source_1, ['contemporary chair', 'modernist chair'] as source_2,
union all
select 2 as item_id, 'classic lamp' as source_1, ['modern vintage lamp', 'blah'] as source_2,
union all
select 3 as item_id, 'rustic bed' as source_1, ['cottage bed', 'vintage country bed'] as source_2,
)
select* from items_for_sale
List of predefined patterns to search for e.g.
modern, rustic, contemporary, classic, vintage (the actual list has ~1000 items), the regex is expected to find if the pattern is contained in the string
Expected output table with regex matches by source for each item:
This is pretty straightforward to do with python or any other language by creating a key-value dictionary for each item_id, but is it possible to do it in BQ SQL
CodePudding user response:
Consider below simple approach
with patterns as (
select pattern
from unnest(['modern', 'rustic', 'contemporary', 'classic', 'vintage']) pattern
)
select item_id,
array_agg(struct(pattern, source) order by pattern, source) regexp_matches_by_source
from (
select item_id, source_1 as value, 'source_1' as source from items_for_sale union all
select item_id, source_2, 'source_2' from items_for_sale t, t.source_2 as source_2
)
join patterns
on regexp_contains(value, pattern)
group by item_id
If applied to sample data in your question - output is



