Home > Back-end >  Parsing string with multiple delimiters into columns
Parsing string with multiple delimiters into columns

Time:01-25

I want to split strings into columns.

My columns should be:

account_id, resource_type, resource_name

I have a JSON file source that I have been trying to parse via ADF data flow. That hasn't worked for me, hence I flattened the data and brought it into SQL Server (I am open to parsing values via ADF or SQL if anyone can show me how). Please check the JSON file at the bottom.

Use this code to query the data I am working with.

 CREATE TABLE test.test2
 (
     resource_type nvarchar(max) NULL
 )

 INSERT INTO test.test2 ([resource_type]) 
 VALUES 
     ('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
     ('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
     ('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
     ('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')

The output that I should be able to query in SQL Server should like this:

account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups

and so forth.

Please help me out and ask for clarification if needed. Thanks in advance.

EDIT:

Source JSON Format:

{
    "start_date": "2021-12-01 00:00:00 00:00",
    "end_date": "2021-12-31 23:59:59 00:00",
    "resource_type": "all",
    "records": [
        {
            "directconnect_connections": [
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
            ]
        },
        {
            "virtual_interfaces": [
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
                "account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
                "account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
            ]
        }
    ]
}

CodePudding user response:

Since you don't have a valid JSON string and not wanting to get in the business of string manipulation... perhaps this will help.

Select B.*
 From  test2 A
 Cross Apply ( Select account_id    = max(case when value like 'account_id:%'    then stuff(value,1,11,'') end )
                     ,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
                     ,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
                from  string_split(resource_type,',') 
             )B

Results

account_id      resource_type   resource_name
224526257458    buckets         camp-stage-artifactory
535533456241    buckets         tni-prod-diva-backups
369798452057    buckets         369798452057-s3-manifests
460085747812    buckets         vessel-incident-report-nonprod-accesslogs

CodePudding user response:

Unfortunately, the values inside the arrays are not valid JSON. You can patch them up by adding {} to the beginning/end, and adding " on either side of : and ,.

DECLARE @json nvarchar(max) = N'{
    "start_date": "2021-12-01 00:00:00 00:00",
    "end_date": "2021-12-31 23:59:59 00:00",
    "resource_type": "all",
    "records": [
        {
            "directconnect_connections": [
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
            ]
        },
        {
            "virtual_interfaces": [
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
                "account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
                "account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
            ]
        }
    ]
}';


SELECT
  j4.account_id,
  j4.resource_type,
  TRIM('''' FROM j4.resource_name) resource_name
FROM OPENJSON(@json, '$.records') j1
CROSS APPLY OPENJSON(j1.value) j2
CROSS APPLY OPENJSON(j2.value) j3
CROSS APPLY OPENJSON('{"'   REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","')   '"}')
  WITH (
    account_id bigint,
    resource_type varchar(20),
    resource_name varchar(100)
  ) j4;

db<>fiddle

The first three calls to OPENJSON have no schema, so the resultset is three columns: key value and type. In the case of arrays (j1 and j3), key is the index into the array. In the case of single objects (j2), key is each property name.

  •  Tags:  
  • Related