XPath separate key and value of group

Time:01-14

I'm trying to fetch data from a dd group that is not well structured. The group does have a dd wrapper, but inside it the keys and values are just alternating p and div elements, with no wrapper around each key/value pair:

[DD]
 [P] Key
 [DIV]
    [P] Value
 [P] Key
 [DIV]
    [P] Value

Is it possible to collect each key together with its own value?

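The HTML I'm processing:

<dd class="product-specifications-v2__items">
  <p class="product-specifications-v2__key">
    EAN/UPC - product
  </p>
  <div class="product-specifications-v2__value">
    <p class="product-specifications-v2__value-item">
      7912372 
    </p>
  </div>
  <p class="product-specifications-v2__key">
    Weight
  </p>
  <div class="product-specifications-v2__value">
    <p class="product-specifications-v2__value-item">
      2,170
      <span>kg</span>
    </p>
  </div>
</dd>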

I currently get the following result, with all keys in one array and all values in another:

{
"key": [
        "EAN\/UPC - product",
        "Weight"
        ],
"value": [
        "7912372",
        "2,170 kg"
        ]
}

What I need instead is one object per pair (without arrays):

{
    "key": "EAN/UPC - product",
    "value": "7912372"
},
{
    "key": "Weight",
    "value": "2,170 kg"
}

I'm fetching the data via an API with the following request:

{
        "name":"attributes",
        "selector":"div.product-specifications-v2__wrapper dl dd",
        "targets":[
           {
              "name":"key",
              "selector":"p.product-specifications-v2__key",
              "dataType":"title"
           },
           {
              "name":"value",
              "selector":"div.product-specifications-v2__value p.product-specifications-v2__value-item",
              "dataType":"title"
           }
        ]
     }

CodePudding user response:

With XPath 3.1 (available, for instance, in the browser and in Node through Saxon-JS, https://www.saxonica.com/saxon-js/documentation2/index.html) you can use a path expression that creates one XDM map per key element, holding the key and its value:

//dd[@class = 'product-specifications-v2__items']/p[@class = 'product-specifications-v2__key']!map { 'key' : normalize-space(), 'value' : following-sibling::div[@class = 'product-specifications-v2__value'][1]!normalize-space() }

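const html = `<dd class="product-specifications-v2__items">
  <p class="product-specifications-v2__key">
    EAN/UPC - product
  </p>
  <div class="product-specifications-v2__value">
    <p class="product-specifications-v2__value-item">
      7912372 
    </p>
  </div>
  <p class="product-specifications-v2__key">
    Weight
  </p>
  <div class="product-specifications-v2__value">
    <p class="product-specifications-v2__value-item">
      2,170
      <span>kg</span>
    </p>
  </div>
</dd>`;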

// Parse the markup as HTML; the resulting elements are in the XHTML namespace.
const htmlDoc = new DOMParser().parseFromString(html, 'text/html');

// xpathDefaultNamespace lets the unprefixed element names in the path match those XHTML elements.
const results = SaxonJS.XPath.evaluate(`//dd[@class = 'product-specifications-v2__items']/p[@class = 'product-specifications-v2__key']!map { 'key' : normalize-space(), 'value' : following-sibling::div[@class = 'product-specifications-v2__value'][1]!normalize-space() }`, htmlDoc, { 'xpathDefaultNamespace' : 'http://www.w3.org/1999/xhtml' });

console.log(results);

Saxon-JS itself is loaded on the page with:

<script src="https://www.saxonica.com/saxon-js/documentation2/SaxonJS/SaxonJS2.rt.js"></script>

The JavaScript API of Saxon-JS returns the sequence of XDM maps to JavaScript as an array of JSON objects, one per key/value pair.
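
If the API you are calling cannot run XPath 3.1 (an assumption; the question does not say what it supports), the two arrays it already returns could be paired by index afterwards. The following is only a sketch: `pairKeyValues` is a hypothetical helper name, and it assumes every key has exactly one value and that both arrays come back in document order.

```javascript
// Hypothetical helper: turns { key: [...], value: [...] } into
// an array of { key, value } objects by pairing entries at the same index.
function pairKeyValues(result) {
  const { key: keys = [], value: values = [] } = result;

  // Pairing by index is only safe when the arrays line up one-to-one.
  if (keys.length !== values.length) {
    throw new Error(`Cannot pair ${keys.length} keys with ${values.length} values`);
  }

  return keys.map((key, i) => ({ key, value: values[i] }));
}

// The result shape shown in the question.
const attributes = {
  key: ["EAN/UPC - product", "Weight"],
  value: ["7912372", "2,170 kg"],
};

console.log(pairKeyValues(attributes));
```

Unlike the XPath expression, this does not look at the markup at all, so a key without a value (or a value without a key) would shift every later pair; the length check catches only the cases where the counts differ.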
