I'm trying to extract data from a dd group that is not well structured. The group does have a dd wrapper, but inside it there are only alternating p/div elements, with no wrapper around each key/value pair:
[DD]
[P] Key
[DIV]
[P] Value
[P] Key
[DIV]
[P] Value
Is it possible to collect each key together with its own value?
The html code I'm processing:
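<dd class="product-specifications-v2__items">
<p class="product-specifications-v2__key">
EAN/UPC - product
</p>
<div class="product-specifications-v2__value">
<p class="product-specifications-v2__value-item">
7912372
</p>
</div>
<p class="product-specifications-v2__key">
Weight
</p>
<div class="product-specifications-v2__value">
<p class="product-specifications-v2__value-item">
2,170
<span>kg</span>
</p>
</div>
</dd>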
I currently get the following result, with all keys in one array and all values in another:
{
"key": [
"EAN\/UPC - product",
"Weight"
],
"value": [
"7912372",
"2,170 kg"
]
}
What I need is one object per key/value pair (without arrays):
{
"key": "EAN/UPC - product",
"value": "7912372"
},
{
"key": "Weight",
"value": "2,170 kg"
}
I'm fetching the data via an API with the following request:
{
"name":"attributes",
"selector":"div.product-specifications-v2__wrapper dl dd",
"targets":[
{
"name":"key",
"selector":"p.product-specifications-v2__key",
"dataType":"title"
},
{
"name":"value",
"selector":"div.product-specifications-v2__value p.product-specifications-v2__value-item",
"dataType":"title"
}
]
}
CodePudding user response:
Using XPath 3.1 (for instance, in the browser with Saxon-JS (https://www.saxonica.com/saxon-js/documentation2/index.html), or with Node.js), you can use a path expression that selects each key paragraph and creates an XPath 3.1 XDM map from it, pairing the key with the first following sibling div as its value:
//dd[@class = 'product-specifications-v2__items']/p[@class = 'product-specifications-v2__key']!map { 'key' : normalize-space(), 'value' : following-sibling::div[@class = 'product-specifications-v2__value'][1]!normalize-space() }
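const html = `<dd class="product-specifications-v2__items">
<p class="product-specifications-v2__key">
EAN/UPC - product
</p>
<div class="product-specifications-v2__value">
<p class="product-specifications-v2__value-item">
7912372
</p>
</div>
<p class="product-specifications-v2__key">
Weight
</p>
<div class="product-specifications-v2__value">
<p class="product-specifications-v2__value-item">
2,170
<span>kg</span>
</p>
</div>
</dd>`;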
const htmlDoc = new DOMParser().parseFromString(html, 'text/html');
const results = SaxonJS.XPath.evaluate(`//dd[@class = 'product-specifications-v2__items']/p[@class = 'product-specifications-v2__key']!map { 'key' : normalize-space(), 'value' : following-sibling::div[@class = 'product-specifications-v2__value'][1]!normalize-space() }`, htmlDoc, { 'xpathDefaultNamespace' : 'http://www.w3.org/1999/xhtml' });
console.log(results);
<script src="https://www.saxonica.com/saxon-js/documentation2/SaxonJS/SaxonJS2.rt.js"></script>
The JavaScript API of Saxon-JS returns the sequence of XDM maps to JavaScript as an array of objects, one per key/value pair.
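If the scraping API cannot be switched to XPath, one possible fallback is to post-process the result it already returns. The following is only a minimal sketch: it assumes the "key" and "value" arrays come back in document order and that every key has exactly one value, which holds for the sample markup but is not guaranteed in general. The helper name zipKeyValue is hypothetical and not part of any API mentioned here.

```javascript
// Hypothetical helper (not part of the scraping API or Saxon-JS):
// turns { key: [...], value: [...] } into [{ key, value }, ...].
// Assumes both arrays are aligned by position.
function zipKeyValue(result) {
  const { key = [], value = [] } = result;
  if (key.length !== value.length) {
    // A key without a value (or the reverse) would silently misalign pairs,
    // so fail loudly instead of guessing.
    throw new Error(`Mismatched lengths: ${key.length} keys, ${value.length} values`);
  }
  return key.map((k, i) => ({ key: k, value: value[i] }));
}

// The result shape shown in the question.
const apiResult = {
  key: ["EAN/UPC - product", "Weight"],
  value: ["7912372", "2,170 kg"],
};

const pairs = zipKeyValue(apiResult);
console.log(JSON.stringify(pairs, null, 2));
```

The length check matters because a dd with a missing value div would otherwise shift every later value onto the wrong key. The XPath approach avoids that problem, since each value is looked up relative to its own key.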
