How to traverse an HTML AST in JavaScript where the root node is actually an array

Time:01-26

I'm trying to find the correct way to traverse an HTML AST, find all the nodes with type: tag, and push them into an array.

I'm using html-parse-stringify to convert my HTML into an AST, if that helps.

I've watched some videos on traversing HTML ASTs on YouTube, but they all start with a single object as the root node, whereas I'm starting with an array. I doubt that is much of a problem, though.

The data I'm working with is a website's scraped HTML, which is then converted into an AST using the library mentioned above.

From here I just want to create a basic looping structure that can fully traverse the AST, filtering out all the unnecessary types such as text and comment, and pushing the remaining objects into an array.

Here is the data structure that I'm working with. I've included an empty data structure below for ease of copying.

Also, I would like to reduce the use of loops as much as possible for the sake of time complexity.


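// Keep only nodes that are neither text nor comments.
const withoutTextAndComments = nodes =>
  nodes.filter(n => n.type !== 'text' && n.type !== 'comment');

function mainLoop(nodes) {
  nodes.forEach(parent => {
    console.log(parent.name);
    loop(withoutTextAndComments(parent.children));
  });
}

function loop(children) {
  // `children` is an array, so log each child's name rather than children.name
  children.forEach(child => {
    console.log(child.name);
    loop(withoutTextAndComments(child.children));
  });
}

// docTree is the data structure shown under "Empty Data Structure"
mainLoop(docTree);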

Empty Data Structure

const docTree = [
  {
    attrs: {
      class: "flex flex-col h-screen",
    },
    children: [
      {
        type: 'tag',
        name: 'main',
        attrs: {
          class: ''
        },
        children: [],
      }
    ],
    name: 'div',
    type: 'tag',
    voidElement: false,
  }
]

CodePudding user response:

If your only goal is to remove text and comments, then it's pretty straightforward in a single reduce:

const traverse = (nodes) => {
  return nodes.reduce((acc,node) => {
     if(node.type === 'text' || node.type === 'comment') return acc;
     return [ ...acc, { ...node, children: traverse(node.children) } ]
  },[]);
}

I haven't actually run this code, but I think it'll work.
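A quick way to check the version above is to run it on a small hand-written tree. This is only a sketch: `sampleTree` is made-up data that mimics the node shape from the question, and the function is renamed `pruneTree` (with a defaulted `nodes` parameter) so it doesn't clash with the other snippets.

```javascript
// Hypothetical sample input mimicking the node shape from the question.
const sampleTree = [
  {
    type: 'tag',
    name: 'div',
    attrs: {},
    voidElement: false,
    children: [
      { type: 'text', content: 'hello' },
      {
        type: 'tag',
        name: 'main',
        attrs: {},
        voidElement: false,
        children: [{ type: 'comment', comment: ' note ' }],
      },
    ],
  },
];

// Same idea as `traverse` above: drop text and comment nodes at every level,
// but keep the nesting. Returns new objects; the input is not mutated.
const pruneTree = (nodes = []) =>
  nodes.reduce((acc, node) => {
    if (node.type === 'text' || node.type === 'comment') return acc;
    return [...acc, { ...node, children: pruneTree(node.children) }];
  }, []);

const pruned = pruneTree(sampleTree);
console.log(JSON.stringify(pruned, null, 2));
```

On this sample the result should still be nested: one `div` whose only child is `main`, with the text and comment nodes gone.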

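If you want to flatten all the children into a single list, you can do this (note that the destructuring strips the children property from each returned node):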

const traverse = (nodes) => {
  return nodes.reduce((acc,{children = [], ...node}) => {
     if(node.type === 'text' || node.type === 'comment') return acc;
     return [ ...acc, node, ...traverse(children) ]
  },[]);
}

EDIT 2: Ah, I missed the part where you only want the type tag. That's done with this:

const traverse = (nodes) => {
  return nodes.reduce((acc,{children = [], ...node}) => {
     if(node.type !== 'tag') return acc;
     return [ ...acc, node, ...traverse(children) ]
  },[]);
}

Also, I'm not sure if you want the children to remain as part of the parent node or not. This here might also be what you want:

const traverse = (nodes) => {
  return nodes.reduce((acc,node) => {
     if(node.type !== 'tag') return acc;
     return [ ...acc, node, ...traverse(node.children) ]
  },[]);
}
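Since the question mentions time complexity: spreading the accumulator (`[ ...acc, ... ]`) builds a new array on every step, which can add up on a large scraped page. A possible alternative, sketched here under the assumption that every tag node has a `children` array, is to push into one shared results array so each node is visited once. `collectTags`, `out`, and `sampleNodes` are illustrative names, not part of html-parse-stringify.

```javascript
// Hypothetical sample input mimicking the node shape from the question.
const sampleNodes = [
  {
    type: 'tag',
    name: 'div',
    attrs: {},
    voidElement: false,
    children: [
      { type: 'text', content: 'hi' },
      { type: 'tag', name: 'main', attrs: {}, voidElement: false, children: [] },
    ],
  },
  { type: 'tag', name: 'footer', attrs: {}, voidElement: false, children: [] },
];

// Collects every node with type 'tag' in document (pre-order) order.
// All recursive calls push into the same `out` array, so nothing is copied.
function collectTags(nodes = [], out = []) {
  for (const node of nodes) {
    if (node.type !== 'tag') continue; // skips text, comments, anything else
    out.push(node);
    collectTags(node.children, out);
  }
  return out;
}

const tagNames = collectTags(sampleNodes).map(n => n.name);
console.log(tagNames); // [ 'div', 'main', 'footer' ]
```

The returned nodes are the original objects, so each one still carries its own `children`, as in the last `traverse` variant above.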

CodePudding user response:

I think I've found the solution I'm looking for. I haven't finished testing it, but it's along these lines.

It uses an outer loop to go through my initial array of elements, and then an inner recursive function to walk all the child data I was looking for and push it into an array.

function parentLoop(domAST) {
  const results = [];
  // Recursively collect every descendant tag of `node`.
  // Note: the top-level elements themselves are not pushed, only their descendants.
  function childLoop(node) {
    const cleaned = node.children.filter(n => n.type !== 'text' && n.type !== 'comment' && n.name !== 'br');
    for (let i = 0; i < cleaned.length; i++) {
      const child = cleaned[i];
      if (child.type === 'tag') {
        results.push(child);
      }
      childLoop(child);
    }
  }
  domAST.forEach(ele => childLoop(ele));
  return results;
}

If there are better or cleaner solutions, I'm still open to them.
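One possibly cleaner variant, offered as a sketch rather than a tested drop-in replacement, is a filter plus `flatMap`. It differs from `parentLoop` in two ways: it also returns the top-level elements, and a skipped tag takes its whole subtree with it (harmless for `br`, which has no children). `collectTagsExcept`, `skipNames`, and `sampleDom` are made-up names for illustration.

```javascript
// Hypothetical sample input mimicking the node shape from the question.
const sampleDom = [
  {
    type: 'tag',
    name: 'div',
    attrs: {},
    voidElement: false,
    children: [
      { type: 'tag', name: 'br', attrs: {}, voidElement: true, children: [] },
      { type: 'text', content: 'hi' },
      { type: 'tag', name: 'p', attrs: {}, voidElement: false, children: [] },
    ],
  },
];

// Returns every 'tag' node in document order, skipping tags whose name is in
// `skipNames` (together with their subtrees).
const collectTagsExcept = (nodes = [], skipNames = new Set(['br'])) =>
  nodes
    .filter(n => n.type === 'tag' && !skipNames.has(n.name))
    .flatMap(n => [n, ...collectTagsExcept(n.children, skipNames)]);

console.log(collectTagsExcept(sampleDom).map(n => n.name)); // [ 'div', 'p' ]
```

Passing an empty set, for example `collectTagsExcept(sampleDom, new Set())`, keeps `br` as well.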
