Scraping entire lines with HtmlUnit

Time:01-07

I'm struggling to scrape the complete option lines (attributes and text) from a web page:

<select id="Code_9" name="value[2].valueType" onchange="changeMe(this);">
  <option value="0">Identifier_1</option>
  <option value="1">Identifier_2</option>
  <option value="2">Identifier_3</option>
  <option value="3" selected="">Identifier_4</option>
</select>

When running this code:

List<HtmlDivision> selectedValue = htmlPage.getByXPath("//*[@id='Code_9']/option");
for (int i = 0; i < selectedValue.size(); i++)
{
    System.out.println(selectedValue.get(i));
}

It returns this:

HtmlOption[<option value="0">]
HtmlOption[<option value="1">]
HtmlOption[<option value="2">]
HtmlOption[<option value="3" selected="">]

But I also need the "identifier" text. Alternatively, I could get a direct copy of everything within "select" and do some string parsing.

Note that the processing time of scraping this should be as low as possible.

Edit (07.01.22): HtmlDivision should be HtmlElement instead. With that change, @RBRi's answer is correct, and using selectedValue.get(i).asXml() will output:

<option value="0">
  Identifier_1
</option>

<option value="1">
  Identifier_2
</option>

<option value="2">
  Identifier_3
</option>

<option value="3" selected="">
  Identifier_4
</option>

Answer:

You can use

selectedValue.get(i).asXml()

if you want the element as a string.

Otherwise, use getChildNodes() or getChildren() to walk the child nodes directly.
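
To illustrate the getChildNodes() route, here is a minimal sketch that walks the children of the select element and builds one "value=text" line per option, without going through asXml() and string parsing. It uses only the standard org.w3c.dom interfaces. As far as I know, HtmlUnit's DomNode and DomElement implement these interfaces, so the same calls should work on the nodes HtmlUnit returns, but please verify that against your HtmlUnit version. The class name OptionLines and the helpers optionLines and parseSelect are hypothetical, and the JDK XML parser is only a stand-in for the real page so that the example runs without HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class OptionLines {

    // Returns one "value=text" entry for every <option> child of the given <select>.
    static List<String> optionLines(Element select) {
        List<String> lines = new ArrayList<>();
        NodeList children = select.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes that sit between the <option> elements.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "option".equalsIgnoreCase(child.getNodeName())) {
                Element option = (Element) child;
                lines.add(option.getAttribute("value") + "=" + option.getTextContent().trim());
            }
        }
        return lines;
    }

    // Stand-in for the scraped page (assumption): parses the snippet with the JDK
    // XML parser and returns its root element. With HtmlUnit you would pass the
    // select element taken from the page instead.
    static Element parseSelect(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Snippet is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        String snippet =
            "<select id=\"Code_9\" name=\"value[2].valueType\" onchange=\"changeMe(this);\">\n"
            + "  <option value=\"0\">Identifier_1</option>\n"
            + "  <option value=\"1\">Identifier_2</option>\n"
            + "  <option value=\"2\">Identifier_3</option>\n"
            + "  <option value=\"3\" selected=\"\">Identifier_4</option>\n"
            + "</select>";
        for (String line : optionLines(parseSelect(snippet))) {
            System.out.println(line); // first line printed: 0=Identifier_1
        }
    }
}
```

With HtmlUnit, the element to pass in would presumably be the one for Code_9, for example the result of htmlPage.getElementById("Code_9"). HtmlOption also has its own accessors such as getValueAttribute() and getText(), which may be more convenient if you already hold a list of options.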
