I've been relatively successful using variations of the following to scrape web data.
$url = "https://msrc.microsoft.com/update-guide/en-US/vulnerability/CVE-2019-1331"
$response = Invoke-WebRequest -Uri $url
$response.ParsedHtml.body.getElementsByClassName('css-247') | select -expand innertext
However, I cannot get the text stored in a paragraph tag on the page at the URL in the code above. The relevant markup looks like this:
<div >
<p>
'Text I want to copy
I assume the text inside the <p> is not exposed as innerText? How can I grab that text?
If I copy the XPath of the element containing the text, I get: /html/body/div/div/div/div/div[2]/div/div[2]/div/div[2]/div/div[2]/div[3]/div[2]/div/div/div/div/div/div/p[1]
If I copy the CSS path, I get:
html body.ms-Fabric--isFocusHidden div#root div.ms-Fabric.root-41 div.css-43 div.ms-Stack.css-87 div.ms-Stack.css-87 div.ms-ScrollablePane.root-88 div.ms-ScrollablePane--contentContainer.contentContainer-89 div.ms-Stack.css-93 div.ms-Stack.css-110 div.ms-ScrollablePane.root-88 div.ms-ScrollablePane--contentContainer.contentContainer-89 div div div#executiveSummary.ms-Stack.ms-Card.css-136 div.ms-Stack.ms-CardSection.css-138 div.ms-Shimmer-container.root-113 div.ms-Shimmer-dataWrapper.dataWrapper-140 div.ms-StackItem.ms-CardItem.css-246 div.css-247 p
CodePudding user response:
The data is retrieved dynamically from an API call that returns JSON. When you navigate to the URI in your question in a browser, the browser runs JavaScript, which makes additional XHR requests and updates the page with that content. Invoke-WebRequest does not run that JavaScript, so those requests are never made and the desired content is not present in the response.
You need to call the appropriate endpoint (found in the network tab of the browser), extract the relevant part of the JSON response, and parse the summary out of the HTML it contains:
$url = "https://api.msrc.microsoft.com/sug/v2.0/en-US/vulnerability/CVE-2019-1331"
$response = Invoke-WebRequest -Uri $url
$data = $response.Content | ConvertFrom-Json
$summary = $data.description
$html = New-Object -ComObject "HTMLFile"
$html.IHTMLDocument2_write($summary)
$html.firstChild | % innerText
I looked up how to write an HTML string to IHTMLDocument2 here: https://paullimblog.wordpress.com/2017/08/08/ps-tip-parsing-html-from-a-local-file-or-a-string/
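If the HTMLFile COM object is not available (for example in PowerShell 7 on a non-Windows machine), a rough alternative is to pull the paragraph text out of the description string with a regular expression. This is only a sketch: the $json sample below is made up to stand in for the API response, and Get-ParagraphText is a hypothetical helper name, not part of any module. Regex-based extraction is fine for simple <p> markup like this but is not a general HTML parser.

```powershell
# Hypothetical sample standing in for the API's JSON response.
# Only the "description" field (used in the answer above) is modelled here.
$json = '{"description":"<p>First paragraph &amp; more.</p><p>Second paragraph.</p>"}'

# Hypothetical helper: returns the text of each <p>...</p> element in an HTML string.
function Get-ParagraphText {
    param([Parameter(Mandatory)][string]$Html)

    # Capture the contents of each <p>...</p> pair
    # (case-insensitive, non-greedy, allowed to span newlines).
    $found = [regex]::Matches($Html, '(?is)<p\b[^>]*>(.*?)</p>')
    foreach ($m in $found) {
        # Drop any nested tags, then decode entities such as &amp;.
        $inner = $m.Groups[1].Value -replace '<[^>]+>', ''
        [System.Net.WebUtility]::HtmlDecode($inner).Trim()
    }
}

$data = $json | ConvertFrom-Json
$paragraphs = @(Get-ParagraphText -Html $data.description)
$paragraphs
```

With a real response you would replace the sample with $data from the answer above and pass $data.description to the helper in the same way.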
