I'm trying to get a number from a static page in a website, but when I do the HTML request, the result is a strange HTML without the informations of the original html that I want.
The website that I'm trying to get the information is: https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=233361&cvm=true but I can get the same result with: https://fnet.bmfbovespa.com.br/fnet/publico/visualizarDocumento?id=233361&cvm=true
The number that I want to get is the number "0,05" in the page
My code is:
Sub trying()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=233361&cvm=true&", False
.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT" 'to deal with potential caching
.send
html.body.innerHTML = .responseText
End With
Set element = html.getElementsByTagName("td")(31).innerText
Sheets("Sheet1").Cells(1, 1) = element
End Sub
I have also tried to do that using the InternetExplorer.Application but the the problem keeps the same
CodePudding user response:
After trying a bunch of request headers, Accept request header is required to return the response in HTML:
Sub trying()
Dim html As HTMLDocument
Set html = New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento?id=233361&cvm=true&", False
.setRequestHeader "Accept", "text/html,application/xhtml xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9"
.send
html.body.innerHTML = .responseText
End With
Dim element As String
element = html.getElementsByTagName("td")(32).innerText
Sheets("Sheet1").Cells(1, 1) = element
End Sub
CodePudding user response:
Just add a .htm (or .html) extension to the request to specify file type wanted.
Option Explicit
Public Sub trying()
Dim html As MSHTML.HTMLDocument
Set html = New MSHTML.HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://fnet.bmfbovespa.com.br/fnet/publico/exibirDocumento.htm?id=233361", False
.send
html.body.innerHTML = .responseText
End With
Debug.Print html.querySelector("tr:nth-child(6) .dado-valores").innerText
End Sub
Alternatives, that are a little more effort, include adding the accept header or base64 decoding the .responseText before writing to .innerHTML.
If you went down the less preferable base64 decode route, using function from here (note: pretty sure this is not the original source), then you will need to amend the following line:
.DataType = "bin.base64": .Text = Replace$(b64, Chr$(34), vbNullString) 'modified line
Public Function DecodeBase64(b64$)
Dim b
With CreateObject("Microsoft.XMLDOM").createElement("b64")
.DataType = "bin.base64": .Text = Replace$(b64, Chr$(34), vbNullString) 'modified line
b = .nodeTypedValue
With CreateObject("ADODB.Stream")
.Open: .Type = 1: .Write b: .Position = 0: .Type = 2: .Charset = "utf-8"
DecodeBase64 = .ReadText
.Close
End With
End With
End Function
