I am trying to read this URL into R as JSON: https://comtrade.un.org/Data/cache/reporterAreas.json
I see that there is additional content at the top of the file, wrapping the content I am after. A sample of the file looks as follows:
{
"more": false,
"results": [
{
"id": "all",
"text": "All"
},
{
"id": "4",
"text": "Afghanistan"
},
{
"id": "8",
"text": "Albania"
}
]
}
Trying to read it using:
library(httr)
library(jsonlite)
x <- GET(url)
fromJSON(rawToChar(x$content))
doesn't work; it throws the error: unexpected character '<ef>'. I assume this is caused by it seeing the [.
I also tried download.file(url, file) followed by fromJSON(file), but that threw the error unexpected character 'r', which I am guessing comes from "results".
I assume this is just some header formatting for the JSON (apologies, I don't do much with JSON files), and that there is an option for dealing with it either via GET() or fromJSON(), but I can't see anything in the docs. None of the examples I have seen that describe how to pull JSON from a URL have this format.
When I call class(rawToChar(x$content)) it shows as a character vector, so I could clean it by removing the {"more": false,"results": [ and ]}, but that seems wonky for what looks like a standard format.
If someone can show me how to import this correctly, I would welcome it. I would also welcome a more useful question title that describes this issue more effectively.
CodePudding user response:
The <ef> character is the first byte of a byte-order mark (BOM) encoded in UTF-8. The other two bytes are <bb><bf>. So the parser is stumbling over the BOM at the very start of the file, not over the [.
When I download the file using download.file() and then parse it using jsonlite::read_json(), I get a warning about the BOM, but the rest of the file appears to be read without an error. You should try that.
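If you would rather stay with the GET() approach, one option is to drop the three BOM bytes from the raw vector before converting it to text. The following is only a sketch: strip_bom() is a hypothetical helper name (not part of httr or jsonlite), and payload is a made-up stand-in for x$content so the example runs without a network call.

```r
library(jsonlite)

# Hypothetical helper (not from any package): remove a leading
# UTF-8 byte-order mark (EF BB BF) from a raw vector, if present.
strip_bom <- function(bytes) {
  bom <- as.raw(c(0xEF, 0xBB, 0xBF))
  if (length(bytes) >= 3 && identical(bytes[1:3], bom)) {
    bytes <- bytes[-(1:3)]
  }
  bytes
}

# Assumed stand-in for x$content: BOM bytes followed by a small
# JSON document shaped like the sample in the question.
payload <- c(
  as.raw(c(0xEF, 0xBB, 0xBF)),
  charToRaw('{"more": false, "results": [{"id": "4", "text": "Afghanistan"}]}')
)

# Strip the BOM, convert to text, then parse.
parsed <- fromJSON(rawToChar(strip_bom(payload)))

# The list of areas lives under the "results" element.
parsed$results
```

With the real response you would pass x$content in place of payload. Note that the "more" and "results" wrapper is ordinary JSON, so nothing has to be cut out of the text by hand; you just pick the results element out of the parsed list.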
