I'm using a REST API to retrieve data from an Azure Table using the code below:
library(httr)
library(RCurl)
library(bitops)
library(xml2)
library(jsonlite) # Provides fromJSON(), used below
# Stores credentials in variable
Account <- "storageaccount"
Container <- "Usage"
Key <- "key"
# Composes URL
URL <- paste0(
  "https://",
  Account,
  ".table.core.windows.net",
  "/",
  Container
)
# Requests time stamp
requestdate <- format(Sys.time(), "%a, %d %b %Y %H:%M:%S %Z", tz = "GMT")
# As per Microsoft's specs, an empty line is needed for content-length
content_length <- 0
# Composes signature string
signature_string <- paste0(
  "GET", "\n",                  # HTTP Verb
  "\n",                         # Content-MD5
  "text/xml", "\n",             # Content-Type
  requestdate, "\n",            # Date
  "/", Account, "/", Container  # Canonicalized resource
)
# Composes header string
header_string <- add_headers(
  Authorization = paste0(
    "SharedKey ",
    Account,
    ":",
    RCurl::base64(
      digest::hmac(
        key = RCurl::base64Decode(
          Key, mode = "raw"
        ),
        object = enc2utf8(signature_string),
        algo = "sha256",
        raw = TRUE
      )
    )
  ),
  'x-ms-date' = requestdate,
  'x-ms-version' = "2020-12-06",
  'Content-type' = "text/xml"
)
# Creates request
xml_body <- content(
  GET(
    URL,
    config = header_string,
    verbose()
  ),
  "text"
)
Get_data <- xml_body # Gets data as text from API
From_JSON <- fromJSON(Get_data, flatten = TRUE) # Parses text from JSON
Table_name <- as.data.frame(From_JSON) # Saves data to a table
I can now view the table, but I noticed that only the first 1000 rows are returned. What's the most efficient way to implement a loop that retrieves all the remaining rows and appends them to the table?
I need to be able to work on the entire dataset.
Also consider that this table will be updated with ~40,000 rows per day, so keeping the visuals current with the data is a concern.
Thanks in advance for your suggestions!
~Alienvolm
CodePudding user response:
Not sure how you would implement this in R specifically but here is the general approach:
When you list entities from a table, a maximum of 1000 entities are returned in a single request. If the table contains more than 1000 entities, the Table service returns two additional response headers: x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey. The presence of these headers indicates that more data is available to fetch.
What you would need to do is use these headers and specify two query parameters in your next request URL: NextPartitionKey and NextRowKey. So your request would be something like:
https://account.table.core.windows.net/Table?NextPartitionKey=<x-ms-continuation-NextPartitionKey header value>&NextRowKey=<x-ms-continuation-NextRowKey header value>.
You would need to repeat the process until the response no longer includes these headers.
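For what it's worth, the loop described above could be sketched in R roughly as follows. This is a minimal sketch, not a tested implementation against a live account: fetch_page, fetch_all and make_headers are hypothetical names, and make_headers is assumed to be a function wrapping the add_headers(...) code from the question so that every request gets a fresh date and signature. It also assumes the JSON body keeps the entities under a "value" field, and that the continuation query parameters do not need to be added to the signature string.

```r
library(httr)
library(jsonlite)

# Fetches a single page. `make_headers` is a hypothetical function that
# rebuilds requestdate, the signature and add_headers(...) as in the question.
# `continuation` is NULL for the first request, or a named list holding
# NextPartitionKey / NextRowKey for the following ones.
fetch_page <- function(url, make_headers, continuation = NULL) {
  resp <- GET(url, make_headers(), query = continuation)
  stop_for_status(resp)
  body <- fromJSON(content(resp, "text", encoding = "UTF-8"), flatten = TRUE)
  h <- headers(resp) # httr header lookup is case-insensitive
  list(
    rows = as.data.frame(body$value), # assumes entities sit under "value"
    next_pk = h[["x-ms-continuation-NextPartitionKey"]],
    next_rk = h[["x-ms-continuation-NextRowKey"]]
  )
}

# Calls `get_page` repeatedly until no continuation header comes back,
# then combines all pages into one data frame.
fetch_all <- function(get_page) {
  pages <- list()
  continuation <- NULL
  repeat {
    page <- get_page(continuation)
    pages[[length(pages) + 1]] <- page$rows
    if (is.null(page$next_pk)) break
    continuation <- list(NextPartitionKey = page$next_pk)
    if (!is.null(page$next_rk)) continuation$NextRowKey <- page$next_rk
  }
  rbind_pages(pages) # jsonlite helper; tolerates pages with differing columns
}

# Possible usage, with URL from the question and a hypothetical make_headers:
# Table_name <- fetch_all(function(cont) fetch_page(URL, make_headers, cont))
```

Collecting the pages in a list and combining them once at the end avoids growing a data frame inside the loop, which tends to get slow as the row count climbs.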
You can learn more about it here: 
