This is a example from RStudio about "render SQL code", but I runed it and show error. Anyone can help? Thanks!
library(nycflights13)
ranked <- flights %>%
group_by(year, month, day) %>%
select(dep_delay) %>%
mutate(rank = rank(desc(dep_delay)))
dbplyr::sql_render(ranked)
Here is the error message
Error in UseMethod("sql_render") :
no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"
CodePudding user response:
When you are working on a "normal" data.frame, then it returns a frame, in which case sql_render is inappropriate (and will be very confused). If we work with just your initial code, then we can see that SQL has nothing to do with it:
library(dplyr)
library(nycflights)
ranked <- flights %>%
group_by(year, month, day) %>%
select(dep_delay) %>%
mutate(rank = rank(desc(dep_delay)))
ranked
# # A tibble: 336,776 x 5
# # Groups: year, month, day [365]
# year month day dep_delay rank
# <int> <int> <int> <dbl> <dbl>
# 1 2013 1 1 2 313
# 2 2013 1 1 4 276
# 3 2013 1 1 2 313
# 4 2013 1 1 -1 440
# 5 2013 1 1 -6 742
# 6 2013 1 1 -4 633
# 7 2013 1 1 -5 691
# 8 2013 1 1 -3 570
# 9 2013 1 1 -3 570
# 10 2013 1 1 -2 502.
# # ... with 336,766 more rows
But dbplyr won't be able to do something with that:
library(dbplyr)
sql_render(ranked)
# Error in UseMethod("sql_render") :
# no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"
If, however, we have that same flights data in a database, then we can do what you are expecting, with some minor changes.
# pgcon <- DBI::dbConnect(odbc::odbc(), ...) # to my local postgres instance
copy_to(pgcon, flights, name = "flights_table") # go get some coffee
flights_db <- tbl(pgcon, "flights_table")
ranked_db <- flights_db %>%
group_by(year, month, day) %>%
select(dep_delay) %>%
mutate(rank = rank(desc(dep_delay)))
# Adding missing grouping variables: `year`, `month`, `day`
We can see some initial data, showing the top 10 rows of what the query will eventually return:
ranked_db
# # Source: lazy query [?? x 5]
# # Database: postgres [postgres@localhost:/]
# # Groups: year, month, day
# year month day dep_delay rank
# <int> <int> <int> <dbl> <int64>
# 1 2013 1 1 NA 1
# 2 2013 1 1 NA 1
# 3 2013 1 1 NA 1
# 4 2013 1 1 NA 1
# 5 2013 1 1 853 5
# 6 2013 1 1 379 6
# 7 2013 1 1 290 7
# 8 2013 1 1 285 8
# 9 2013 1 1 260 9
# 10 2013 1 1 255 10
# # ... with more rows
and we can see what the real SQL query looks like:
sql_render(ranked_db)
# <SQL> SELECT "year", "month", "day", "dep_delay", RANK() OVER (PARTITION BY "year", "month", "day" ORDER BY "dep_delay" DESC) AS "rank"
# FROM "flights_table"
Realizing that, due to the way dbplyr operates, we don't know how many rows will be returned until we collect it:
nrow(ranked_db)
# [1] NA
res <- collect(ranked_db)
nrow(res)
# [1] 336776
res
# # A tibble: 336,776 x 5 # <--- no longer 'Source: lazy query [?? x 5]'
# # Groups: year, month, day [365]
# year month day dep_delay rank
# <int> <int> <int> <dbl> <int64>
# 1 2013 1 1 NA 1
# 2 2013 1 1 NA 1
# 3 2013 1 1 NA 1
# 4 2013 1 1 NA 1
# 5 2013 1 1 853 5
# 6 2013 1 1 379 6
# 7 2013 1 1 290 7
# 8 2013 1 1 285 8
# 9 2013 1 1 260 9
# 10 2013 1 1 255 10
# # ... with 336,766 more rows
CodePudding user response:
Check the documentation of the package. So you can render a code with the SQL syntax.
Maybe the chunk of code below helps you:
library(dplyr)
library(SqlRender)
library(nycflights13)
ranked <- flights %>%
group_by(year, month, day) %>%
select(dep_delay) %>%
mutate(rank = rank(desc(dep_delay))) %>%
ungroup()
sql <- "SELECT * FROM @x WHERE month = @a;"
render(sql, x = ranked, a = 2)
