Qustiong about using 'dbplyr' to render SQL code-CodePudding

This is a example from RStudio about "render SQL code", but I runed it and show error. Anyone can help? Thanks!

library(nycflights13)
ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))

dbplyr::sql_render(ranked)

Here is the error message

   Error in UseMethod("sql_render") : 
      no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"

CodePudding user response：

When you are working on a "normal" data.frame, then it returns a frame, in which case sql_render is inappropriate (and will be very confused). If we work with just your initial code, then we can see that SQL has nothing to do with it:

library(dplyr)
library(nycflights)
ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))
ranked
# # A tibble: 336,776 x 5
# # Groups:   year, month, day [365]
#     year month   day dep_delay  rank
#    <int> <int> <int>     <dbl> <dbl>
#  1  2013     1     1         2  313 
#  2  2013     1     1         4  276 
#  3  2013     1     1         2  313 
#  4  2013     1     1        -1  440 
#  5  2013     1     1        -6  742 
#  6  2013     1     1        -4  633 
#  7  2013     1     1        -5  691 
#  8  2013     1     1        -3  570 
#  9  2013     1     1        -3  570 
# 10  2013     1     1        -2  502.
# # ... with 336,766 more rows

But dbplyr won't be able to do something with that:

library(dbplyr)
sql_render(ranked)
# Error in UseMethod("sql_render") : 
#   no applicable method for 'sql_render' applied to an object of class "c('grouped_df', 'tbl_df', 'tbl', 'data.frame')"

If, however, we have that same flights data in a database, then we can do what you are expecting, with some minor changes.

# pgcon <- DBI::dbConnect(odbc::odbc(), ...)     # to my local postgres instance
copy_to(pgcon, flights, name = "flights_table")  # go get some coffee

flights_db <- tbl(pgcon, "flights_table")
ranked_db <- flights_db %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay)))
# Adding missing grouping variables: `year`, `month`, `day`

We can see some initial data, showing the top 10 rows of what the query will eventually return:

ranked_db
# # Source:   lazy query [?? x 5]
# # Database: postgres [postgres@localhost:/]
# # Groups:   year, month, day
#     year month   day dep_delay    rank
#    <int> <int> <int>     <dbl> <int64>
#  1  2013     1     1        NA       1
#  2  2013     1     1        NA       1
#  3  2013     1     1        NA       1
#  4  2013     1     1        NA       1
#  5  2013     1     1       853       5
#  6  2013     1     1       379       6
#  7  2013     1     1       290       7
#  8  2013     1     1       285       8
#  9  2013     1     1       260       9
# 10  2013     1     1       255      10
# # ... with more rows

and we can see what the real SQL query looks like:

sql_render(ranked_db)
# <SQL> SELECT "year", "month", "day", "dep_delay", RANK() OVER (PARTITION BY "year", "month", "day" ORDER BY "dep_delay" DESC) AS "rank"
# FROM "flights_table"

Realizing that, due to the way dbplyr operates, we don't know how many rows will be returned until we collect it:

nrow(ranked_db)
# [1] NA

res <- collect(ranked_db)
nrow(res)
# [1] 336776

res
# # A tibble: 336,776 x 5                 # <--- no longer 'Source:   lazy query [?? x 5]'
# # Groups:   year, month, day [365]
#     year month   day dep_delay    rank
#    <int> <int> <int>     <dbl> <int64>
#  1  2013     1     1        NA       1
#  2  2013     1     1        NA       1
#  3  2013     1     1        NA       1
#  4  2013     1     1        NA       1
#  5  2013     1     1       853       5
#  6  2013     1     1       379       6
#  7  2013     1     1       290       7
#  8  2013     1     1       285       8
#  9  2013     1     1       260       9
# 10  2013     1     1       255      10
# # ... with 336,766 more rows

CodePudding user response：

Check the documentation of the package. So you can render a code with the SQL syntax.

Maybe the chunk of code below helps you:

library(dplyr)
library(SqlRender)
library(nycflights13)

ranked <- flights %>%
  group_by(year, month, day) %>%
  select(dep_delay) %>% 
  mutate(rank = rank(desc(dep_delay))) %>%
  ungroup()

sql <- "SELECT * FROM @x WHERE month = @a;"
render(sql, x = ranked, a = 2)