Could you help me write this function correctly? First, here is an example:
df1 <- structure(
list(
X1 = c(1, 1, 1, 1),
X2 = c("4","3","1","2"),
X3 = c("1", "2","3","2"),
X4 = c("1", "2","3","2"),
XM1 = c(200, 300, 200, 200),
XMR0 = c(300, 300, 300, 300),
XMR01 = c(300, 300, 300, 300),
XMR02 = c(300,300,300,300),
XMR03 = c(300,300,300,300),
XMR04 = c(300,250,350,350)),row.names = c(NA, 4L), class = "data.frame")
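library(dplyr)
f1 <- function(data){
  data %>%
    # keep X1..X4 (an "X" followed only by digits), keep XM1,
    # and replace each XMR* column with XM1 minus that column
    transmute(across(matches("^X\\d+$")),
              XM1, across(starts_with("XMR"), ~ XM1 - .x,
                          .names = "{.col}_PV"))
}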
f1(df1)
> f1(df1)
X1 X2 X3 X4 XM1 XMR0_PV XMR01_PV XMR02_PV XMR03_PV XMR04_PV
1 1 4 1 1 200 -100 -100 -100 -100 -100
2 1 3 2 2 300 0 0 0 0 50
3 1 1 3 3 200 -100 -100 -100 -100 -150
4 1 2 2 2 200 -100 -100 -100 -100 -150
Now I have a similar data frame, but with different column names.
df1 <- structure(
list(
Id = c(1, 1, 1, 1),
date1 = c("2022-01-06","2022-01-06","2022-01-06","2022-01-06"),
date2 = c("2022-01-02","2022-01-03","2022-01-09","2022-01-10"),
Week = c("Sunday","Monday","Sunday","Monday"),
Category = c("EFG", "ABC","EFG","ABC"),
DR1 = c(200, 300, 200, 200),
DRM0 = c(300, 300, 300, 300),
DRM01 = c(300, 300, 300, 300),
DRM02 = c(300,300,300,300),
DRM03 = c(300,300,300,300),
DRM04 = c(300,250,350,350)),row.names = c(NA, 4L), class = "data.frame")
I would like to create a function, call it f2, that does the same for this data frame. What would it look like compared to f1 above?
Expected output:
Id date2 Week Category DR1 DRM0_PV DRM01_PV DRM02_PV DRM03_PV DRM04_PV
1 1 2022-01-02 Sunday EFG 200 -100 -100 -100 -100 -100
2 1 2022-01-03 Monday ABC 300 0 0 0 0 50
3 1 2022-01-09 Sunday EFG 200 -100 -100 -100 -100 -150
4 1 2022-01-10 Monday ABC 200 -100 -100 -100 -100 -150
CodePudding user response:
We can add a few extra arguments to the function:
colnm - the column from which the others are subtracted, given as a string (ensym converts it to a symbol, which is then evaluated with !!; because ensym is used, an unquoted name also works)
pat - the prefix of the column names to loop over with across
cols_del - the columns to delete. The default is NULL, so if the fourth argument is not supplied, no columns are deleted.
f1 <- function(data, colnm, pat, cols_del = NULL){
  # requires dplyr (and rlang, which is installed with it)
  colnm <- rlang::ensym(colnm)
  data %>%
    mutate(!! colnm, across(starts_with(pat), ~ !! colnm - .x,
                            .names = "{.col}_PV" ), .keep = "unused") %>%
    select(-any_of(cols_del))
}
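The way ensym and !! let colnm be passed either quoted or unquoted can be seen in isolation. The following is only a minimal sketch: double_col and toy are hypothetical names made up for this illustration and are not part of the answer's function.

```r
library(dplyr)

# Hypothetical helper, for illustration only: doubles one column chosen by name.
double_col <- function(data, colnm) {
  colnm <- rlang::ensym(colnm)                   # "DR1" or DR1 -> the symbol DR1
  data %>% transmute(doubled = (!!colnm) * 2)    # !! injects the symbol into the expression
}

toy <- data.frame(DR1 = c(200, 300))

quoted   <- double_col(toy, "DR1")   # column name given as a string
unquoted <- double_col(toy, DR1)     # column name given as a bare name

identical(quoted, unquoted)
```

Both calls should give the same result, which is why f1 can be called with "DR1" or with DR1.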
The function loops (with across) over the columns whose names start with the prefix in pat ('DRM' or 'XMR') and subtracts each of them from the column given in colnm, naming the results with the _PV suffix via .names. Because of .keep = "unused", the columns consumed by the calculation are dropped, so the original 'DRM'/'XMR' columns do not appear in the output. 'DR1' or 'XM1' would be dropped for the same reason, so it is listed explicitly (!! colnm) to keep it. The last step removes any of the columns named in cols_del from the output.
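The effect of .keep = "unused" is easier to see on a cut-down example. This is a sketch on a small made-up data frame (toy is a hypothetical name; the column names only mimic the question's) with the column names hard-coded instead of passed as arguments.

```r
library(dplyr)

# Small made-up data frame that mimics the layout of the question's data
toy <- data.frame(
  Id    = c(1, 2),
  date1 = c("a", "b"),
  DR1   = c(200, 300),
  DRM0  = c(300, 300),
  DRM01 = c(300, 250)
)

res <- toy %>%
  mutate(DR1,                                     # listed explicitly so it is kept
         across(starts_with("DRM"), ~ DR1 - .x,   # DR1 minus each DRM* column
                .names = "{.col}_PV"),
         .keep = "unused")                        # drop the columns that were consumed

names(res)
```

The DRM0 and DRM01 columns are consumed by the subtraction and dropped, while date1 is never touched and therefore stays. That is why f1 needs the separate cols_del argument to remove it.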
Testing on the new data frame:
> f1(df1, "DR1", "DRM", "date1")
Id date2 Week Category DR1 DRM0_PV DRM01_PV DRM02_PV DRM03_PV DRM04_PV
1 1 2022-01-02 Sunday EFG 200 -100 -100 -100 -100 -100
2 1 2022-01-03 Monday ABC 300 0 0 0 0 50
3 1 2022-01-09 Sunday EFG 200 -100 -100 -100 -100 -150
4 1 2022-01-10 Monday ABC 200 -100 -100 -100 -100 -150
Using the original 'df1' (the first data frame, with the 'X'/'XM1'/'XMR' columns):
> f1(df1, "XM1", "XMR")
X1 X2 X3 X4 XM1 XMR0_PV XMR01_PV XMR02_PV XMR03_PV XMR04_PV
1 1 4 1 1 200 -100 -100 -100 -100 -100
2 1 3 2 2 300 0 0 0 0 50
3 1 1 3 3 200 -100 -100 -100 -100 -150
4 1 2 2 2 200 -100 -100 -100 -100 -150
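If a function literally named f2 is wanted, one option is a thin wrapper that fixes the arguments for the second data frame. This is only a sketch: f2 as written here and the name df2 are assumptions for illustration, and f1 is repeated (with explicit parentheses around !!colnm) so that the example runs on its own.

```r
library(dplyr)

# f1 from the answer, repeated so the example is self-contained
f1 <- function(data, colnm, pat, cols_del = NULL) {
  colnm <- rlang::ensym(colnm)
  data %>%
    mutate(!!colnm,
           across(starts_with(pat), ~ (!!colnm) - .x, .names = "{.col}_PV"),
           .keep = "unused") %>%
    select(-any_of(cols_del))
}

# Hypothetical wrapper: hard-codes the column names of the second data frame
f2 <- function(data) f1(data, DR1, "DRM", cols_del = "date1")

# The second data frame from the question (named df2 here to avoid the clash)
df2 <- data.frame(
  Id = c(1, 1, 1, 1),
  date1 = c("2022-01-06", "2022-01-06", "2022-01-06", "2022-01-06"),
  date2 = c("2022-01-02", "2022-01-03", "2022-01-09", "2022-01-10"),
  Week = c("Sunday", "Monday", "Sunday", "Monday"),
  Category = c("EFG", "ABC", "EFG", "ABC"),
  DR1 = c(200, 300, 200, 200),
  DRM0 = c(300, 300, 300, 300),
  DRM01 = c(300, 300, 300, 300),
  DRM02 = c(300, 300, 300, 300),
  DRM03 = c(300, 300, 300, 300),
  DRM04 = c(300, 250, 350, 350)
)

out <- f2(df2)
out
```

Keeping the general f1 and wrapping it avoids maintaining two nearly identical function bodies; the wrapper only records which columns play which role.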
