apply function or loop within mutate

Time:01-13

Let's say I have a data frame, and I would like to use mutate to create new columns, each being the difference of a pair of existing columns. The pairing follows a pattern. In the code below, the first operand of every subtraction has the same prefix (base_g00) and the second operand has the same prefix (allow_m00). The first operand's id runs from 27 to 43 and the second operand's id runs from 20 to 36, so the second id is always the first id minus 7. Can I use an apply-style function or a loop inside mutate to make the following code simpler? Thanks in advance for any suggestions!

pred_error<-y07_13%>%mutate(annual_util_1=base_g0027-allow_m0020,
     annual_util_2=base_g0028-allow_m0021,
     annual_util_3=base_g0029-allow_m0022,
     annual_util_4=base_g0030-allow_m0023,
     annual_util_5=base_g0031-allow_m0024,
     annual_util_6=base_g0032-allow_m0025,
     annual_util_7=base_g0033-allow_m0026,
     annual_util_8=base_g0034-allow_m0027,
     annual_util_9=base_g0035-allow_m0028,
     annual_util_10=base_g0036-allow_m0029,
     annual_util_11=base_g0037-allow_m0030,
     annual_util_12=base_g0038-allow_m0031,
     annual_util_13=base_g0039-allow_m0032,
     annual_util_14=base_g0040-allow_m0033,
     annual_util_15=base_g0041-allow_m0034,
     annual_util_16=base_g0042-allow_m0035,
     annual_util_17=base_g0043-allow_m0036)

CodePudding user response:

I think a more idiomatic tidyverse approach would be to reshape your data so those column groups are encoded as a variable instead of as separate columns which have the same semantic meaning.

For instance,

library(dplyr); library(tidyr); library(stringr)       
y07_13 <- tibble(allow_m0021 = 1:5,
                 allow_m0022 = 2:6,
                 allow_m0023 = 11:15,
                 base_g0028 = 5,
                 base_g0029 = 3:7,
                 base_g0030 = 100)

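y07_13 %>%
  # keep a row id so each row's values can be matched up again after reshaping
  mutate(row = row_number()) %>%
  pivot_longer(-row) %>%
  # split each column name into its prefix and numeric id; subtracting a
  # per-prefix offset gives paired columns the same group number
  mutate(type = str_extract(name, "allow_m|base_g"),
         num = str_remove(name, type) %>% as.numeric(),
         group = num - if_else(type == "allow_m", 20, 27)) %>%
  select(row, type, group, value) %>%
  # one column per prefix, one row per (row, group) pair
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(annual_util = base_g - allow_m)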

Result

# A tibble: 15 x 5
     row group allow_m base_g annual_util
   <int> <dbl>   <dbl>  <dbl>       <dbl>
 1     1     1       1      5           4
 2     1     2       2      3           1
 3     1     3      11    100          89
 4     2     1       2      5           3
 5     2     2       3      4           1
 6     2     3      12    100          88
 7     3     1       3      5           2
 8     3     2       4      5           1
 9     3     3      13    100          87
10     4     1       4      5           1
11     4     2       5      6           1
12     4     3      14    100          86
13     5     1       5      5           0
14     5     2       6      7           1
15     5     3      15    100          85
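
If the wide layout from the question is still needed afterwards, the long result can probably be pivoted back. This is only a sketch on the same toy data: the names `long` and `wide` are made up here, and the `annual_util_` prefix is assumed from the column names in the question.

```r
library(dplyr); library(tidyr); library(stringr)

# Same toy data as in the example above
y07_13 <- tibble(allow_m0021 = 1:5,
                 allow_m0022 = 2:6,
                 allow_m0023 = 11:15,
                 base_g0028 = 5,
                 base_g0029 = 3:7,
                 base_g0030 = 100)

# Long format with one annual_util value per (row, group) pair
long <- y07_13 %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row) %>%
  mutate(type = str_extract(name, "allow_m|base_g"),
         num = as.numeric(str_remove(name, type)),
         group = num - if_else(type == "allow_m", 20, 27)) %>%
  select(row, type, group, value) %>%
  pivot_wider(names_from = type, values_from = value) %>%
  mutate(annual_util = base_g - allow_m)

# Back to one column per group: annual_util_1, annual_util_2, ...
wide <- long %>%
  select(row, group, annual_util) %>%
  pivot_wider(names_from = group, values_from = annual_util,
              names_prefix = "annual_util_")

# Attach the new columns to the original data (rows line up by position)
pred_error <- bind_cols(y07_13, select(wide, -row))
pred_error
```

Whether the round trip is worth it depends on what happens next; for further summarising or plotting, staying in the long format is usually simpler.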

CodePudding user response:

Here is a vectorised base R approach:

base_cols <- paste0("base_g00", 27:43)
allow_cols <- paste0("allow_m00", 20:36)
new_cols <- paste0("annual_util_", 1:17)
y07_13[new_cols] <- y07_13[base_cols] - y07_13[allow_cols]
y07_13
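
To see why this works, here is a small self-contained sketch on made-up data. The data frame `toy`, its values, and the reduced id range 28:29 are assumptions for illustration only. Subtracting two data frames pairs their columns by position rather than by name, so the two name vectors must be in matching order. An explicit loop, as asked about in the question, gives the same result.

```r
# Toy data (values are made up for illustration)
toy <- data.frame(
  base_g0028  = c(5, 5, 5),
  base_g0029  = c(3, 4, 5),
  allow_m0021 = c(1, 2, 3),
  allow_m0022 = c(2, 3, 4)
)

ids        <- 28:29
base_cols  <- paste0("base_g00", ids)
allow_cols <- paste0("allow_m00", ids - 7)   # second id is first id minus 7
new_cols   <- paste0("annual_util_", seq_along(ids))

# Vectorised: column i of the left side minus column i of the right side
toy[new_cols] <- toy[base_cols] - toy[allow_cols]

# Equivalent explicit loop over the column pairs
loop <- toy[c(base_cols, allow_cols)]
for (i in seq_along(ids)) {
  loop[[new_cols[i]]] <- loop[[base_cols[i]]] - loop[[allow_cols[i]]]
}

toy
```

Deriving `allow_cols` from `ids - 7` keeps the pairing rule in one place, so changing the id range cannot put the two vectors out of step.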