How to add a calculated column to a data frame in the middle of a series of joins of other data fram-CodePudding

Suppose we perform a series of left_join() of data frames as shown below (please don't worry about appropriateness of left_join(), this is a simple example drawn from larger code which does require left_join()):

library(dplyr)

mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))
    
minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))
    
equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))
    
mydat %>%
      left_join(minus_B) %>%
      left_join(equals_D)
    
  period plusA minusB equalsD
1      1    10      8      18
2      2    20     18      28
3      3    12     10      20
4      4    13     11      21
5      5    17     18      28
6      6    19      9      19
7      7    20     12      22
8      8    16      4      14

I'm trying to add a balancing column, called "adjC", between the "minusB" and "equalsD" columns, that would leave a data frame output of:

  period plusA minusB  adjC equalsD  [Explain column adjC]
1      1    10      8    16      18   18 - (10 - 8) = 16
2      2    20     18    26      28   28 - (20 - 18) = 26
3      3    12     10    18      20   etc.
4      4    13     11    19      21
5      5    17     18    27      28
6      6    19      9     9      19
7      7    20     12    16      22
8      8    16      4     2      14

I've been fooling around with mydat %>% mutate(adjust_C = equalsD - (plusA - minusB)), or inserting it into the middle of the above left_join() series, and it doesn't work.

Any ideas how to do this? I'm trying to stick with dplyr too.

CodePudding user response：

The issue was that the OP used mutate directly on the original dataset which was not updated

library(dplyr)
mydat %>% 
       left_join(minus_B) %>%  
       left_join(equals_D) %>% 
       mutate(adjC = equalsD - (plusA - minusB), .before = 'equalsD')

-output

   period plusA minusB adjC equalsD
1      1    10      8   16      18
2      2    20     18   26      28
3      3    12     10   18      20
4      4    13     11   19      21
5      5    17     18   29      28
6      6    19      9    9      19
7      7    20     12   14      22
8      8    16      4    2      14

CodePudding user response：

Not sure are you looking for both, adjC and explain_adjC?:

library(dplyr)

mydat %>%
  left_join(minus_B) %>%
  left_join(equals_D) %>% 
  mutate(adjC = equalsD - (plusA - minusB),
         .before = "equalsD") %>% 
  mutate(explain_adjC = paste0(equalsD, ' - (', plusA, ' - ', minusB, ') = ', adjC)
# alternatively with {glue}:
# mutate(explain_adjC = glue::glue("{equalsD} - ({plusA} - {minusB}) = {adjC}"))
)

#> Joining, by = "period"
#> Joining, by = "period"
#>   period plusA minusB adjC equalsD        explain_adjC
#> 1      1    10      8   16      18  18 - (10 - 8) = 16
#> 2      2    20     18   26      28 28 - (20 - 18) = 26
#> 3      3    12     10   18      20 20 - (12 - 10) = 18
#> 4      4    13     11   19      21 21 - (13 - 11) = 19
#> 5      5    17     18   29      28 28 - (17 - 18) = 29
#> 6      6    19      9    9      19   19 - (19 - 9) = 9
#> 7      7    20     12   14      22 22 - (20 - 12) = 14
#> 8      8    16      4    2      14   14 - (16 - 4) = 2


# data
mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))

minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))

equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))

^{Created on 2022-01-10 by the reprex package (v2.0.1)}

CodePudding user response：

This is admittedly a bit of a brute force solution, but why not just use base R syntax after your joins? I think the behavior is an artifact of the joining in dplyr::. Simply make your joins a dataframe, and create a new column. Remember that R doesn't really care about column order for your data structure, so you can simply reorder your columns as desired. Here is a reproducible example:

library(dplyr)

mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))

minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))

equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))

JOINED<-data.frame(mydat %>%
  left_join(minus_B) %>%
  left_join(equals_D)) 

JOINED[,"adj_C"] <- (JOINED$equalsD - (JOINED$plusA - JOINED$minusB))
out<-JOINED[,c(1:3, 5, 4)]
out