Home > Back-end >  How to add a calculated column to a data frame in the middle of a series of joins of other data fram
How to add a calculated column to a data frame in the middle of a series of joins of other data fram

Time:01-11

Suppose we perform a series of left_join() of data frames as shown below (please don't worry about appropriateness of left_join(), this is a simple example drawn from larger code which does require left_join()):

library(dplyr)

mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))
    
minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))
    
equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))
    
mydat %>%
      left_join(minus_B) %>%
      left_join(equals_D)
    
  period plusA minusB equalsD
1      1    10      8      18
2      2    20     18      28
3      3    12     10      20
4      4    13     11      21
5      5    17     18      28
6      6    19      9      19
7      7    20     12      22
8      8    16      4      14

I'm trying to add a balancing column, called "adjC", between the "minusB" and "equalsD" columns, that would leave a data frame output of:

  period plusA minusB  adjC equalsD  [Explain column adjC]
1      1    10      8    16      18   18 - (10 - 8) = 16
2      2    20     18    26      28   28 - (20 - 18) = 26
3      3    12     10    18      20   etc.
4      4    13     11    19      21
5      5    17     18    27      28
6      6    19      9     9      19
7      7    20     12    16      22
8      8    16      4     2      14

I've been fooling around with mydat %>% mutate(adjust_C = equalsD - (plusA - minusB)), or inserting it into the middle of the above left_join() series, and it doesn't work.

Any ideas how to do this? I'm trying to stick with dplyr too.

CodePudding user response:

The issue was that the OP used mutate directly on the original dataset which was not updated

library(dplyr)
mydat %>% 
       left_join(minus_B) %>%  
       left_join(equals_D) %>% 
       mutate(adjC = equalsD - (plusA - minusB), .before = 'equalsD')

-output

   period plusA minusB adjC equalsD
1      1    10      8   16      18
2      2    20     18   26      28
3      3    12     10   18      20
4      4    13     11   19      21
5      5    17     18   29      28
6      6    19      9    9      19
7      7    20     12   14      22
8      8    16      4    2      14

CodePudding user response:

Not sure are you looking for both, adjC and explain_adjC?:

library(dplyr)

mydat %>%
  left_join(minus_B) %>%
  left_join(equals_D) %>% 
  mutate(adjC = equalsD - (plusA - minusB),
         .before = "equalsD") %>% 
  mutate(explain_adjC = paste0(equalsD, ' - (', plusA, ' - ', minusB, ') = ', adjC)
# alternatively with {glue}:
# mutate(explain_adjC = glue::glue("{equalsD} - ({plusA} - {minusB}) = {adjC}"))
)

#> Joining, by = "period"
#> Joining, by = "period"
#>   period plusA minusB adjC equalsD        explain_adjC
#> 1      1    10      8   16      18  18 - (10 - 8) = 16
#> 2      2    20     18   26      28 28 - (20 - 18) = 26
#> 3      3    12     10   18      20 20 - (12 - 10) = 18
#> 4      4    13     11   19      21 21 - (13 - 11) = 19
#> 5      5    17     18   29      28 28 - (17 - 18) = 29
#> 6      6    19      9    9      19   19 - (19 - 9) = 9
#> 7      7    20     12   14      22 22 - (20 - 12) = 14
#> 8      8    16      4    2      14   14 - (16 - 4) = 2


# data
mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))

minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))

equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))

Created on 2022-01-10 by the reprex package (v2.0.1)

CodePudding user response:

This is admittedly a bit of a brute force solution, but why not just use base R syntax after your joins? I think the behavior is an artifact of the joining in dplyr::. Simply make your joins a dataframe, and create a new column. Remember that R doesn't really care about column order for your data structure, so you can simply reorder your columns as desired. Here is a reproducible example:

library(dplyr)

mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))

minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))

equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))

JOINED<-data.frame(mydat %>%
  left_join(minus_B) %>%
  left_join(equals_D)) 

JOINED[,"adj_C"] <- (JOINED$equalsD - (JOINED$plusA - JOINED$minusB))
out<-JOINED[,c(1:3, 5, 4)]
out
  •  Tags:  
  • Related