Suppose we perform a series of left_join() of data frames as shown below (please don't worry about appropriateness of left_join(), this is a simple example drawn from larger code which does require left_join()):
library(dplyr)
mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))
minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))
equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))
mydat %>%
left_join(minus_B) %>%
left_join(equals_D)
period plusA minusB equalsD
1 1 10 8 18
2 2 20 18 28
3 3 12 10 20
4 4 13 11 21
5 5 17 18 28
6 6 19 9 19
7 7 20 12 22
8 8 16 4 14
I'm trying to add a balancing column, called "adjC", between the "minusB" and "equalsD" columns, that would leave a data frame output of:
period plusA minusB adjC equalsD [Explain column adjC]
1 1 10 8 16 18 18 - (10 - 8) = 16
2 2 20 18 26 28 28 - (20 - 18) = 26
3 3 12 10 18 20 etc.
4 4 13 11 19 21
5 5 17 18 27 28
6 6 19 9 9 19
7 7 20 12 16 22
8 8 16 4 2 14
I've been fooling around with mydat %>% mutate(adjust_C = equalsD - (plusA - minusB)), or inserting it into the middle of the above left_join() series, and it doesn't work.
Any ideas how to do this? I'm trying to stick with dplyr too.
CodePudding user response:
The issue was that the OP used mutate directly on the original dataset which was not updated
library(dplyr)
mydat %>%
left_join(minus_B) %>%
left_join(equals_D) %>%
mutate(adjC = equalsD - (plusA - minusB), .before = 'equalsD')
-output
period plusA minusB adjC equalsD
1 1 10 8 16 18
2 2 20 18 26 28
3 3 12 10 18 20
4 4 13 11 19 21
5 5 17 18 29 28
6 6 19 9 9 19
7 7 20 12 14 22
8 8 16 4 2 14
CodePudding user response:
Not sure are you looking for both, adjC and explain_adjC?:
library(dplyr)
mydat %>%
left_join(minus_B) %>%
left_join(equals_D) %>%
mutate(adjC = equalsD - (plusA - minusB),
.before = "equalsD") %>%
mutate(explain_adjC = paste0(equalsD, ' - (', plusA, ' - ', minusB, ') = ', adjC)
# alternatively with {glue}:
# mutate(explain_adjC = glue::glue("{equalsD} - ({plusA} - {minusB}) = {adjC}"))
)
#> Joining, by = "period"
#> Joining, by = "period"
#> period plusA minusB adjC equalsD explain_adjC
#> 1 1 10 8 16 18 18 - (10 - 8) = 16
#> 2 2 20 18 26 28 28 - (20 - 18) = 26
#> 3 3 12 10 18 20 20 - (12 - 10) = 18
#> 4 4 13 11 19 21 21 - (13 - 11) = 19
#> 5 5 17 18 29 28 28 - (17 - 18) = 29
#> 6 6 19 9 9 19 19 - (19 - 9) = 9
#> 7 7 20 12 14 22 22 - (20 - 12) = 14
#> 8 8 16 4 2 14 14 - (16 - 4) = 2
# data
mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))
minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))
equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))
Created on 2022-01-10 by the reprex package (v2.0.1)
CodePudding user response:
This is admittedly a bit of a brute force solution, but why not just use base R syntax after your joins? I think the behavior is an artifact of the joining in dplyr::. Simply make your joins a dataframe, and create a new column. Remember that R doesn't really care about column order for your data structure, so you can simply reorder your columns as desired. Here is a reproducible example:
library(dplyr)
mydat <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), plusA = c(10,20,12,13,17,19,20,16))
minus_B <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), minusB = c(8,18,10,11,18,9,12,4))
equals_D <- data.frame(period = c(1, 2, 3, 4, 5, 6, 7, 8), equalsD = c(18,28,20,21,28,19,22,14))
JOINED<-data.frame(mydat %>%
left_join(minus_B) %>%
left_join(equals_D))
JOINED[,"adj_C"] <- (JOINED$equalsD - (JOINED$plusA - JOINED$minusB))
out<-JOINED[,c(1:3, 5, 4)]
out
