Here's a data frame called results:
results <- data.frame(cbind(tot=c(3,4,3,2,1,1,3,0),
a=c(0,1,2,2,0,1,1,0),
b=c(3,3,1,0,1,0,2,0)))
Each row includes a value of tot equal to the sum of a and b.
I need to create vectors a_prop_tweak and b_prop_tweak and append them to this data frame. If a and b are both greater than zero, the two new variables are computed one way (for a, I divide a by tot and add 0.025; for b, I divide b by tot and subtract 0.025). If only a is zero, I compute the values differently (b_prop_tweak=b/tot; a_prop_tweak=0). If only b is zero, the two variables are computed another way (a_prop_tweak=a/tot; b_prop_tweak=0). If both a and b are zero, both new variables should equal zero.
Here's what the revised data frame result should look like:
results <- data.frame(cbind(tot=c(3, 4, 3, 2, 1, 1, 3, 0),
a=c(0, 1, 2, 2, 0, 1, 1, 0),
b=c(3, 3, 1, 0, 1, 0, 2, 0),
a_prop_tweak=c(0, 0.275, 0.6916667, 1, 0, 1, 0.3583333, 0),
b_prop_tweak=c(1, 0.725, 0.3083333, 0, 1, 0, 0.6416667, 0)))
Note that a_prop_tweak and b_prop_tweak will sum to 1 unless tot equals zero.
The code I wrote to accomplish this task does not work the way I intend:
if(results$a > 0 && results$b > 0){
results$a_prop_tweak <- results$a / results$tot + 0.025
results$b_prop_tweak <- results$b / results$tot - 0.025
}else if(results$a > 0 && results$b == 0){
results$a_prop_tweak <- results$a / results$tot
results$b_prop_tweak <- 0
}else if(results$a == 0 && results$b > 0){
results$a_prop_tweak <- 0
results$b_prop_tweak <- results$b / results$tot
}else{
results$a_prop_tweak <- 0
results$b_prop_tweak <- 0
}
Here's the output, which appears to correctly compute b_prop_tweak (except when tot, a, and b are all zero):
> results
tot a b a_prop_tweak b_prop_tweak ab_prop_chk
1 3 0 3 0 1.0000000 1
2 4 1 3 0 0.7500000 1
3 3 2 1 0 0.3333333 1
4 2 2 0 0 0.0000000 1
5 1 0 1 0 1.0000000 1
6 1 1 0 0 0.0000000 1
7 3 1 2 0 0.6666667 1
8 0 0 0 0 NaN 0
I'm clearly thinking through this incorrectly. Any thoughts?
CodePudding user response:
A solution with dplyr, using rowwise and case_when:
library(dplyr)
results %>%
rowwise() %>%
mutate( a_prop_tweak=case_when(
a > 0 & b > 0 ~ (a/tot) + 0.025,
a == 0 & b != 0 ~ 0,
a != 0 & b == 0 ~ a/tot,
a == 0 & b == 0 ~ 0 ),
b_prop_tweak=case_when(
a > 0 & b > 0 ~ (b/tot) - 0.025,
a == 0 & b != 0 ~ b/tot,
a != 0 & b == 0 ~ 0,
a == 0 & b == 0 ~ 0 ) ) %>%
ungroup()
# A tibble: 8 × 5
tot a b a_prop_tweak b_prop_tweak
<dbl> <dbl> <dbl> <dbl> <dbl>
1 3 0 3 0 1
2 4 1 3 0.275 0.725
3 3 2 1 0.692 0.308
4 2 2 0 1 0
5 1 0 1 0 1
6 1 1 0 1 0
7 3 1 2 0.358 0.642
8 0 0 0 0 0
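Because case_when() is already vectorised, the rowwise()/ungroup() pair is probably not needed here. A minimal sketch of the same logic without it, assuming the results data frame from the question and that a and b are never negative (results2 is just an illustrative name, and the TRUE ~ 0 fallback replaces the explicit zero cases):

```r
library(dplyr)

# Data frame from the question
results <- data.frame(tot = c(3, 4, 3, 2, 1, 1, 3, 0),
                      a   = c(0, 1, 2, 2, 0, 1, 1, 0),
                      b   = c(3, 3, 1, 0, 1, 0, 2, 0))

# case_when() checks the conditions element by element, in order,
# so each row gets the value of the first condition it satisfies
results2 <- results %>%
  mutate(
    a_prop_tweak = case_when(
      a > 0 & b > 0  ~ a / tot + 0.025,
      a > 0 & b == 0 ~ a / tot,
      TRUE           ~ 0            # a is zero (assumes a, b >= 0)
    ),
    b_prop_tweak = case_when(
      a > 0 & b > 0  ~ b / tot - 0.025,
      a == 0 & b > 0 ~ b / tot,
      TRUE           ~ 0            # b is zero (assumes a, b >= 0)
    )
  )
```

The row where tot is 0 falls through to the fallback, so the division by zero never reaches the result.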
CodePudding user response:
As others have noted, try testing what gets returned with a statement like results$a>0. You'll be looking at the entire column, not just each row at a time. I would handle this by subsetting the columns just to the particular cases you're interested in:
#Create the data frame
results <- data.frame(cbind(tot=c(3,4,3,2,1,1,3,0),
a=c(0,1,2,2,0,1,1,0),
b=c(3,3,1,0,1,0,2,0)))
#create the new columns and initialize to 0
results$a_prop_tweak <- 0
results$b_prop_tweak <- 0
#Deal with cases where both a and b are >0
results$a_prop_tweak[results$a >0 & results$b >0] <- results$a[results$a >0 & results$b >0] /
results$tot[results$a >0 & results$b >0] + 0.025
results$b_prop_tweak[results$a >0 & results$b >0] <- results$b[results$a >0 & results$b >0] /
results$tot[results$a >0 & results$b >0] -0.025
#If a>0 but b==0:
results$a_prop_tweak[results$a >0 & results$b == 0] <- results$a[results$a >0 & results$b == 0] /
results$tot[results$a >0 & results$b == 0]
#No need for a b_prop_tweak since it's already 0 by default
#If a==0 and b>0
results$b_prop_tweak[results$a == 0 & results$b > 0] <- results$b[results$a == 0 & results$b > 0] /
results$tot[results$a == 0 & results$b > 0]
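The same subsetting idea reads more easily if each condition is stored once as a logical vector and reused. This is only a sketch of that variation; the names both, only_a and only_b are made up for illustration:

```r
# Data frame from the question
results <- data.frame(tot = c(3, 4, 3, 2, 1, 1, 3, 0),
                      a   = c(0, 1, 2, 2, 0, 1, 1, 0),
                      b   = c(3, 3, 1, 0, 1, 0, 2, 0))

# Start both new columns at 0, which already covers the a == 0 & b == 0 rows
results$a_prop_tweak <- 0
results$b_prop_tweak <- 0

# One logical vector per case (hypothetical helper names)
both   <- results$a > 0  & results$b > 0
only_a <- results$a > 0  & results$b == 0
only_b <- results$a == 0 & results$b > 0

# Overwrite only the rows that belong to each case
results$a_prop_tweak[both]   <- results$a[both] / results$tot[both] + 0.025
results$b_prop_tweak[both]   <- results$b[both] / results$tot[both] - 0.025
results$a_prop_tweak[only_a] <- results$a[only_a] / results$tot[only_a]
results$b_prop_tweak[only_b] <- results$b[only_b] / results$tot[only_b]
```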
CodePudding user response:
It sometimes helps to step through your code carefully. You are evaluating whole vectors instead of doing the row-by-row evaluation you are trying to accomplish. Because && only looks at the first element of each vector here (row 1, where a is 0 and b is 3), you always end up in this part of the if statement:
else if(results$a == 0 && results$b > 0){
results$a_prop_tweak <- 0
results$b_prop_tweak <- results$b / results$tot
There are quite a few ways to do what you want, and I'll try to post one of them later. For now, I just wanted to show you what is going wrong:
> results$a
[1] 0 1 2 2 0 1 1 0
> results$b
[1] 3 3 1 0 1 0 2 0
> results$a > 0 && results$b > 0
[1] FALSE
> results$a > 0 && results$b == 0
[1] FALSE
> results$a == 0 && results$b > 0
[1] TRUE
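One possible fix, sketched here in base R, is to swap the scalar && for the element-wise & and let ifelse() choose a value per row. It assumes a and b are never negative, and both is just an illustrative name. Note that, as far as I know, recent R versions (4.3.0 and later) raise an error when && is given vectors longer than one, rather than silently using the first element as in the console output above.

```r
# Data frame from the question
results <- data.frame(tot = c(3, 4, 3, 2, 1, 1, 3, 0),
                      a   = c(0, 1, 2, 2, 0, 1, 1, 0),
                      b   = c(3, 3, 1, 0, 1, 0, 2, 0))

# `&` compares element by element and returns one value per row
both <- results$a > 0 & results$b > 0
both
# [1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE FALSE

# ifelse() picks a value for each row; the inner call handles the
# rows where only one of a and b is positive (assumes a, b >= 0)
results$a_prop_tweak <- ifelse(both, results$a / results$tot + 0.025,
                               ifelse(results$a > 0, results$a / results$tot, 0))
results$b_prop_tweak <- ifelse(both, results$b / results$tot - 0.025,
                               ifelse(results$b > 0, results$b / results$tot, 0))
```

ifelse() evaluates both branches for every row, so 0/0 is computed for the last row, but the NaN is discarded because the condition is FALSE there.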
