In my data below, how can I delete all rows with a given value of outcome (say "A") from n (say 1) randomly selected studies?
The only condition is that studies may be selected only from those that use more than one value of outcome (e.g., study == 1 and study == 2, each of which has both outcome == "A" and outcome == "B").
For example, let's say the given value of outcome is "A". Then, for n = 1, we delete all rows with outcome == "A" from one study chosen at random from study == 1 or study == 2.
Is this possible in R?
m <- "
study group outcome
1 1 1 A
2 1 1 B
3 1 2 A
4 1 2 B
5 2 1 A
6 2 1 B
7 2 2 A
8 2 2 B
9 3 1 B
10 4 1 B
"
data <- read.table(text = m, header = TRUE)  # the first column (1 to 10) is read as row names
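Before sampling, it helps to check which studies actually meet the condition (more than one distinct outcome). A minimal base-R sketch, rebuilding the same data so it runs on its own; the names `n_outcomes` and `eligible` are just illustrative:

```r
m <- "
study group outcome
1 1 1 A
2 1 1 B
3 1 2 A
4 1 2 B
5 2 1 A
6 2 1 B
7 2 2 A
8 2 2 B
9 3 1 B
10 4 1 B
"
data <- read.table(text = m, header = TRUE)

# For each study, count how many distinct outcome values it reports
n_outcomes <- tapply(data$outcome, data$study, function(x) length(unique(x)))
print(n_outcomes)

# Only studies with more than one outcome are eligible; here eligible is c(1L, 2L)
eligible <- as.integer(names(n_outcomes)[n_outcomes > 1])
print(eligible)
```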
Answer:
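library(dplyr)

n <- 1                   # number of studies to thin
outcome_to_remove <- "A"

# Only studies reporting more than one distinct outcome are eligible
eligible <- data %>%
  group_by(study) %>%
  filter(n_distinct(outcome) > 1) %>%
  pull(study) %>%
  unique()

# Sample by index: sample(x, n) treats a single number x as 1:x
studies_to_remove <- eligible[sample.int(length(eligible), size = n)]

data %>%
  filter(!(study %in% studies_to_remove & outcome %in% outcome_to_remove))
# One possible result (study 1 drawn):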
# study group outcome
# 2 1 1 B
# 4 1 2 B
# 5 2 1 A
# 6 2 1 B
# 7 2 2 A
# 8 2 2 B
# 9 3 1 B
# 10 4 1 B
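If you need to repeat this (e.g., for several values of n), the same logic can be wrapped in a function. This is a base-R sketch without dplyr; `drop_outcome` is a hypothetical helper name, and `set.seed()` is only there to make the random choice reproducible:

```r
# Hypothetical helper: drop rows with `outcome_value` from `n` randomly chosen
# studies, choosing only among studies that report more than one distinct outcome.
drop_outcome <- function(data, outcome_value, n = 1) {
  n_outcomes <- tapply(data$outcome, data$study, function(x) length(unique(x)))
  eligible <- names(n_outcomes)[n_outcomes > 1]   # study ids as character
  if (n > length(eligible)) {
    stop("n is larger than the number of eligible studies")
  }
  # Sample by index so a single eligible study is handled correctly
  chosen <- eligible[sample.int(length(eligible), size = n)]
  keep <- !(as.character(data$study) %in% chosen & data$outcome == outcome_value)
  data[keep, , drop = FALSE]
}

m <- "
study group outcome
1 1 1 A
2 1 1 B
3 1 2 A
4 1 2 B
5 2 1 A
6 2 1 B
7 2 2 A
8 2 2 B
9 3 1 B
10 4 1 B
"
data <- read.table(text = m, header = TRUE)

set.seed(1)  # reproducible choice of study
result <- drop_outcome(data, "A", n = 1)
print(result)
```

With `n = 2`, both eligible studies lose their "A" rows; asking for more studies than are eligible stops with an error instead of silently returning fewer.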
