Suppose I have a data frame like:
set.seed(123)
df <- data.frame(x=rbinom(100,1,0.9), y=rbinom(100,1,0.95))
What I want is to sample a subset, df_sub, from df in which exactly 5 rows have both x==1 and y==1, regardless of the total number of rows in df_sub, like:
## index <- sample(1:nrow(df),..,replace = FALSE)
df_sub <- df[index,]
df_sub
  x y
1 1 1
2 1 1
3 1 1
4 1 0
5 0 1
6 1 1
7 1 1
As you can see, in df_sub the number of rows with x==1 & y==1 equals 5, while the total number of rows equals 7. I would like to sample the original df so that exactly 5 rows have x==1 & y==1, regardless of the actual number of rows in df_sub.
CodePudding user response:
We may use rep with sample: build a vector with the required number of 1s and 0s, then shuffle it.
n_events <- 20
total_len <- 70
n_zero_events <- total_len - n_events
v1 <- sample(rep(c(1, 0), c(n_events, n_zero_events)))
> sum(v1)
[1] 20
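The same idea can be wrapped in a small function so the counts are not hard-coded. This is only a sketch: fixed_ones is a hypothetical helper name, not something from base R, and n and k are illustrative parameters.

```r
# Hypothetical helper: a shuffled 0/1 vector of length n with exactly k ones
fixed_ones <- function(n, k) {
  stopifnot(k >= 0, k <= n)
  # rep() builds k ones followed by n - k zeros; sample() permutes them
  sample(rep(c(1, 0), c(k, n - k)))
}

v <- fixed_ones(70, 20)
sum(v)     # number of ones
length(v)  # total length
```

Because sample() only permutes the vector, the number of ones stays at k no matter what the seed is.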
CodePudding user response:
A base R one-liner using sample, rep, and replace:
> sample(replace(rep(0, 100), 1:20, 1))
[1] 0 1 0 0 1 0 0 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0
[38] 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0
[75] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0
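To tie this back to the data frame in the question, one possible approach is to sample the two groups of rows separately and then combine them. The sketch below is an assumption about what is wanted, not a method taken from the answers above: n_other (how many rows without x==1 & y==1 to include) is an illustrative choice, and both, other, and idx are names made up for this example.

```r
set.seed(123)
df <- data.frame(x = rbinom(100, 1, 0.9), y = rbinom(100, 1, 0.95))

# Row indices where both x and y equal 1, and all remaining row indices
both  <- which(df$x == 1 & df$y == 1)
other <- setdiff(seq_len(nrow(df)), both)

n_both  <- 5                       # required number of rows with x == 1 & y == 1
n_other <- min(2, length(other))   # assumed number of other rows to include
stopifnot(length(both) >= n_both)

# Indexing through sample.int() avoids sample()'s special handling
# of a length-one numeric vector
idx <- c(both[sample.int(length(both), n_both)],
         other[sample.int(length(other), n_other)])

# Shuffle the combined indices so the two groups are interleaved
df_sub <- df[sample(idx), ]

sum(df_sub$x == 1 & df_sub$y == 1)  # count of rows with both equal to 1
nrow(df_sub)                        # total rows in the subset
```

Since the rows are drawn without replacement from two disjoint sets of indices, the count of x==1 & y==1 rows in df_sub equals n_both by construction, while n_other controls the total size.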
