I'd like to split my dataset by the variable group and then remove that variable from each resulting data frame. Right now I'm using a for loop, but I'd like a base R approach that avoids the loop and doesn't require loading dplyr or a similar package.
n <- 10
x <- runif(n)*10
y <- runif(n)*10
group <- rep(1:2, each=5)
my_data <- as.data.frame(cbind(group, x, y))
subset_data <- split(my_data, my_data$group, drop=TRUE)
drop_column <- "group"
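# Loop over the list elements; drop = FALSE keeps a data frame even if only one column remains
for (i in seq_along(subset_data)) {
  subset_data[[i]] <- subset_data[[i]][, !(names(subset_data[[i]]) %in% drop_column), drop = FALSE]
}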
Thank you.
Answer:
A base R option: call subset inside lapply, so the data is split and the grouping variable is removed in one step.
lapply(split(my_data, my_data$group, drop=TRUE), subset, select = -group)
Output
$`1`
         x         y
1 3.421037 0.2846179
2 9.219159 5.0449367
3 4.157628 1.3970608
4 3.412703 2.2196774
5 9.948763 6.5528746

$`2`
           x         y
6  0.3746215 3.4387533
7  3.0722134 0.5371084
8  3.0580508 0.4649525
9  3.6308661 6.5796197
10 6.4435513 3.0641620
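Since the question stores the column name in a string (drop_column), here is a minimal sketch of the same idea that works from a character vector instead of the bare name that subset expects. The seed value and the use of setdiff are my own choices for illustration, not part of the original post.

```r
# Reproducible version of the question's data (the seed is an arbitrary choice).
set.seed(42)
n <- 10
my_data <- data.frame(group = rep(1:2, each = 5),
                      x = runif(n) * 10,
                      y = runif(n) * 10)

# Column name held in a string, as in the question.
drop_column <- "group"

# Split by group, then keep every column whose name is not in drop_column.
# Indexing with a single character vector (no comma) always returns a data frame.
subset_data <- lapply(
  split(my_data, my_data$group, drop = TRUE),
  function(d) d[setdiff(names(d), drop_column)]
)

# Compare against the subset() version from this answer.
via_subset <- lapply(split(my_data, my_data$group, drop = TRUE),
                     subset, select = -group)
stopifnot(isTRUE(all.equal(subset_data, via_subset)))
```

The subset call is shorter when the column name is known in advance; the setdiff form is probably the easier one to reuse when the name arrives as a variable.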
Answer:
If loading a package is acceptable, you can use group_split from dplyr and set the .keep parameter to FALSE (the native pipe |> requires R 4.1 or later):
library(dplyr)
subset_data <- my_data |>
  group_split(group, .keep = FALSE)
<list_of<
tbl_df<
x: double
y: double
>
>[2]>
[[1]]
# A tibble: 5 x 2
      x     y
  <dbl> <dbl>
1  9.43  1.84
2  2.34  9.41
3  6.96  7.56
4  7.91  5.11
5  1.52  3.38

[[2]]
# A tibble: 5 x 2
       x     y
   <dbl> <dbl>
1 2.71   6.14
2 0.959  8.13
3 0.0337 0.315
4 1.26   8.30
5 4.73   0.122
Answer:
The idea is borrowed from the question "Delete a column in a data frame within a list": split first, then use lapply to keep only the columns that are not named "group".
n <- 10
x <- runif(n)*10
y <- runif(n)*10
group <- rep(1:2, each=5)
my_data <- as.data.frame(cbind(group, x, y))
subset_data <- split(my_data, my_data$group, drop=TRUE)
lapply(subset_data, function(x) x[!(names(x) %in% "group")])
$`1`
         x        y
1 3.323947 3.337749
2 6.508705 4.763512
3 2.580168 8.921983
4 4.785452 8.643395
5 7.663107 3.899895

$`2`
           x        y
6  0.8424691 7.773207
7  8.7532133 9.606180
8  3.3907294 4.346595
9  8.3944035 7.125147
10 3.4668349 3.999944
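A possible variation on the same idea, sketched here as an assumption rather than something taken from the linked question: remove the column before splitting and pass the grouping vector to split separately, so no lapply is needed at all. The seed is arbitrary.

```r
# Same data layout as above, made reproducible with an arbitrary seed.
set.seed(1)
n <- 10
my_data <- data.frame(group = rep(1:2, each = 5),
                      x = runif(n) * 10,
                      y = runif(n) * 10)

# Drop the grouping column first, then split what is left by the group values.
# split() accepts a grouping vector that is not a column of the data being split.
subset_data <- split(my_data[names(my_data) != "group"],
                     my_data$group,
                     drop = TRUE)

# Inspect the structure: a named list of data frames holding only x and y.
str(subset_data)
```

This relies on my_data$group still being available when split is called, which is why the column is removed only from the copy passed as the first argument.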
