I've seen examples here that show how to split a dataframe using column index ranges but how do I split my dataframe with dplyr using strings found in the column? I've purposely created my dataframes so the columns have certain strings in them for future splitting.
Example data:
Site A_Argas A_Arachnicea A_Brus B_Burkoll B_Brielle B_Bact
1 10 0 0 0 0 0
2 0 0 0 10 22 123
3 1 2 3 88 12 546
I want to split this dataframe up based on strings such as "A_" or "B_" and assign them to new dataframes.
For example the output would be:
dataframeA
Site A_Argas A_Arachnicea A_Brus
1 10 0 0
2 0 0 0
3 1 2 3
dataframeB
Site B_Burkoll B_Brielle B_Bact
1 0 0 0
2 10 22 123
3 88 12 546
Because this data is not in long format, I can't seem to figure how to change my old code that I used to split the longform dataframes (for a different analysis).
dataframeA <- data %>% filter("GroupID" == "Arachnids") # where "A_" in column headers signify arachnid species
dataframeB <- data %>% filter("GroupID" == "Bacteria") # where all "B_" in the column headers are bacterial species
CodePudding user response:
One base R option might be to grep for the columns starting with A_ and B_, then to subset your original data frame.
a_names <- grep("^A_", names(data), value=TRUE)
dataframeA <- data[ , c("Site", a_names)]
b_names <- grep("^B_", names(data), value=TRUE)
dataframeB <- data[ , c("Site", b_names)]
CodePudding user response:
Another possible solution, based on group_split, pivot_longer, pivot_wider and map:
library(tidyverse)
df <- structure(list(Site = c(1, 2, 3), A_Argas = c(10, 0, 1), A_Arachnicea = c(0,
0, 2), A_Brus = c(0, 0, 3), B_Burkoll = c(0, 10, 88), B_Brielle = c(0,
22, 12), B_Bact = c(0, 123, 546)), row.names = c(NA, -3L), class = "data.frame")
df %>%
pivot_longer(-1) %>%
group_split(str_extract(name, "^[A-Z]\\_")) %>%
map(~ pivot_wider(.x, id_cols = Site))
#> [[1]]
#> # A tibble: 3 × 4
#> Site A_Argas A_Arachnicea A_Brus
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 10 0 0
#> 2 2 0 0 0
#> 3 3 1 2 3
#>
#> [[2]]
#> # A tibble: 3 × 4
#> Site B_Burkoll B_Brielle B_Bact
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 0 0
#> 2 2 10 22 123
#> 3 3 88 12 546
CodePudding user response:
We may use split.default for this in base R, remove the first column ('Site'), split the data into a list of dataframe by removing the substring from the _ (using trimws), cbind the first column in the list of data.frames and change the list name
lst1 <- split.default(data[-1], trimws(names(data)[-1], whitespace = "_.*"))
lst2 <- Map(cbind, Site = data['Site'], lst1)
names(lst2) <- paste0("dataframe", names(lst1))
-output
> lst2
$dataframeA
Site A_Argas A_Arachnicea A_Brus
1 1 10 0 0
2 2 0 0 0
3 3 1 2 3
$dataframeB
Site B_Burkoll B_Brielle B_Bact
1 1 0 0 0
2 2 10 22 123
3 3 88 12 546
It may be better to keep it in a list and access the elements as
lst2[["dataframeA"]]
lst2$dataframeA
instead of creating multiple objects (although can be created with list2env
list2env(lst2, .GlobalEnv)
With tidyverse, we could loop over the substring of the column names and select the relevant columns with map
library(purrr)
library(dplyr)
library(stringr)
str_remove(names(data)[-1] , "_.*") %>%
unique %>%
map(~ data %>%
select(Site, starts_with(.x)))
-output
[[1]]
Site A_Argas A_Arachnicea A_Brus
1 1 10 0 0
2 2 0 0 0
3 3 1 2 3
[[2]]
Site B_Burkoll B_Brielle B_Bact
1 1 0 0 0
2 2 10 22 123
3 3 88 12 546
data
data <- structure(list(Site = 1:3, A_Argas = c(10L, 0L, 1L), A_Arachnicea = c(0L,
0L, 2L), A_Brus = c(0L, 0L, 3L), B_Burkoll = c(0L, 10L, 88L),
B_Brielle = c(0L, 22L, 12L), B_Bact = c(0L, 123L, 546L)),
class = "data.frame", row.names = c(NA,
-3L))
