I need to split characters in R for every observation of a variable

I have a data frame and for one of the variables I need to split each of the observations by “,”

I used:

y <- strsplit(as.character(x), ",")

The result puts every split piece in its own row, rather than keeping the pieces in the row they came from.

I have this: "a,b,c,d…" and need this: "a" "b" "c"… for each row.

CodePudding user response：

strsplit returns a list of vectors, one per input element. If the elements contain different numbers of commas, the vectors in the list will have different lengths. In that case (the general case), pad each vector with NA at the end up to the maximum length in the list, then rbind the padded vectors to create a matrix in base R:

# assuming the data.frame object is named 'df1', split the column x
# on `,` followed by zero or more spaces (`\\s*`)
lst1 <- with(df1, strsplit(as.character(x), ",\\s*"))
# find the maximum length among the list elements
mx <- max(lengths(lst1))
# pad the shorter elements with NA at the end using `length<-`
# and rbind the list elements into a matrix
out <- do.call(rbind, lapply(lst1, `length<-`, mx))
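
To make the padding step concrete, here is a small worked sketch of the same base R approach. The sample values in `df1` and the column names `y1`, `y2`, ... are made up for illustration and are not from the question.

```r
# Hypothetical sample data; only the column name `x` follows the question
df1 <- data.frame(x = c("a,b,c,d", "a, b", "c"), stringsAsFactors = FALSE)

# strsplit() returns a list with one character vector per row
lst1 <- strsplit(as.character(df1$x), ",\\s*")
lengths(lst1)
# 4 2 1

# Pad each vector with NA up to the longest length, then stack as rows
mx <- max(lengths(lst1))
out <- do.call(rbind, lapply(lst1, `length<-`, mx))

# Name the new columns (illustrative names) and attach them to df1
colnames(out) <- paste0("y", seq_len(mx))
res <- cbind(df1, out)
```

Each original row keeps its own pieces in `res`, and rows with fewer pieces end in NA.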

This can also be done with the tidyverse after splitting into a list column:

library(dplyr)
library(tidyr)
df1 %>%
    mutate(y = strsplit(as.character(x), ",\\s*")) %>%
    unnest_wider(y, names_sep = "")

CodePudding user response：

You could use separate() from the tidyr package, with dplyr loaded alongside it:

library(tidyr)
library(dplyr)

# Create data (the column is named directly in tibble(),
# since set_names() is not exported by tidyr or dplyr)
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))

data %>%
    separate(var1, into = c("var2", "var3", "var4"),   # names of new columns
             sep = ",",         # separate at each comma
             fill = "right",    # pad missing pieces on the right with NA
             remove = FALSE)    # keep the original variable
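
As a side note, tidyr 1.3.0 and later mark separate() as superseded in favor of separate_wider_delim(). A rough equivalent of the call above might look like the sketch below, assuming tidyr >= 1.3.0 is installed; the result name `res` is only for illustration.

```r
library(tidyr)

# Same example data as in the answer, with the column named directly
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))

res <- separate_wider_delim(
  data, var1,
  delim = ",",                        # split at each comma
  names = c("var2", "var3", "var4"),  # names of new columns
  too_few = "align_start",            # pad missing pieces on the right with NA
  cols_remove = FALSE                 # keep the original variable
)
```

Here `too_few = "align_start"` plays the role of `fill = "right"`, and `cols_remove = FALSE` the role of `remove = FALSE`.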