I have a data frame, and for one of the variables I need to split each observation by ",".
I used:
y <- strsplit(as.character(x), ",")
The result shows every split piece on its own row, instead of keeping the pieces together in the row they came from.
I have this: "a,b,c,d…" and need this: "a" "b" "c"… for each row.
Answer:
strsplit returns a list of character vectors, one per row. If the rows contain different numbers of commas, the vectors in the list will have different lengths. In that general case, pad each vector with NA at the end up to the maximum length in the list, and then rbind the vectors to create a matrix in base R:
# assuming the data.frame is named 'df1', split the column x
# by `,` followed by zero or more spaces (`\\s*`)
lst1 <- with(df1, strsplit(as.character(x), ",\\s*"))
# find the maximum element length in the list
mx <- max(lengths(lst1))
# pad shorter elements with NA at the end using `length<-`,
# then rbind the list elements into a matrix
out <- do.call(rbind, lapply(lst1, `length<-`, mx))
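To see what the padding step does, here is a minimal self-contained run of the same base R approach. The data frame `df1` and its values are made up for illustration; only the `x` column name comes from the answer above.

```r
# Hypothetical example data: rows have different numbers of comma-separated pieces
df1 <- data.frame(x = c("a,b,c,d", "e, f", "g"), stringsAsFactors = FALSE)

# Split on a comma followed by optional whitespace; the result is a list
lst1 <- strsplit(as.character(df1$x), ",\\s*")

# The longest element decides the number of columns
mx <- max(lengths(lst1))

# `length<-` extends shorter vectors with NA, then rbind stacks them row-wise
out <- do.call(rbind, lapply(lst1, `length<-`, mx))
out
#      [,1] [,2] [,3] [,4]
# [1,] "a"  "b"  "c"  "d"
# [2,] "e"  "f"  NA   NA
# [3,] "g"  NA   NA   NA
```

Each original row stays on one row of `out`, with NA filling the positions where that row had fewer pieces.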
This can also be done with the tidyverse after splitting into a list column:
library(dplyr)
library(tidyr)
df1 %>%
  mutate(y = strsplit(as.character(x), ",\\s*")) %>%
  # spread the list column into y1, y2, ... (shorter rows get NA)
  unnest_wider(y, names_sep = "")
Answer:
You could use separate() from tidyr (dplyr is loaded here only for the pipe and tibble()):
library(tidyr)
library(dplyr)
# Create example data with a single character column, var1
data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))
data %>%
  separate(var1, into = c("var2", "var3", "var4"), # names of the new columns
           sep = ",",       # split at each comma
           fill = "right",  # pad missing pieces on the right with NA
           remove = FALSE)  # keep the original variable
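The `into` vector above is hard-coded to three names, which assumes you already know the maximum number of pieces per row. A sketch that derives the names from the data instead, assuming the same `data` tibble; the helper variables `n_pieces` and `new_cols` are hypothetical names chosen for this example:

```r
library(tidyr)
library(dplyr)

data <- tibble(var1 = rep(c("a,b,c", "ab,c", "cb,a"), 5))

# Maximum number of comma-separated pieces in any row (base R)
n_pieces <- max(lengths(strsplit(data$var1, ",")))

# Build names var2, var3, ... so they match the hard-coded names above
new_cols <- paste0("var", seq_len(n_pieces) + 1)

result <- data %>%
  separate(var1, into = new_cols, sep = ",", fill = "right", remove = FALSE)
result
```

This way the number of new columns follows the longest row automatically, and `fill = "right"` still pads the shorter rows with NA.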
