I have a data frame that looks like this:
df <- structure(list(Variable = c("Factor1", "Factor2", "Factor3"),
Variable1 = c("word1, word2", "word1", "word1"),
Variable2 = c("word1", "word1, word2", "word1"),
Variable3 = c("word1, word2", "word1", "word1, word2, word3")),
row.names = c(NA, -3L), class = "data.frame")
I would like to create a new data frame in which each cell holds the number of comma-separated words found in the corresponding cell of df, like this:
df2 <- structure(list(Variable = c("Factor1", "Factor2", "Factor3"),
Variable1 = c("2", "1", "1"),
Variable2 = c("1", "2", "1"),
Variable3 = c("2", "1", "3")),
row.names = c(NA, -3L), class = "data.frame")
Could someone help me with how to do this?
Thanks!
CodePudding user response:
Using dplyr and stringi:
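library(dplyr)

# Count the words in every column named Variable1, Variable2, ...
# (matches() ignores case by default, so "variable" matches "Variable")
df %>%
  mutate(across(matches("variable\\d{1,}"), stringi::stri_count_words))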
  Variable Variable1 Variable2 Variable3
1  Factor1         2         1         2
2  Factor2         1         2         1
3  Factor3         1         1         3
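Note that stri_count_words counts words, not comma-separated items, so a cell such as "New York, Boston" (a hypothetical value, not in the question's data) would be counted as 3. If items may contain spaces, a possible variant is to count the commas instead and add 1. This is only a sketch: it assumes no cell is empty and no cell ends with a trailing comma, and the name df_counts is made up for illustration.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Variable  = c("Factor1", "Factor2", "Factor3"),
  Variable1 = c("word1, word2", "word1", "word1"),
  Variable2 = c("word1", "word1, word2", "word1"),
  Variable3 = c("word1, word2", "word1", "word1, word2, word3")
)

# Count the commas in each cell and add 1 to get the number of items.
# Assumption: no empty cells and no trailing commas.
df_counts <- df %>%
  mutate(across(matches("variable\\d+"),
                ~ stringi::stri_count_fixed(.x, ",") + 1L))
```

For the data in the question this gives the same counts as stri_count_words, because every item is a single word.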
CodePudding user response:
If you want a base-R solution, you could try the following. Count the characters in each value with nchar, then subtract the number of characters that remain after removing the commas. The difference is the number of commas, and adding 1 gives the number of words or phrases separated by commas. This should be fast too (also see this answer).
# For each row: (characters) - (characters without commas) + 1 = number of items
cbind(df[1], t(apply(df[-1], 1, \(x) {
  nchar(x) - nchar(gsub(",", "", x, fixed = TRUE)) + 1
})))
Output
  Variable Variable1 Variable2 Variable3
1  Factor1         2         1         2
2  Factor2         1         2         1
3  Factor3         1         1         3
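To make the arithmetic concrete, here is a minimal sketch that first applies the comma-counting idea to a single string and then to the whole data frame. It uses lapply over columns rather than apply over rows (my substitution, not the code above), which avoids the transpose. The helper name count_items and the result name counts are hypothetical.

```r
# Hypothetical helper: number of comma-separated items in each string.
# Assumes no empty strings and no trailing commas.
count_items <- function(x) {
  nchar(x) - nchar(gsub(",", "", x, fixed = TRUE)) + 1L
}

# "word1, word2, word3" has 19 characters, 17 once the two commas are removed
single <- count_items("word1, word2, word3")

# Data from the question
df <- data.frame(
  Variable  = c("Factor1", "Factor2", "Factor3"),
  Variable1 = c("word1, word2", "word1", "word1"),
  Variable2 = c("word1", "word1, word2", "word1"),
  Variable3 = c("word1, word2", "word1", "word1, word2, word3")
)

# Apply the helper to every column except the first
counts <- df
counts[-1] <- lapply(df[-1], count_items)
```

The resulting columns are integers. If character values are needed, as in df2 from the question, wrapping the helper call in as.character() should do it.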
