My current dataset looks something like this:
Col1 Col2 Col3
1 b c
2 c
3 B
I want to add a new column that returns everything in the row separated by commas, skipping blank cells. Example:
Col1 Col2 Col3 newCol
1 b c 1, b, c
2 c 2, c
3 B 3, B
I've tried df$newCol <- apply(df[,1:3], 1, paste, collapse=", "), but it returns the new column with extra columns:
newCol
1, b, c
2, , c
3, B,
CodePudding user response:
Here we use unite function after replacing blank cells with NA.
At the end we replace back to blank cells:
- Main feature is to use
na.rmargument ofunite
library(dplyr)
library(tidyr)
df %>%
mutate(across(everything(), na_if,"")) %>%
unite(newCol, Col1:Col3, na.rm = TRUE, sep = ", ", remove = FALSE) %>%
mutate(across(everything(), ~replace_na(., ""))) %>%
relocate(newCol, .after = Col3)
Col1 Col2 Col3 newCol
1 1 b c 1, b, c
2 2 c 2, c
3 3 B 3, B
CodePudding user response:
Using do.call with paste
df$newCol <- gsub("\\s ,\\s ", " ", trimws(do.call(paste,
c(df, sep=", ")), whitespace =",\\s*"))
CodePudding user response:
A possible solution, when your empty cells are "":
df <- data.frame(
stringsAsFactors = FALSE,
Col1 = c(1L, 2L, 3L),
Col2 = c("b", "", "B"),
Col3 = c("c", "c", "")
)
df$newCol <- apply(df, 1, function(x) paste(x[x != ""], collapse=", "))
df
#> Col1 Col2 Col3 newCol
#> 1 1 b c 1, b, c
#> 2 2 c 2, c
#> 3 3 B 3, B
A possible solution, which uses na.omit when your empty cells are NA:
df <- data.frame(
stringsAsFactors = FALSE,
Col1 = c(1L, 2L, 3L),
Col2 = c("b", NA, "B"),
Col3 = c("c", "c", NA)
)
df$Col4 <- apply(df, 1, function(x) paste(na.omit(x), collapse = ", "))
df
#> Col1 Col2 Col3 Col4
#> 1 1 b c 1, b, c
#> 2 2 <NA> c 2, c
#> 3 3 B <NA> 3, B
CodePudding user response:
We can try toString nzchar
> df$newCol <- apply(df, 1, function(v) toString(v[nzchar(v)]))
> df
Col1 Col2 Col3 newCol
1 1 b c 1, b, c
2 2 c 2, c
3 3 B 3, B
CodePudding user response:
Here is a data.table, do.call, and paste option:
library(data.table)
setDT(df)[, newCol := gsub("\\s ,\\s ", " ",
gsub(", $", "", do.call(paste, c(.SD, sep = ", "))))]
Output
Col1 Col2 Col3 newCol
1: 1 b c 1, b, c
2: 2 c 2, c
3: 3 B 3, B
