Concatenate 2 columns into a new column, skip blanks in R-CodePudding

My current dataset looks something like this:

Col1     Col2     Col3 
1        b        c           
2                 c      
3        B

I want to add a new column that returns everything in the row separated by commas, skipping blank cells. Example:

Col1     Col2     Col3     newCol
1        b        c        1, b, c   
2                 c        2, c
3        B                 3, B

I've tried df$newCol <- apply(df[,1:3], 1, paste, collapse=", "), but it returns the new column with extra columns:

newCol
1, b, c
2, , c
3, B,

CodePudding user response：

Here we use unite function after replacing blank cells with NA. At the end we replace back to blank cells:

Main feature is to use na.rm argument of unite

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(everything(), na_if,"")) %>% 
  unite(newCol, Col1:Col3, na.rm = TRUE, sep = ", ", remove = FALSE) %>% 
  mutate(across(everything(), ~replace_na(., ""))) %>% 
  relocate(newCol, .after = Col3)

  Col1 Col2 Col3  newCol
1    1    b    c 1, b, c
2    2         c    2, c
3    3    B         3, B

CodePudding user response：

Using do.call with paste

df$newCol <- gsub("\\s ,\\s ", " ", trimws(do.call(paste, 
     c(df, sep=", ")), whitespace =",\\s*"))

CodePudding user response：

A possible solution, when your empty cells are "":

df <- data.frame(
  stringsAsFactors = FALSE,
  Col1 = c(1L, 2L, 3L),
  Col2 = c("b", "", "B"),
  Col3 = c("c", "c", "")
)

df$newCol <- apply(df, 1, function(x) paste(x[x != ""], collapse=", "))
df

#>   Col1 Col2 Col3  newCol
#> 1    1    b    c 1, b, c
#> 2    2         c    2, c
#> 3    3    B         3, B

A possible solution, which uses na.omit when your empty cells are NA:

df <- data.frame(
  stringsAsFactors = FALSE,
  Col1 = c(1L, 2L, 3L),
  Col2 = c("b", NA, "B"),
  Col3 = c("c", "c", NA)
)

df$Col4 <- apply(df, 1, function(x) paste(na.omit(x), collapse = ", "))
df

#>   Col1 Col2 Col3    Col4
#> 1    1    b    c 1, b, c
#> 2    2 <NA>    c    2, c
#> 3    3    B <NA>    3, B

CodePudding user response：

We can try toString nzchar

> df$newCol <- apply(df, 1, function(v) toString(v[nzchar(v)]))

> df
  Col1 Col2 Col3  newCol
1    1    b    c 1, b, c
2    2         c    2, c
3    3    B         3, B

CodePudding user response：

Here is a data.table, do.call, and paste option:

library(data.table)

setDT(df)[, newCol := gsub("\\s ,\\s ", " ", 
                           gsub(", $", "", do.call(paste, c(.SD, sep = ", "))))]

Output

   Col1 Col2 Col3  newCol
1:    1    b    c 1, b, c
2:    2         c    2, c
3:    3    B         3, B