Hello, I need help merging rows that share the same name and removing the NA values. When a column has more than one value for the same name, either a new column should be created with a numeric suffix, or the values should be combined into one cell, separated by a comma.
I have this example dataframe:
name<-c("John","John","John","Luis","Luis")
may<-c("a",NA,NA,"a",NA)
june<-c(NA,"b",NA,NA,"a")
july<-c("d",NA,"c",NA,NA)
df<-data.frame(name,may,june,july)
which gives the following data frame:
  name  may june july
1 John    a <NA>    d
2 John <NA>    b <NA>
3 John <NA> <NA>    c
4 Luis    a <NA> <NA>
5 Luis <NA>    a <NA>
I expect a result like the following:
  name may june july july.2
1 John   a    b    c      d
2 Luis   a    a <NA>   <NA>
or like the following:
  name may june july
1 John   a    b  c,d
2 Luis   a    a <NA>
CodePudding user response:
We can use dplyr's summarize() to concatenate the strings that belong to the same "name".
Inside summarize(), if every value of a column within a group is NA, the result for that group is NA. Otherwise, the non-NA values are sorted and joined with a comma.
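library(dplyr)

df %>%
  group_by(name) %>%
  # per name and column: NA if all values are NA, else the sorted non-NA values joined by ","
  summarize(across(everything(), ~ if (all(is.na(.x))) NA_character_ else paste(sort(.x), collapse = ",")))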
# A tibble: 2 × 4
  name  may   june  july
  <chr> <chr> <chr> <chr>
1 John  a     b     c,d
2 Luis  a     a     NA
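For the other layout in the question (an extra july.2 column instead of a comma-separated cell), one possible approach is to reshape to long format, number the repeated values, and reshape back. This is only a sketch, assuming the tidyr package is available alongside dplyr; the helper names months, idx, col and wide are made up for illustration, and values are sorted alphabetically within each name and month so that "c" lands in july and "d" in july.2.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  name = c("John", "John", "John", "Luis", "Luis"),
  may  = c("a", NA, NA, "a", NA),
  june = c(NA, "b", NA, NA, "a"),
  july = c("d", NA, "c", NA, NA)
)

# Original column order, used to keep may/june/july in place after reshaping
months <- setdiff(names(df), "name")

wide <- df %>%
  # one row per (name, month, value), with NA values dropped
  pivot_longer(all_of(months), names_to = "month", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(name, month) %>%
  arrange(value, .by_group = TRUE) %>%
  # number repeated values within each name and month
  mutate(idx = row_number()) %>%
  ungroup() %>%
  # first values come first so that suffixed columns end up on the right
  arrange(idx, match(month, months), name) %>%
  mutate(col = ifelse(idx == 1, month, paste0(month, ".", idx))) %>%
  select(name, col, value) %>%
  pivot_wider(names_from = col, values_from = value)

print(wide)
```

Combinations that have no value (for example july for Luis) are filled with NA by pivot_wider(). If a month had three values for one name, this sketch would also create a ".3" column.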
