I have a dataframe
df <- data.frame(id1 = c("a" , "b", "b", "c"),
id2 = c(NA,"a","a",NA),
id3 = c("a", "a", "a", "e"),
n1 = c(2,2,2,3),
n2 = c(2,1,1,1),
n3 = c(0,1,1,3),
n4 = c(0,1,1,2))
I want to collapse the 2nd and 3rd rows into one. Afterwards, I will do aggregate by column id3 sharing same character (i.e. a).
My real dataframe is long contaning many different latin names, filter by name i.e. a doesn´t make sense this case. I am thinking to collapse rows with the condition id3 == id2, but I could not do it. Any sugesstions for me?
My desired out put like this
id1 id2 id3 n1 n2 n3 n4
a NA a 2 2 0 0
b a a 2 1 1 1
c NA e 3 1 3 2
#Afterthat, it should be
id1 id3 n1 n2 n3 n4
a a 4 3 1 1
c e 3 1 3 2
(I just updated the dataframe, sorry for my mistake)
CodePudding user response:
We get the distinct rows to generate the first expected
library(dplyr)
df %>%
distinct
id1 id2 id3 n1 n2 n3 n4
1 a <NA> a 2 2 0 0
2 b a a 2 1 1 1
3 c <NA> e 3 1 3 2
The final output we can get from the above, i.e. after the distinct step, do a group by coalesced 'id2', 'id1' along with 'id3' and then get the sum of numeric columns
df %>%
distinct %>%
group_by(id1 = coalesce(id2, id1), id3) %>%
summarise(across(where(is.numeric), sum), .groups = 'drop')
-output
# A tibble: 2 × 6
id1 id3 n1 n2 n3 n4
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 a a 4 3 1 1
2 c e 3 1 3 2
CodePudding user response:
Here is a slightly different way using slice after group_by instead of distinct:
df %>%
group_by(id1, id3) %>%
dplyr::slice(1L) %>%
mutate(id1 = coalesce(id2,id1)) %>%
summarise(across(where(is.numeric), sum))
output:
id1 id3 n1 n2 n3 n4
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 a a 4 3 1 1
2 c e 3 1 3 2
