I have this type of data:
df <- data.frame(name = c("Acer laurinum", "Acer laurinum Hassk.", "Acmella paniculata",
"Adinandra cf. integerrima", "Adinandra cf. integerrima T.Anderson"),
value1 = c(1,2,3,4,5),
value2 = c(2,3,4,5,6))
I want to sum columns value1 and value2 within groups of matching names (the name column with any trailing author citation removed), and keep the unique value of the new author column for each group. This code does the summarising, but author is dropped:
df %>%
mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
group_by(name1) %>%
summarise(across(c(value1, value2), sum))
# A tibble: 3 x 3
name1 value1 value2
* <chr> <dbl> <dbl>
1 Acer laurinum 3 5
2 Acmella paniculata 3 4
3 Adinandra cf. integerrima 9 11
Expected output:
# A tibble: 3 x 4
name1 value1 value2 author
* <chr> <dbl> <dbl> <chr>
1 Acer laurinum 3 5 Hassk.
2 Acmella paniculata 3 4 <NA>
3 Adinandra cf. integerrima 9 11 T.Anderson
Answer:
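summarise() keeps only the grouping columns and the columns you summarise, so author has to be summarised as well. You can use na.omit(author)[1] to get the first non-NA value of author in each group (NA when the group has none).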
library(dplyr)
library(stringr)
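df %>%
# Extract the trailing author (e.g. "Hassk.") and strip it to get the grouping key
mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
group_by(name1) %>%
# Sum the values and keep the first non-NA author per group
summarise(across(c(value1, value2), sum),
author = na.omit(author)[1])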
# name1 value1 value2 author
# <chr> <dbl> <dbl> <chr>
#1 Acer laurinum 3 5 Hassk.
#2 Acmella paniculata 3 4 NA
#3 Adinandra cf. integerrima 9 11 T.Anderson
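If you would rather avoid the dplyr/stringr dependencies, or want to keep every distinct author in a group instead of only the first one, here is a minimal base R sketch under those assumptions. It uses PCRE (perl = TRUE) instead of stringr's ICU engine, which should behave the same for this particular pattern. The helper names first_author() and unique_authors() are hypothetical (not from any package), and the "; " separator is an arbitrary choice.

```r
# Base R sketch: same pattern as above, no dplyr/stringr needed.
author_pattern <- "(?<=\\s)(?=.*\\.)[.\\w]+$"

df <- data.frame(
  name = c("Acer laurinum", "Acer laurinum Hassk.", "Acmella paniculata",
           "Adinandra cf. integerrima", "Adinandra cf. integerrima T.Anderson"),
  value1 = c(1, 2, 3, 4, 5),
  value2 = c(2, 3, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Extract the trailing author; rows where the pattern does not match stay NA.
m <- regexpr(author_pattern, df$name, perl = TRUE)
df$author <- NA_character_
df$author[m != -1] <- regmatches(df$name, m)

# Remove the author and surrounding whitespace to get the grouping key.
df$name1 <- trimws(sub(author_pattern, "", df$name, perl = TRUE))

# Hypothetical helper: first non-NA value, or NA if the group has none.
first_author <- function(x) na.omit(x)[1]

# Hypothetical helper: all distinct non-NA authors collapsed into one string,
# or NA if the group has none.
unique_authors <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = "; ")
}

# One row per name1: summed values plus the collapsed authors.
groups <- split(df, df$name1)
result <- do.call(rbind, lapply(groups, function(g) {
  data.frame(name1 = g$name1[1],
             value1 = sum(g$value1),
             value2 = sum(g$value2),
             author = unique_authors(g$author),
             stringsAsFactors = FALSE)
}))
rownames(result) <- NULL
print(result)
```

Inside the dplyr pipeline from the answer, either helper can stand in for na.omit(author)[1] in the summarise() call, e.g. author = unique_authors(author). With this example data each group has at most one author, so both helpers give the same result; they differ only when a group contains two different author strings.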
