I would like to update three columns simultaneously based on one column
My data looks like this:
df <- data.frame(
  input = c("Antidesma cuspidatum Mull.Arg.", "Antidesma cuspidatum Müll.Arg.",
            "Alchornea parviflora (Benth.) Mull.Arg.", "Alchornea parviflora (Benth.) Müll.Arg."),
  n1 = c("Antidesma cuspidatum", NA, "Alchornea parviflora", NA),
  n2 = c("Antidesma", NA, "Alchornea", NA),
  n3 = c("Phyllanthaceae", NA, "Euphorbiaceae", NA)
)
                                    input                   n1        n2             n3
1          Antidesma cuspidatum Mull.Arg. Antidesma cuspidatum Antidesma Phyllanthaceae
2          Antidesma cuspidatum Müll.Arg.                 <NA>      <NA>           <NA>
3 Alchornea parviflora (Benth.) Mull.Arg. Alchornea parviflora Alchornea  Euphorbiaceae
4 Alchornea parviflora (Benth.) Müll.Arg.                 <NA>      <NA>           <NA>
If the first two words of the input column are the same in two rows, I would like the corresponding values of n1, n2 and n3 to be the same as well. In this example, the NA values of n1, n2 and n3 in the 2nd and 4th rows should be filled with the values from the 1st and 3rd rows.
My desired output is:
                                    input                   n1        n2             n3
1          Antidesma cuspidatum Mull.Arg. Antidesma cuspidatum Antidesma Phyllanthaceae
2          Antidesma cuspidatum Müll.Arg. Antidesma cuspidatum Antidesma Phyllanthaceae
3 Alchornea parviflora (Benth.) Mull.Arg. Alchornea parviflora Alchornea  Euphorbiaceae
4 Alchornea parviflora (Benth.) Müll.Arg. Alchornea parviflora Alchornea  Euphorbiaceae
Any suggestions for this case?
CodePudding user response:
You can use the dplyr package.
First, I create a grouping column gr that contains only the first two words of input. Then I overwrite (mutate) the columns n1, n2 and n3 with the first non-NA value within each group. The helper column gr stays in the result; add select(-gr) at the end if you do not want it.
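library(dplyr)

df %>%
  # gr = the first two words of input (genus and species)
  group_by(gr = gsub("^(\\w+ \\w+) .*", "\\1", input)) %>%
  # replace each column with the first non-NA value in its group
  mutate(across(c(n1, n2, n3), ~ .x[!is.na(.x)][1])) %>%
  ungroup()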
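
If you would rather not depend on dplyr, the same idea can probably be done in base R with ave(). This is only a sketch, assuming the first two space-separated words of input are enough to identify a group. The names key and first_non_na are made up for illustration, and the ü is written as \u00fc just to keep the snippet encoding-safe.

```r
# Sample data from the question (\u00fc is the "ü" in "Müll.Arg.")
df <- data.frame(
  input = c("Antidesma cuspidatum Mull.Arg.",
            "Antidesma cuspidatum M\u00fcll.Arg.",
            "Alchornea parviflora (Benth.) Mull.Arg.",
            "Alchornea parviflora (Benth.) M\u00fcll.Arg."),
  n1 = c("Antidesma cuspidatum", NA, "Alchornea parviflora", NA),
  n2 = c("Antidesma", NA, "Alchornea", NA),
  n3 = c("Phyllanthaceae", NA, "Euphorbiaceae", NA),
  stringsAsFactors = FALSE
)

# Grouping key: the first two words of input (hypothetical helper vector)
key <- sub("^(\\w+ \\w+) .*", "\\1", df$input)

# Hypothetical helper: repeat the first non-NA value across the whole group;
# a group with no non-NA value is returned unchanged
first_non_na <- function(x) {
  v <- x[!is.na(x)]
  if (length(v) == 0) x else rep(v[1], length(x))
}

# Apply the helper per group to each of the three columns
cols <- c("n1", "n2", "n3")
df[cols] <- lapply(df[cols], function(col) ave(col, key, FUN = first_non_na))

print(df)
```

Like the dplyr version, this overwrites every value in a group with the first non-NA one, so it assumes the non-NA values within a group already agree.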
