I have a df as below
df <- data.frame(col = c("Acer laurinum", "Acer laurinum"),
author = c("", "Hassk."),
value = c(1,1))
df
col author value1 value2
1 Acer laurinum 1 2
2 Acer laurinum Hassk. 1 3
3 Acronychia pedunculata (L.) Miq. 0 1
4 Acronychia pedunculata 0 0
I would like to update the author column saying that if I found any duplicated rows in col column, then the blank cell of author associated with the value (i.e. Acer laurinum) will be updated to Hassk. And the value must be retained.
- Desired output
col author value1 value2
1 Acer laurinum Hassk. 1 2
2 Acer laurinum Hassk. 1 3
3 Acronychia pedunculata (L.) Miq. 0 1
4 Acronychia pedunculata (L.) Miq. 0 0
Any suggestions for this?
CodePudding user response:
You should use dplyr::group_by and dplyr::mutate with max:
df %>% group_by(col) %>%
mutate(author=max(author))
Output:
col author value1 value2
<chr> <chr> <dbl> <dbl>
1 Acer laurinum Hassk. 1 2
2 Acer laurinum Hassk. 1 3
3 Acronychia pedunculata (L.) Miq. 0 1
4 Acronychia pedunculata (L.) Miq. 0 0
>
CodePudding user response:
# Base option 1:
df$author <- with(df, ave(author, col, FUN = max))
# Base option 2:
transform(
with(
df,
replace(df, df == "", NA_character_)[
order(col, author, decreasing = TRUE),
]
),
author = na.omit(author)[cumsum(!(is.na(author)))]
)[row.names(df),]
Data:
df <- structure(list(col = c("Acer laurinum", "Acer laurinum", "Acronychia pedunculata",
"Acronychia pedunculata"), author = c("", "Hassk.", "(L.) Miq.",
""), value1 = c(1, 1, 0, 0), value2 = c(2, 3, 1, 0)), class = "data.frame", row.names = c(NA,
-4L))
