df is a dataframe in which I need to group together the rows having identical elements in the Name column. Finally the duplicated elements in the Name column are to be removed.
df <- data.frame(Name = c("A","","","B","","","A","","","B","",""),
Test = c("test1","test2","test3","test1","test2","test3",
"test1.1","test2.1","test3.1","test1.1","test2.1","test3.1"))
Desired output:
> df
Name Test
1 A test1
2 test2
3 test3
4 test1.1
5 test2.1
6 test3.1
7 B test1
8 test2
9 test3
10 test1.1
11 test2.1
12 test3.1
CodePudding user response:
You could try the following with tidyverse. Replace empty character values with NA and fill down with the Name value. Then, sort by Name. Finally, keep only the first Name in a group.
library(tidyverse)
df %>%
mutate(Name = na_if(Name, "")) %>%
fill(Name, .direction = "down") %>%
arrange(match(Name, unique(df$Name))) %>%
group_by(Name) %>%
mutate(Name = ifelse(row_number() == 1, Name, ""))
Output
Name Test
<chr> <chr>
1 "A" test1
2 "" test2
3 "" test3
4 "" test1.1
5 "" test2.1
6 "" test3.1
7 "B" test1
8 "" test2
9 "" test3
10 "" test1.1
11 "" test2.1
12 "" test3.1
CodePudding user response:
Here is one option with na.locf and arrange
library(dplyr)
library(zoo)
df %>%
arrange(na.locf(na_if(Name, ""))) %>%
mutate(Name = replace(Name, duplicated(Name) & Name != "", ""))
-output
Name Test
1 A test1
2 test2
3 test3
4 test1.1
5 test2.1
6 test3.1
7 B test1
8 test2
9 test3
10 test1.1
11 test2.1
12 test3.1
