Grouping of rows having same elements in a column of a dataframe-CodePudding

df is a dataframe in which I need to group together the rows having identical elements in the Name column. Finally the duplicated elements in the Name column are to be removed.

df <- data.frame(Name = c("A","","","B","","","A","","","B","",""),
                 Test = c("test1","test2","test3","test1","test2","test3",
                       "test1.1","test2.1","test3.1","test1.1","test2.1","test3.1"))

Desired output:

> df
   Name    Test
1     A   test1
2         test2
3         test3
4       test1.1
5       test2.1
6       test3.1
7     B   test1
8         test2
9         test3
10      test1.1
11      test2.1
12      test3.1

CodePudding user response：

You could try the following with tidyverse. Replace empty character values with NA and fill down with the Name value. Then, sort by Name. Finally, keep only the first Name in a group.

library(tidyverse)

df %>%
  mutate(Name = na_if(Name, "")) %>%
  fill(Name, .direction = "down") %>%
  arrange(match(Name, unique(df$Name))) %>%
  group_by(Name) %>%
  mutate(Name = ifelse(row_number() == 1, Name, ""))

Output

   Name  Test   
   <chr> <chr>  
 1 "A"   test1  
 2 ""    test2  
 3 ""    test3  
 4 ""    test1.1
 5 ""    test2.1
 6 ""    test3.1
 7 "B"   test1  
 8 ""    test2  
 9 ""    test3  
10 ""    test1.1
11 ""    test2.1
12 ""    test3.1

CodePudding user response：

Here is one option with na.locf and arrange

library(dplyr)
library(zoo)
df %>%
  arrange(na.locf(na_if(Name, ""))) %>%
  mutate(Name = replace(Name, duplicated(Name) & Name != "", ""))

-output

  Name    Test
1     A   test1
2         test2
3         test3
4       test1.1
5       test2.1
6       test3.1
7     B   test1
8         test2
9         test3
10      test1.1
11      test2.1
12      test3.1