Home > Software engineering >  Match datasets R
Match datasets R

Time:01-27

I have two datasets of which one is a subset of the other, which is the one with the values I am interested in (Dataset2). But the problem is, that in that dataset I can't tell which enteries belong together. In dataset1 those "groups" are defined as the enteries between the "Top" enteries.

Dataset1:       Dataset2:
1 "Top"         "Name1"
2 "Name1"       "Name2"
3 "Name2"       "Name5"
4 "Name3"       "Name6"
5 "Top"
6 "Name4"
7 "Name5"
8 "Top"
9 "Name6"
10 "Top"

As a result I need somethin like how many enteries are in which group to replicate the group names that often to give a result like that:

Result:
   Group     Name
1 "Group1"   "Name1"
2 "Group1"   "Name2"
3 "Group2"   "Name5"
4 "Group3"   "Name6"

CodePudding user response:

Given two data.frames which have a column in common:

df1 = data.frame(Name = c("Top", "Name1", "Name2", "Name3",
                   "Top", "Name4", "Name5", "Top",
                   "Name6", "Top"))
df2 = data.frame(Name = c("Name1", "Name2", "Name5", "Name6"))

We could use a right_join, and cumsum to make group indices.

library(dplyr)
df1 %>%
  mutate(Group = paste0("Group", cumsum(grepl("^T", Name)))) %>%
  right_join(df2, "Name")

   Name  Group
1 Name1 Group1
2 Name2 Group1
3 Name5 Group2
4 Name6 Group3

CodePudding user response:

A base R approach.

transform(df1, group=paste0('Group', cumsum(Name == 'Top')))[df1[1] != 'Top', ] |>
  subset(Name %in% df2$Name)
#    Name  group
# 2 Name1 Group1
# 3 Name2 Group1
# 7 Name5 Group2
# 9 Name6 Group3

Note: R>= 4.1 used.

  •  Tags:  
  • Related