I have two datasets of which one is a subset of the other, which is the one with the values I am interested in (Dataset2). But the problem is, that in that dataset I can't tell which enteries belong together. In dataset1 those "groups" are defined as the enteries between the "Top" enteries.
Dataset1: Dataset2:
1 "Top" "Name1"
2 "Name1" "Name2"
3 "Name2" "Name5"
4 "Name3" "Name6"
5 "Top"
6 "Name4"
7 "Name5"
8 "Top"
9 "Name6"
10 "Top"
As a result I need somethin like how many enteries are in which group to replicate the group names that often to give a result like that:
Result:
Group Name
1 "Group1" "Name1"
2 "Group1" "Name2"
3 "Group2" "Name5"
4 "Group3" "Name6"
CodePudding user response:
Given two data.frames which have a column in common:
df1 = data.frame(Name = c("Top", "Name1", "Name2", "Name3",
"Top", "Name4", "Name5", "Top",
"Name6", "Top"))
df2 = data.frame(Name = c("Name1", "Name2", "Name5", "Name6"))
We could use a right_join, and cumsum to make group indices.
library(dplyr)
df1 %>%
mutate(Group = paste0("Group", cumsum(grepl("^T", Name)))) %>%
right_join(df2, "Name")
Name Group
1 Name1 Group1
2 Name2 Group1
3 Name5 Group2
4 Name6 Group3
CodePudding user response:
A base R approach.
transform(df1, group=paste0('Group', cumsum(Name == 'Top')))[df1[1] != 'Top', ] |>
subset(Name %in% df2$Name)
# Name group
# 2 Name1 Group1
# 3 Name2 Group1
# 7 Name5 Group2
# 9 Name6 Group3
Note: R>= 4.1 used.
