I am learning R. I know how to join dataframes based on various criteria and using various methods - all of that if they have something in common (e.g. column). But what if they do not?
I have a two example dataframes: df_data and df_categories (code provided below). I would like to modify df_data by adding new columns based on the values in df_categories. The dataframes do not share a common column.
#first dataframe (df_category):
group_vec <- c(rep("nerd", 5),
rep("bum", 5),
rep("hipster", 5),
rep("metalhead", 5),
rep("geek", 5),
rep("hooligan", 5))
sample_vec <- c("n1", "n2", "n3", "n4", "n5",
"b1", "b2", "b3", "b4", "b5",
"h1", "h2", "h3", "h4", "h5",
"m1", "m2", "m3", "m4", "m5",
"g1", "g2", "g3", "g4", "g5",
"ho1", "ho2", "ho3", "ho4", "ho5")
df_category <- data.frame(group_vec, sample_vec)
#second dataframe (df_data):
group <- c(rep("A", 2),
rep("B", 5),
rep("C", 4),
rep("D", 3))
value <- c(20, 19, 11, 8, 9, 13, 10, 7, 6, 7, 5, 17, 16, 18)
sample <- c("one1", "two1",
"one2", "two2","thr2", "fou2", "fiv2",
"one3", "two3","thr3", "fou3",
"one4", "two4","thr4")
df_data <- data.frame(group, sample, value)
In a df_data, I would like to create a column named "gr_vec", which would contain values from df_category$group_vec - for each unique value in df_data$group.
In a df_data, I would also like to create a column named "sam_vec", which would contain values from df_category$sample_vec - for each unique value in df_data$sample.
Here is the desired output:
#output dataframe:
group <- c(rep("A", 2),
rep("B", 5),
rep("C", 4),
rep("D", 3))
gr_vec <- c(rep("nerd", 2),
rep("bum", 5),
rep("hipster", 4),
rep("metalhead", 3))
sample <- c("one1", "two1",
"one2", "two2","thr2", "fou2", "fiv2",
"one3", "two3","thr3", "fou3",
"one4", "two4","thr4")
sam_vec <- c("n1", "n2",
"b1", "b2", "b3", "b4", "b5",
"h1", "h2", "h3", "h4",
"m1", "m2", "m3")
value <- c(20, 19, 11, 8, 9, 13, 10, 7, 6, 7, 5, 17, 16, 18)
df_out <- data.frame(group, gr_vec, sample, sam_vec, value)
I tried with match, rle and case_when, but was unable to assign the values correctly. I did not find a similar question, so I posted it here.
CodePudding user response:
I think you can try this approach with merging the two data.frames. Since they have no columns in common, you can create columns to merge on, based on your groups and position within each group.
First, use rleid to provide unique id's for each group in both data.frames. Then, use rowid to enumerate rows within each group. Then merge.
library(data.table)
df_data$group_num <- rleid(df_data$group)
df_data$sub_num <- rowid(df_data$group)
df_category$group_num <- rleid(df_category$group_vec)
df_category$sub_num <- rowid(df_category$group_vec)
merge(df_data, df_category, by = c("group_num", "sub_num"))
Output
group_num sub_num group sample value group_vec sample_vec
1 1 1 A one1 20 nerd n1
2 1 2 A two1 19 nerd n2
3 2 1 B one2 11 bum b1
4 2 2 B two2 8 bum b2
5 2 3 B thr2 9 bum b3
6 2 4 B fou2 13 bum b4
7 2 5 B fiv2 10 bum b5
8 3 1 C one3 7 hipster h1
9 3 2 C two3 6 hipster h2
10 3 3 C thr3 7 hipster h3
11 3 4 C fou3 5 hipster h4
12 4 1 D one4 17 metalhead m1
13 4 2 D two4 16 metalhead m2
14 4 3 D thr4 18 metalhead m3
