I'm working with two datasets - one set has pairs of items:
original <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"))
original
label1 label2
1 cat dog
2 cat dog
3 dog cat
4 dog cat
5 cat dog
6 tiger cow
7 tiger cow
8 cow tiger
the second dataset contains index codes for items from the first set:
index <- data.frame(item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0))
index
item code
1 cat 1
2 dog 0
3 tiger 1
4 cow 0
I'm looking for a way to create two new columns: tag0 and tag1 so that it looks like this:
new <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"),
tag1 = c("cat", "cat", "cat", "cat", "cat", "tiger", "tiger", "tiger"),
tag0 = c("dog", "dog", "dog", "dog", "dog", "cow", "cow", "cow"))
new
label1 label2 tag1 tag0
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
tag0 refers to the label corresponding to code=0 and tag1 refers to the label corresponding to code=1 in the index dataframe.
Could anyone help me out with a tidyverse based solution?
CodePudding user response:
Here's a solution in the tidyverse.
Solution
First import the tidyverse and generate your datasets original and index.
library(tidyverse)
# ...
# Code to generate 'original' and 'index' datasets.
# ...
Then apply this workflow.
original %>%
# Uniquely identify each row (for pivoting later).
mutate(row_id = row_number()) %>%
# Match 'label1' to the tags.
left_join(
index,
by = c("label1" = "item"),
keep = TRUE
) %>%
# Match 'label2' to the tags.
left_join(
index,
by = c("label2" = "item"),
keep = TRUE,
suffix = c(".1", ".2")
) %>%
# Pivot 'item.1 | ... | item.n | code.1 | ... | code.n' into a consolidated
# 'item | code' form.
pivot_longer(
cols = matches("^(item|code)\\.(\\d )?$"),
names_pattern = "^(item|code)\\.(\\d )?$",
names_to = c(".value", NA)
) %>%
# Pivot back into a 'tag1 | tag0' form.
pivot_wider(
values_from = item,
names_from = code,
names_glue = "tag{code}"
) %>%
# Omit unique identifier.
select(!row_id)
Result
Given the original and index datasets like those reproduced here
original <- data.frame(
label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0)
)
this solution should yield the following result:
# A tibble: 8 x 4
label1 label2 tag1 tag0
<chr> <chr> <chr> <chr>
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
Update
Here's a more elegant workflow, which is also more flexible: it works for an arbitrary number of label* columns in original and an arbitrary set of codes in index.
original %>%
# Uniquely identify each row (for pivoting later).
mutate(row_id = row_number()) %>%
# Perform a cross-join compare every 'item' to every 'label*'.
full_join(
index,
by = character()
) %>%
# Keep only those rows where 'item' matches a 'label*'.
rowwise() %>%
filter(item %in% c_across(matches("^label\\d "))) %>%
# Pivot into a 'tag1 | tag0' form.
pivot_wider(
values_from = item,
names_from = code,
names_glue = "tag{code}"
) %>%
# Omit unique identifier.
select(!row_id)
Results
The results remain identical.
# A tibble: 8 x 4
label1 label2 tag1 tag0
<chr> <chr> <chr> <chr>
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
Note
The only drawback is that it must perform a CROSS JOIN, which could hinder performance on larger datasets.
CodePudding user response:
Another possible solution:
library(tidyverse)
original <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"))
index <- data.frame(item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0))
original %>%
full_join(index, by=c("label1" = "item")) %>%
full_join(index, by=c("label2" = "item")) %>%
mutate(tag1 = if_else(code.x == 1, label1, label2)) %>%
mutate(tag2 = if_else(code.y == 1, label1, label2)) %>%
select(!starts_with("code"))
#> label1 label2 tag1 tag2
#> 1 cat dog cat dog
#> 2 cat dog cat dog
#> 3 dog cat cat dog
#> 4 dog cat cat dog
#> 5 cat dog cat dog
#> 6 tiger cow tiger cow
#> 7 tiger cow tiger cow
#> 8 cow tiger tiger cow
