Create new column based on existing columns whose names are stored in another column (dplyr)-CodePudding

Consider the following dataset:

df <- tibble(v1 = 1:5, v2= 101:105, v3 = c("v1", "v2", "v1", "v2", "v1"))

# A tibble: 5 × 3
     v1    v2 v3   
  <int> <int> <chr>
1     1   101 v1   
2     2   102 v2   
3     3   103 v1   
4     4   104 v2   
5     5   105 v1

I would like to generate a new column that takes values from either v1 or v2, depending on which column is listed in v3.

    # A tibble: 5 × 4
     v1    v2 v3       v4
  <int> <int> <chr> <dbl>
1     1   101 v1        1
2     2   102 v2      102
3     3   103 v1        3
4     4   104 v2      104
5     5   105 v1        5

Normally, I would use if_else, or if I had more cases, case_when. However, I have a lot of columns, so I'd rather not have a case_when statement that's many lines long. Is there a way to get R to interpret the values in v3 as column names? I've tried embracing the expression with {{ }} and using the .data[[ ]], but I can't seem to figure out the correct syntax.

CodePudding user response：

Here's a vectorized approach, no need to go row-wise or map it one-by-one.

df %>%
  mutate(v4 = cbind(v1,v2)[ cbind(row_number(), match(v3, c("v1", "v2"))) ])
# # A tibble: 5 x 4
#      v1    v2 v3       v4
#   <int> <int> <chr> <int>
# 1     1   101 v1        1
# 2     2   102 v2      102
# 3     3   103 v1        3
# 4     4   104 v2      104
# 5     5   105 v1        5

CodePudding user response：

A tidyverse option would be rowwise with extraction using cur_data()

library(dplyr)
df %>% 
  rowwise %>%
  mutate(v4 = cur_data()[[v3]]) %>% 
  ungroup
# A tibble: 5 × 4
     v1    v2 v3       v4
  <int> <int> <chr> <int>
1     1   101 v1        1
2     2   102 v2      102
3     3   103 v1        3
4     4   104 v2      104
5     5   105 v1        5

Or a compact approach would be get after rowwise

df %>%
  rowwise %>%
  mutate(v4 = get(v3)) %>%
  ungroup

Or in base R, use row/column indexing for faster execution

df$v4 <- as.data.frame(df[1:2])[cbind(seq_len(nrow(df)), 
      match(df$v3, names(df)))]
df$v4
[1]   1 102   3 104   5

CodePudding user response：

You can try the following base R code with diag as.matrix (or t)

transform(
  df,
  v4 = diag(as.matrix(df[v3]))
)

transform(
  df,
  v4 = diag(t(df[v3]))
)

which gives

  v1  v2 v3  v4
1  1 101 v1   1
2  2 102 v2 102
3  3 103 v1   3
4  4 104 v2 104
5  5 105 v1   5

CodePudding user response：

Here is a way how we could do it with pivot_longer:

bring into long format with pivot_longer
filter
bind_cols() v1 and v2

library(tidyr)
library(dplyr)
df %>% 
  pivot_longer(
    -v3,
    names_to = "name",
    values_to = "v4"
  ) %>% 
  filter(v3 == name) %>% 
  bind_cols(v1 = df$v1, v2=df$v2) %>% 
  select(v1, v2, v3, v4)

  v1    v2 v3       v4
  <int> <int> <chr> <int>
1     1   101 v1        1
2     2   102 v2      102
3     3   103 v1        3
4     4   104 v2      104
5     5   105 v1        5

CodePudding user response：

A base R solution:

df <- tibble(v1 = 1:5, v2= 101:105, v3 = c("v1", "v2", "v1", "v2", "v1"))

df$v4 <- apply(df, 1, function(x) x[x[3]])
df         

#> # A tibble: 5 × 4
#>      v1    v2 v3    v4   
#>   <int> <int> <chr> <chr>
#> 1     1   101 v1    1    
#> 2     2   102 v2    102  
#> 3     3   103 v1    3    
#> 4     4   104 v2    104  
#> 5     5   105 v1    5

Another possible solution, using purrr::pmap_dfr:

library(tidyverse)    

df <- tibble(v1 = 1:5, v2= 101:105, v3 = c("v1", "v2", "v1", "v2", "v1"))

df %>% 
  mutate(pmap_dfr(., ~ list(v4 = if_else(..3 == "v1", ..1, ..2))))

#> # A tibble: 5 × 4
#>      v1    v2 v3       v4
#>   <int> <int> <chr> <int>
#> 1     1   101 v1        1
#> 2     2   102 v2      102
#> 3     3   103 v1        3
#> 4     4   104 v2      104
#> 5     5   105 v1        5