Home > Mobile >  Create new column based on existing columns whose names are stored in another column (dplyr)
Create new column based on existing columns whose names are stored in another column (dplyr)

Time:01-22

Consider the following dataset:

df <- tibble(v1 = 1:5, v2= 101:105, v3 = c("v1", "v2", "v1", "v2", "v1"))

# A tibble: 5 × 3
     v1    v2 v3   
  <int> <int> <chr>
1     1   101 v1   
2     2   102 v2   
3     3   103 v1   
4     4   104 v2   
5     5   105 v1  

I would like to generate a new column that takes values from either v1 or v2, depending on which column is listed in v3.

    # A tibble: 5 × 4
     v1    v2 v3       v4
  <int> <int> <chr> <dbl>
1     1   101 v1        1
2     2   102 v2      102
3     3   103 v1        3
4     4   104 v2      104
5     5   105 v1        5

Normally, I would use if_else, or if I had more cases, case_when. However, I have a lot of columns, so I'd rather not have a case_when statement that's many lines long. Is there a way to get R to interpret the values in v3 as column names? I've tried embracing the expression with {{ }} and using the .data[[ ]], but I can't seem to figure out the correct syntax.

CodePudding user response:

Here's a vectorized approach, no need to go row-wise or map it one-by-one.

df %>%
  mutate(v4 = cbind(v1,v2)[ cbind(row_number(), match(v3, c("v1", "v2"))) ])
# # A tibble: 5 x 4
#      v1    v2 v3       v4
#   <int> <int> <chr> <int>
# 1     1   101 v1        1
# 2     2   102 v2      102
# 3     3   103 v1        3
# 4     4   104 v2      104
# 5     5   105 v1        5

CodePudding user response:

A tidyverse option would be rowwise with extraction using cur_data()

library(dplyr)
df %>% 
  rowwise %>%
  mutate(v4 = cur_data()[[v3]]) %>% 
  ungroup
# A tibble: 5 × 4
     v1    v2 v3       v4
  <int> <int> <chr> <int>
1     1   101 v1        1
2     2   102 v2      102
3     3   103 v1        3
4     4   104 v2      104
5     5   105 v1        5

Or a compact approach would be get after rowwise

df %>%
  rowwise %>%
  mutate(v4 = get(v3)) %>%
  ungroup

Or in base R, use row/column indexing for faster execution

df$v4 <- as.data.frame(df[1:2])[cbind(seq_len(nrow(df)), 
      match(df$v3, names(df)))]
df$v4
[1]   1 102   3 104   5

CodePudding user response:

You can try the following base R code with diag as.matrix (or t)

transform(
  df,
  v4 = diag(as.matrix(df[v3]))
)

or

transform(
  df,
  v4 = diag(t(df[v3]))
)

which gives

  v1  v2 v3  v4
1  1 101 v1   1
2  2 102 v2 102
3  3 103 v1   3
4  4 104 v2 104
5  5 105 v1   5

CodePudding user response:

Here is a way how we could do it with pivot_longer:

  1. bring into long format with pivot_longer
  2. filter
  3. bind_cols() v1 and v2
library(tidyr)
library(dplyr)
df %>% 
  pivot_longer(
    -v3,
    names_to = "name",
    values_to = "v4"
  ) %>% 
  filter(v3 == name) %>% 
  bind_cols(v1 = df$v1, v2=df$v2) %>% 
  select(v1, v2, v3, v4)
  v1    v2 v3       v4
  <int> <int> <chr> <int>
1     1   101 v1        1
2     2   102 v2      102
3     3   103 v1        3
4     4   104 v2      104
5     5   105 v1        5

CodePudding user response:

A base R solution:

df <- tibble(v1 = 1:5, v2= 101:105, v3 = c("v1", "v2", "v1", "v2", "v1"))

df$v4 <- apply(df, 1, function(x) x[x[3]])
df         

#> # A tibble: 5 × 4
#>      v1    v2 v3    v4   
#>   <int> <int> <chr> <chr>
#> 1     1   101 v1    1    
#> 2     2   102 v2    102  
#> 3     3   103 v1    3    
#> 4     4   104 v2    104  
#> 5     5   105 v1    5

Another possible solution, using purrr::pmap_dfr:

library(tidyverse)    

df <- tibble(v1 = 1:5, v2= 101:105, v3 = c("v1", "v2", "v1", "v2", "v1"))

df %>% 
  mutate(pmap_dfr(., ~ list(v4 = if_else(..3 == "v1", ..1, ..2))))

#> # A tibble: 5 × 4
#>      v1    v2 v3       v4
#>   <int> <int> <chr> <int>
#> 1     1   101 v1        1
#> 2     2   102 v2      102
#> 3     3   103 v1        3
#> 4     4   104 v2      104
#> 5     5   105 v1        5
  •  Tags:  
  • Related