How to pivot wider multiple columns on dataset and maintain a specific colum order?-CodePudding

My initial dataset

df1 <- structure(list(id = c(1, 1, 2, 3, 3, 3), 
                      name = c("james", "james", "peter", "anne", "anne", "anne"), 
                      trip_id = c(10,11,10,30,11,32),
                      date = c("2021/01/01", "2021/06/01","2021/08/01","2021/10/01","2021/10/21","2021/12/01"),
                      cost = c(100,150,3000,1200,1100,5000)
                      
), 
row.names = c(NA,-6L), 
class = c("tbl_df", "tbl", "data.frame"))

I require to pivot wider the date and the cost of each trip so they are both together in pairs. I think im closer but would appreciate your feedback.

My current code

df2= df1 %>% pivot_wider(names_from = trip_id, 
                               values_from = c(date, cost))

My desired result

df2 <- structure(list(id = c(1, 2, 3), 
                      name = c("james", "peter", "anne"), 
                      date_10 = c("2021/01/01","2021/08/01",NA),
                      cost_10 = c(100,3000,NA),
                      date_11 = c("2021/06/01",NA,"2021/10/21"),
                      cost_11 = c(150,NA,1100),
                      date_30 = c(NA,NA,"2021/10/01"),
                      cost_30 = c(NA,NA,1200),
                      date_32 = c(NA,NA,"2021/12/01"),
                      cost_32 = c(NA,NA,5000)             
                      
), 
row.names = c(NA,-3L), 
class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response：

It looks like you were quite close. We take the trip_id before pivot_wider to help reorder the columns. You may or may not need the sort depending on your desired result. If you just want pairs, there's no need to sort.

library(tidyverse)
nums <- sort(unique(df1$trip_id))
nums <- as.character(nums)

df2 <- 
  df1 %>% 
    pivot_wider(names_from = trip_id, 
                values_from = c(date, cost)) %>%
    select(id, name, ends_with(nums))

df2
#> # A tibble: 3 x 10
#>      id name  date_10    cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
#>   <dbl> <chr> <chr>        <dbl> <chr>     <dbl> <chr>     <dbl> <chr>     <dbl>
#> 1     1 james 2021/01/01     100 2021/0~     150 <NA>         NA <NA>         NA
#> 2     2 peter 2021/08/01    3000 <NA>         NA <NA>         NA <NA>         NA
#> 3     3 anne  <NA>            NA 2021/1~    1100 2021/1~    1200 2021/1~    5000

CodePudding user response：

For this purpose you do need an additional pivot_longer step before changing your data shape into wide format. Note that I used values_transform argument in pivot_longer to change the class of cost to character so that I could combine it with date in the intermediate val variable:

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(c(date, cost), names_to = "var", 
               values_to = "val", 
               values_transform = list(val = as.character)) %>%
  pivot_wider(names_from = c(var, trip_id), values_from = val) %>%
  mutate(across(starts_with("cost"), as.double))


# A tibble: 3 x 10
     id name  date_10    cost_10 date_11    cost_11 date_30    cost_30 date_32    cost_32
  <dbl> <chr> <chr>        <dbl> <chr>        <dbl> <chr>        <dbl> <chr>        <dbl>
1     1 james 2021/01/01     100 2021/06/01     150 NA              NA NA              NA
2     2 peter 2021/08/01    3000 NA              NA NA              NA NA              NA
3     3 anne  NA              NA 2021/10/21    1100 2021/10/01    1200 2021/12/01    5000

CodePudding user response：

The way you have it is correct. Column numbering/Row numbering should not impact the data unless you have time series/panel data and some few exceptions.

Otherwise, you could accomplish the same using the reshape function which will give you what you want. Mark you this is Base R:

reshape(data.frame(df1), timevar = 'trip_id', idvar = c('id', 'name'), dir='wide', sep = '_')

  id  name    date_10 cost_10    date_11 cost_11    date_30 cost_30    date_32 cost_32
1  1 james 2021/01/01     100 2021/06/01     150       <NA>      NA       <NA>      NA
3  2 peter 2021/08/01    3000       <NA>      NA       <NA>      NA       <NA>      NA
4  3  anne       <NA>      NA 2021/10/21    1100 2021/10/01    1200 2021/12/01    5000