My initial dataset
df1 <- structure(list(id = c(1, 1, 2, 3, 3, 3),
name = c("james", "james", "peter", "anne", "anne", "anne"),
trip_id = c(10,11,10,30,11,32),
date = c("2021/01/01", "2021/06/01","2021/08/01","2021/10/01","2021/10/21","2021/12/01"),
cost = c(100,150,3000,1200,1100,5000)
),
row.names = c(NA,-6L),
class = c("tbl_df", "tbl", "data.frame"))
I require to pivot wider the date and the cost of each trip so they are both together in pairs. I think im closer but would appreciate your feedback.
My current code
df2= df1 %>% pivot_wider(names_from = trip_id,
values_from = c(date, cost))
My desired result
df2 <- structure(list(id = c(1, 2, 3),
name = c("james", "peter", "anne"),
date_10 = c("2021/01/01","2021/08/01",NA),
cost_10 = c(100,3000,NA),
date_11 = c("2021/06/01",NA,"2021/10/21"),
cost_11 = c(150,NA,1100),
date_30 = c(NA,NA,"2021/10/01"),
cost_30 = c(NA,NA,1200),
date_32 = c(NA,NA,"2021/12/01"),
cost_32 = c(NA,NA,5000)
),
row.names = c(NA,-3L),
class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
It looks like you were quite close. We take the trip_id before pivot_wider to help reorder the columns. You may or may not need the sort depending on your desired result. If you just want pairs, there's no need to sort.
library(tidyverse)
nums <- sort(unique(df1$trip_id))
nums <- as.character(nums)
df2 <-
df1 %>%
pivot_wider(names_from = trip_id,
values_from = c(date, cost)) %>%
select(id, name, ends_with(nums))
df2
#> # A tibble: 3 x 10
#> id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
#> <dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 james 2021/01/01 100 2021/0~ 150 <NA> NA <NA> NA
#> 2 2 peter 2021/08/01 3000 <NA> NA <NA> NA <NA> NA
#> 3 3 anne <NA> NA 2021/1~ 1100 2021/1~ 1200 2021/1~ 5000
CodePudding user response:
For this purpose you do need an additional pivot_longer step before changing your data shape into wide format. Note that I used values_transform argument in pivot_longer to change the class of cost to character so that I could combine it with date in the intermediate val variable:
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(c(date, cost), names_to = "var",
values_to = "val",
values_transform = list(val = as.character)) %>%
pivot_wider(names_from = c(var, trip_id), values_from = val) %>%
mutate(across(starts_with("cost"), as.double))
# A tibble: 3 x 10
id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
<dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 1 james 2021/01/01 100 2021/06/01 150 NA NA NA NA
2 2 peter 2021/08/01 3000 NA NA NA NA NA NA
3 3 anne NA NA 2021/10/21 1100 2021/10/01 1200 2021/12/01 5000
CodePudding user response:
The way you have it is correct. Column numbering/Row numbering should not impact the data unless you have time series/panel data and some few exceptions.
Otherwise, you could accomplish the same using the reshape function which will give you what you want. Mark you this is Base R:
reshape(data.frame(df1), timevar = 'trip_id', idvar = c('id', 'name'), dir='wide', sep = '_')
id name date_10 cost_10 date_11 cost_11 date_30 cost_30 date_32 cost_32
1 1 james 2021/01/01 100 2021/06/01 150 <NA> NA <NA> NA
3 2 peter 2021/08/01 3000 <NA> NA <NA> NA <NA> NA
4 3 anne <NA> NA 2021/10/21 1100 2021/10/01 1200 2021/12/01 5000
