I am sorry to come back on a topic that has several threads on Stack, but I am trying to Reshape a dataset from LONG to WIDE, using Tidyverse, and either the Gather/Spread function as well as the pivot_wider function, and I am lost. Here it is a sample of the subset I am using for test
structure(list(pid = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
"47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
"58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
"69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
"91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
"101", "102", "103", "104", "105", "106", "107", "108", "109",
"110", "111", "112", "113", "114", "115", "116", "117", "118",
"119", "120", "121", "122", "123", "124", "125", "126", "127",
"128", "129", "130", "131", "132", "133", "134", "135", "136",
"137", "138", "139", "140", "141", "142", "143", "144", "145",
"146", "147", "148", "149", "150", "151", "152", "153", "154",
"155", "156", "157", "158", "159", "160", "161", "162", "163",
"164", "165", "166", "167", "168", "169", "170", "171", "172",
"173", "174", "175", "176", "177", "178", "179", "180", "181",
"182", "183", "184", "185", "186", "187", "188", "189", "190",
"191", "192", "193", "194", "195", "196", "197", "198", "199",
"200", "201", "202", "203", "204", "205", "206", "207", "208",
"209", "210", "211", "212", "213", "214", "215", "216"), class = "factor"),
timewave = structure(c(1L, 2L, 3L, 4L, 5L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8"), class = "factor"), dev_icd = structure(c(1L,
1L, 1L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
lab_bnp = c(388, 199, 387.5, 318, 154, 949.4)), row.names = c(NA,
6L), class = "data.frame")
And here there the 2 commands I came up to
test.wide2 <- test.long2 %>%
pivot_wider(id_cols = pid,
names_from = timewave,
values_from = c(dev_icd, lab_bnp),
names_sep = "")
Or also
test.wide <- test.long2 %>%
group_by(pid) %>%
gather("dev_icd", "lab_bnp",
key = variable, value = number ) %>%
unite(combi, variable, timewave) %>%
spread(combi, number)
Neither is working as I would expect, I get a lot of NA or NULL values and do not understand what is my mistake and the correct procedure. Any help not only in fixing the issue but mainly in understanding the Reshaping logic/philosophy would be greatly appreciated
CodePudding user response:
Thanks to help from Merjin van Tiborg, I have finally nailed down the issue. The correct command to have PID on rows and columns for dev-icd and lab_bnp repeated by timewave number is as follows:
test.wide <- hf.longsmall %>%
pivot_wider(id_cols = c("pid", "timewave"),
names_from = timewave,
values_from = c(dev_icd, lab_bnp),
names_sep = "_t")
That is equivalent to the following
test.wide1 <- hf.longsmall %>%
group_by(pid, timewave) %>%
mutate(row = row_number()) %>%
tidyr::pivot_wider(names_from = timewave,
values_from = c(dev_icd, lab_bnp),
names_sep = "_t") %>%
select(-row)
I have got the following warning --> "Values in values_from are not uniquely identified; output will contain list-cols", and that was due to a real (and dangerous) mistake of duplicated PID during data entry. Anyhow, I could understand the issue only using the group_by option reported above.
Thank you all for the patience
CodePudding user response:
You need to be more explicit on what you want, we can only assume. You cannot expect any values that are not present in your widest format. I take a guess here in that you want something like this.
test.long2 %>%
pivot_wider(id_cols = c("pid", "timewave"),
names_from = pid,
values_from = c(dev_icd, lab_bnp),
names_sep = "_pid")
# A tibble: 5 x 5
timewave dev_icd_pid1 dev_icd_pid2 lab_bnp_pid1 lab_bnp_pid2
<fct> <fct> <fct> <dbl> <dbl>
1 1 No Yes 388 949.
2 2 No NA 199 NA
3 3 No NA 388. NA
4 4 No NA 318 NA
5 5 No NA 154 NA
