This is very similar to some other questions, but I wasn't quite satisfied with the other answers.
I have data where one column is the outcome of a Latin square study design, in which each participant completed three conditions that could have come in any of six possible orders. I do not have a variable that indicates the order in which each participant actually received the conditions, so I need to create one myself. Here is my current data and the desired output, using a fake example with the first three participants:
library(dplyr)
(current <- tibble(
  participant = c(1,1,1,2,2,2,3,3,3),
  block_code = c("timed", "untimed", "practice", "untimed", "practice", "timed", "timed", "untimed", "practice")
))
#> # A tibble: 9 × 2
#> participant block_code
#> <dbl> <chr>
#> 1 1 timed
#> 2 1 untimed
#> 3 1 practice
#> 4 2 untimed
#> 5 2 practice
#> 6 2 timed
#> 7 3 timed
#> 8 3 untimed
#> 9 3 practice
(desired <- current %>%
  mutate(order_code = c(rep("tup", 3), rep("upt", 3), rep("tup", 3))))
#> # A tibble: 9 × 3
#> participant block_code order_code
#> <dbl> <chr> <chr>
#> 1 1 timed tup
#> 2 1 untimed tup
#> 3 1 practice tup
#> 4 2 untimed upt
#> 5 2 practice upt
#> 6 2 timed upt
#> 7 3 timed tup
#> 8 3 untimed tup
#> 9 3 practice tup
Created on 2022-02-28 by the reprex package (v2.0.1)
Participants 1 and 3 had the same order, so they ended up with the same code.
How can I tell R to create a new column based on the order of the block_code variable within a participant?
Answer:
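You can group_by(participant), then create order_code by pasting together the first letter of each block_code. This assumes the rows within each participant are already in the order the conditions were completed: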
library(tidyverse)
(current %>%
  group_by(participant) %>%
  mutate(order_code = str_c(str_sub(block_code, end = 1), collapse = "")) %>%
  ungroup())
#> # A tibble: 9 x 3
#> participant block_code order_code
#> <dbl> <chr> <chr>
#> 1 1 timed tup
#> 2 1 untimed tup
#> 3 1 practice tup
#> 4 2 untimed upt
#> 5 2 practice upt
#> 6 2 timed upt
#> 7 3 timed tup
#> 8 3 untimed tup
#> 9 3 practice tup
Created on 2022-02-28 by the reprex package (v2.0.1)
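Taking first letters has two limitations: two conditions that start with the same letter would get the same code, and a plain character column does not show orders that no participant received. A sketch, assuming exactly the three conditions in the example and a hypothetical lookup vector `codes`, uses explicit one-letter codes and stores order_code as a factor whose levels are all 3! = 6 possible orders, so `table()` also reports orders with zero participants:

```r
library(dplyr)
library(stringr)

current <- tibble(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice", "untimed", "practice",
                 "timed", "timed", "untimed", "practice")
)

# Hypothetical lookup: one explicit letter per condition, so the code
# does not depend on the conditions having distinct first letters
codes <- c(timed = "t", untimed = "u", practice = "p")

# All 3! = 6 possible orders: keep the rows of the full grid whose
# three letters are all different, then paste each row together
grid <- expand.grid(first = codes, second = codes, third = codes,
                    stringsAsFactors = FALSE)
grid <- grid[apply(grid, 1, function(r) length(unique(r)) == 3), ]
all_orders <- unname(apply(grid, 1, paste, collapse = ""))

# Same grouped approach as above, but via the lookup and as a factor
result <- current %>%
  group_by(participant) %>%
  mutate(order_code = factor(str_c(codes[block_code], collapse = ""),
                             levels = all_orders)) %>%
  ungroup()

# One row per participant, then count participants per order (zeros included)
table(distinct(result, participant, order_code)$order_code)
```

The factor levels only matter for summaries such as counts per order; if you just need the label on each row, the character version above is enough.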
Answer:
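Another, slightly different option is to use summarise() with .groups = "drop", so the result is ungrouped without a separate ungroup() call. Here, we group by participant, then paste together the first letter of each block_code within each group. Note that in dplyr 1.1.0 and later, returning more than one row per group from summarise() is deprecated in favour of reframe(), so this code produces a deprecation warning there.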
library(tidyverse)
current %>%
  group_by(participant) %>%
  summarise(
    block_code,
    order_code = paste(substr(block_code, 1, 1), collapse = ""),
    .groups = "drop"
  )
Output
participant block_code order_code
<dbl> <chr> <chr>
1 1 timed tup
2 1 untimed tup
3 1 practice tup
4 2 untimed upt
5 2 practice upt
6 2 timed upt
7 3 timed tup
8 3 untimed tup
9 3 practice tup
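For dplyr 1.1.0 or later, a sketch of the same idea without the deprecated multi-row summarise() uses either reframe() or the per-call `.by` argument of mutate(); both return an ungrouped result, and the `.by` version keeps the original rows and row order:

```r
library(dplyr)

current <- tibble(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice", "untimed", "practice",
                 "timed", "timed", "untimed", "practice")
)

# reframe() may return several rows per group, recycles the length-1
# order_code to match, and always returns an ungrouped tibble
via_reframe <- current %>%
  group_by(participant) %>%
  reframe(block_code,
          order_code = paste(substr(block_code, 1, 1), collapse = ""))

# mutate(.by = ) groups for this one call only, so no ungroup() is needed
via_by <- current %>%
  mutate(order_code = paste(substr(block_code, 1, 1), collapse = ""),
         .by = participant)
```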
Or with data.table:
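library("data.table")
dt <- as.data.table(current)
# := adds order_code by reference, computed separately for each participant
dt[, order_code := paste(substr(block_code, 1, 1), collapse = ""), by = participant]
dt[] # := returns invisibly, so print the updated table explicitly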
Or with base R (the \(x) lambda shorthand requires R 4.1.0 or later):
merge(current, setNames(
  aggregate(
    block_code ~ participant,
    data = current,
    FUN = \(x) paste(substr(x, 1, 1), collapse = "")
  ),
  c("participant", "order_code")
), by = "participant")
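If the aggregate() and merge() round trip feels heavy, a minimal base R sketch uses ave(), which applies a function within each participant and recycles the single pasted string back over that participant's rows, keeping the original row order. Like the answers above, it assumes the rows within each participant are already in presentation order:

```r
current <- data.frame(
  participant = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  block_code = c("timed", "untimed", "practice", "untimed", "practice",
                 "timed", "timed", "untimed", "practice"),
  stringsAsFactors = FALSE
)

# FUN returns one string per participant; ave() fills it into every row
# belonging to that participant, without reordering the data
current$order_code <- ave(
  current$block_code, current$participant,
  FUN = function(x) paste(substr(x, 1, 1), collapse = "")
)
current
```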
