I'm wrangling a dataset that has a cross-over trial design. Here is a toy example with similar structure:
df <- structure(list(subject = c("a", "a", "a", "a", "a", "a", "b",
"b", "b", "b", "c", "c", "c", "c", "c", "c"), treatment = c("none",
"placebo", "placebo", "drug", "drug", "drug", "none", "drug",
"placebo", "placebo", "none", "placebo", "drug", "drug", "drug",
"drug"), day = c(0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 0, 1, 2, 3, 4,
5)), row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"
))
# A tibble: 16 × 3
subject treatment day
<chr> <chr> <dbl>
1 a none 0
2 a placebo 1
3 a placebo 2
4 a drug 3
5 a drug 4
6 a drug 5
7 b none 0
8 b drug 1
9 b placebo 2
10 b placebo 3
11 c none 0
12 c placebo 1
13 c drug 2
14 c drug 3
15 c drug 4
16 c drug 5
So, every subject starts with a "none" value in treatment, and then has a few days of either placebo or a drug treatment, followed by a few days of the other.
What I want is a new stage column that tells me the chronological stage of the experiment based on the order of treatment value. In other words, the starting none value in a subject will always be the "first" stage of the experiment, the next chronologically appearing value in treatment within that subject will be "second" stage, and the last appearing value will be the "third" stage.
So the output I want would look like this:
# A tibble: 16 × 4
subject treatment day stage
<chr> <chr> <dbl> <chr>
1 a none 0 first
2 a placebo 1 second
3 a placebo 2 second
4 a drug 3 third
5 a drug 4 third
6 a drug 5 third
7 b none 0 first
8 b drug 1 second
9 b placebo 2 third
10 b placebo 3 third
11 c none 0 first
12 c placebo 1 second
13 c drug 2 third
14 c drug 3 third
15 c drug 4 third
16 c drug 5 third
What made sense to me was using a combination of group_by and mutate with a factor of treatment, which doesn't work
#my failed attempt
df %>%
arrange(subject, day) %>% #needed for my actual dataset
group_by(subject) %>%
mutate(stage=factor(treatment, levels=c("first", "second", "third"))) %>%
ungroup(
which gives:
# A tibble: 16 × 4
subject treatment day stage
<chr> <chr> <dbl> <fct>
1 a none 0 second
2 a placebo 1 third
3 a placebo 2 third
4 a drug 3 first
5 a drug 4 first
6 a drug 5 first
7 b none 0 second
8 b drug 1 first
9 b placebo 2 third
10 b placebo 3 third
11 c none 0 second
12 c placebo 1 third
13 c drug 2 first
14 c drug 3 first
15 c drug 4 first
16 c drug 5 first
The problem is that the labels are being shown based on the alphabetic order of the "treatment" values, but I want them to appear in the order of appearance of the treatment values within each subject. I also tried with levels instead of labels and I just get all NAs.
Any help would be much appreciated. A dplyr solution is preferred but would happily work with any other.
CodePudding user response:
You can group_by subject and then use match or rleid. Use english::ordinal to get the expected output.
df %>%
group_by(subject) %>%
mutate(match = match(treatment, unique(treatment)),
rleid = data.table::rleid(treatment),
stage = english::ordinal(match))
# A tibble: 16 × 6
# Groups: subject [3]
subject treatment day match rleid stage
<chr> <chr> <dbl> <int> <int> <ordinal>
1 a none 0 1 1 first
2 a placebo 1 2 2 second
3 a placebo 2 2 2 second
4 a drug 3 3 3 third
5 a drug 4 3 3 third
6 a drug 5 3 3 third
7 b none 0 1 1 first
8 b drug 1 2 2 second
9 b placebo 2 3 3 third
10 b placebo 3 3 3 third
11 c none 0 1 1 first
12 c placebo 1 2 2 second
13 c drug 2 3 3 third
14 c drug 3 3 3 third
15 c drug 4 3 3 third
16 c drug 5 3 3 third
CodePudding user response:
If there is any case where the placebo is given again after the drug, creating a "fourth" stage, then depending on unique(treatment) might lead to errors.
Alternatively, you could calculate the cumulative sum of the treatment change:
library(tidyverse)
df %>%
group_by(subject) %>%
mutate(stage_change = treatment!=lag(treatment),
stage = cumsum(ifelse(is.na(stage_change), 1, stage_change))) %>%
select(-stage_change)
#> # A tibble: 16 x 4
#> # Groups: subject [3]
#> subject treatment day stage
#> <chr> <chr> <dbl> <dbl>
#> 1 a none 0 1
#> 2 a placebo 1 2
#> 3 a placebo 2 2
#> 4 a drug 3 3
#> 5 a drug 4 3
#> 6 a drug 5 3
#> 7 b none 0 1
#> 8 b drug 1 2
#> 9 b placebo 2 3
#> 10 b placebo 3 3
#> 11 c none 0 1
#> 12 c placebo 1 2
#> 13 c drug 2 3
#> 14 c drug 3 3
#> 15 c drug 4 3
#> 16 c drug 5 3
Created on 2022-05-03 by the reprex package (v2.0.1)
You can then use english::ordinal(stage) if needed.
