Find the previous-trial score within a nested dataframe in a loop and save as a new variable-CodePudding

I have a dataframe that currently looks like this:

subjectID	Trial	Score
1	1	16
1	1	16
1	1	16
1	2	8
1	2	8
1	2	8
1	3	12
1	3	12
1	3	12
2	1	9
2	1	9
2	1	9
2	2	10
2	2	10
2	2	10

I need to create a new column, Previous_Trial_Score, that is simply the score on the last trial for each person. For example:

subjectID	Trial	Score	Previous_Trial_Score
1	1	16	NA
1	1	16	NA
1	1	16	NA
1	2	8	16
1	2	8	16
1	2	8	16
1	3	12	8
1	3	12	8
1	3	12	8
2	1	9	NA
2	1	9	NA
2	1	9	NA
2	2	10	9
2	2	10	9
2	2	10	9

And so on. Trial 1 for each subject will always be NA, as there is no previous trial for that person. I am writing a for-loop to accomplish this, below:

for (myperson in unique(data$subjectID)){
  for (mytrial in unique(data$Trial[data$Trial>1])){

    #Specify the trial and person
    Prev_Score=as.numeric(unique(data[data$subjectID==myperson & data$Trial==mytrial-1, "Score"]))
    

    #Save it to the dataframe
    data[data$subjectID==myperson & data$Trial==mytrial,"Prev_Score"]=Prev_Score
    
    
  }
}

In the above loop, I had to specify as.numeric and unique to get R to return a single value properly. However, when I run the loop, I get this error:

Error: Assigned data `value` must be compatible with existing data.
i Error occurred for column `Prev_Score`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 3...

Does anyone have a solution? I am open to tidyverse/dplyr work-arounds.

CodePudding user response：

We could do a group by summarise to get the lag and then do a join

library(dplyr)
df1 %>% 
   group_by(subjectID, Trial) %>%
   summarise(Previous_Trial_Score = last(Score), .groups= 'drop_last') %>% 
   mutate(Previous_Trial_Score = lag(Previous_Trial_Score)) %>%   
   left_join(df1, .)

-output

   subjectID Trial Score Previous_Trial_Score
1          1     1    16                   NA
2          1     1    16                   NA
3          1     1    16                   NA
4          1     2     8                   16
5          1     2     8                   16
6          1     2     8                   16
7          1     3    12                    8
8          1     3    12                    8
9          1     3    12                    8
10         2     1     9                   NA
11         2     1     9                   NA
12         2     1     9                   NA
13         2     2    10                    9
14         2     2    10                    9
15         2     2    10                    9

Or slightly compact option with data.table

library(data.table)
setDT(df1)[, Previous_Trial_Score := shift(.SD[, last(Score), 
       Trial]$V1)[Trial], subjectID]

-output

> df1
    subjectID Trial Score Previous_Trial_Score
        <int> <int> <int>                <int>
 1:         1     1    16                   NA
 2:         1     1    16                   NA
 3:         1     1    16                   NA
 4:         1     2     8                   16
 5:         1     2     8                   16
 6:         1     2     8                   16
 7:         1     3    12                    8
 8:         1     3    12                    8
 9:         1     3    12                    8
10:         2     1     9                   NA
11:         2     1     9                   NA
12:         2     1     9                   NA
13:         2     2    10                    9
14:         2     2    10                    9
15:         2     2    10                    9

data

df1 <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L), Score = c(16L, 16L, 16L, 
8L, 8L, 8L, 12L, 12L, 12L, 9L, 9L, 9L, 10L, 10L, 10L)), 
class = "data.frame", row.names = c(NA, 
-15L))

CodePudding user response：

Here is another dplyr option, where we just keep the unique rows with distinct, then get the lag value of Score, and join back to the original dataframe. However, if you have more columns in your dataframe, then @akrun provides a cleaner approach of handling the additional columns.

library(dplyr)

df %>%
  distinct() %>%
  group_by(subjectID) %>%
  mutate(Previous_Trial_Score = lag(Score)) %>% 
  left_join(df, .)

Output

   subjectID Trial Score Previous_Trial_Score
1          1     1    16                   NA
2          1     1    16                   NA
3          1     1    16                   NA
4          1     2     8                   16
5          1     2     8                   16
6          1     2     8                   16
7          1     3    12                    8
8          1     3    12                    8
9          1     3    12                    8
10         2     1     9                   NA
11         2     1     9                   NA
12         2     1     9                   NA
13         2     2    10                    9
14         2     2    10                    9
15         2     2    10                    9

Data

df <- structure(list(subjectID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L), Trial = c(1L, 1L, 1L, 2L, 2L, 2L, 
3L, 3L, 3L, 1L, 1L, 1L, 2L, 2L, 2L), Score = c(16L, 16L, 16L, 
8L, 8L, 8L, 12L, 12L, 12L, 9L, 9L, 9L, 10L, 10L, 10L)), class = "data.frame", row.names = c(NA, 
-15L))

subjectID	Trial	Score	Previous_Trial_Score
1	1	16	NA
1	1	16	NA
1	1	16	NA
1	2	8	16
1	2	8	16
1	2	8	16
1	3	12	8
1	3	12	8
1	3	12	8
2	1	9	NA
2	1	9	NA
2	1	9	NA
2	2	10	9
2	2	10	9
2	2	10	9

subjectID	Trial	Score	Previous_Trial_Score
1	1	16	NA
1	1	16	NA
1	1	16	NA
1	2	8	16
1	2	8	16
1	2	8	16
1	3	12	8
1	3	12	8
1	3	12	8
2	1	9	NA
2	1	9	NA
2	1	9	NA
2	2	10	9
2	2	10	9
2	2	10	9

subjectID	Trial	Score	Previous_Trial_Score
1	1	16	NA
1	1	16	NA
1	1	16	NA
1	2	8	16
1	2	8	16
1	2	8	16
1	3	12	8
1	3	12	8
1	3	12	8
2	1	9	NA
2	1	9	NA
2	1	9	NA
2	2	10	9
2	2	10	9
2	2	10	9