Create a column that increments for each unique ID in R

I have a dataset that looks something like this:

ID     Minutes Read    Comprehension     
1      25              1      
1      30              1      
2      20              2      
2      25              2     
2      30              1

I want to create a column called "day" that numbers the days each person reported reading, counting up from 1 within each ID, as shown below:

ID     Minutes Read    Comprehension     Day  
1      25              1                 1
1      30              1                 2
2      20              2                 1
2      25              2                 2
2      30              1                 3

How would I go about doing that? The end goal is to use the "day" column to reshape my data into wide format:

df2 <- reshape(df, idvar = "ID", timevar = "day", direction = "wide")

CodePudding user response：

Since your aim is to reshape the data, try:

# seq_along() numbers the rows within each ID; "time" is reshape()'s default timevar
reshape(transform(df, time = ave(ID, ID, FUN = seq_along)), direction = "wide", idvar = "ID")

  ID Minutes.Read.1 Comprehension.1 Minutes.Read.2 Comprehension.2 Minutes.Read.3 Comprehension.3
1  1             25               1             30               1             NA              NA
3  2             20               2             25               2             30               1

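If you only need the day column, use the following (seq_along() is safer here than seq(), which treats a length-one vector as an endpoint, e.g. seq(7) is 1:7):

df <- transform(df, day = ave(ID, ID, FUN = seq_along))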
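
As a minimal sketch of what ave() does here: it splits its first argument by the grouping argument, applies FUN to each piece, and puts the results back in the original row positions. The example uses a small made-up data frame (the name `toy` and its values are illustrative, not taken from the question) whose IDs are deliberately interleaved, with seq_along() as the per-group function.

```r
# Illustrative data (hypothetical): IDs are interleaved and ID 7 occurs only once
toy <- data.frame(ID = c(1, 2, 1, 7, 2),
                  Minutes = c(25, 20, 30, 15, 25))

# Number the rows within each ID while keeping the original row order
toy$day <- ave(toy$ID, toy$ID, FUN = seq_along)

print(toy$day)  # 1 1 2 1 2
```

Because the counter is written back position by position, the rows do not need to be sorted by ID first.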

CodePudding user response：

Here is a tidyverse solution:

library(dplyr)
library(tidyr)

df %>% 
  group_by(ID) %>% 
  mutate(day = row_number()) %>% 
  pivot_wider(
    names_from = day,
    values_from = c(MinutesRead, Comprehension)
  )

     ID MinutesRead_1 MinutesRead_2 MinutesRead_3 Comprehension_1 Comprehension_2 Comprehension_3
  <int>         <int>         <int>         <int>           <int>           <int>           <int>
1     1            25            30            NA               1               1              NA
2     2            20            25            30               2               2               1

Data

df <- structure(list(
  ID = c(1L, 1L, 2L, 2L, 2L),
  MinutesRead = c(25L, 30L, 20L, 25L, 30L),
  Comprehension = c(1L, 1L, 2L, 2L, 1L)
), class = "data.frame", row.names = c(NA, -5L))

CodePudding user response：

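You could do this as a one-liner, provided the rows for each ID sit next to each other (rle() counts runs of consecutive equal values). lapply() is used rather than sapply() because sapply() would simplify the result to a matrix whenever every ID has the same number of rows:

df$Day <- unlist(lapply(rle(df$ID)$lengths, seq_len))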

df
  ID Minutes.Read Comprehension Day
1  1           25             1   1
2  1           30             1   2
3  2           20             2   1
4  2           25             2   2
5  2           30             1   3
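
To see why this works, it may help to look at the intermediate values. The sketch below uses a bare vector of IDs (the name `ids` is illustrative) and also shows base R's sequence(), which builds the same concatenated counters directly from the run lengths.

```r
ids <- c(1, 1, 2, 2, 2)      # illustrative IDs, already grouped together

runs <- rle(ids)$lengths     # length of each run of identical IDs
print(runs)                  # 2 3

day <- sequence(runs)        # 1..2 followed by 1..3
print(day)                   # 1 2 1 2 3
```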

CodePudding user response：

Using lapply

We first call split() to split df by ID. The output of split() is a list with one data.frame per ID, so we use lapply() to process each element of that list. For each element, we take its number of rows and use seq_len() to create a sequence from 1 to that number, which cbind() attaches as the Day column. Finally, we rbind the pieces back together with do.call(), which returns a single data.frame.

df <- do.call(
  rbind,
  lapply(split(df, df$ID), function(x) cbind(x, Day = seq_len(nrow(x))))
)
rownames(df) <- NULL # optional

str(df$Day)
#>  int [1:5] 1 2 1 2 3
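
The intermediate steps can also be inspected one at a time. This sketch rebuilds the example data under the illustrative name `reads` and shows what split() and the per-piece seq_len() return before anything is bound back together.

```r
# Illustrative copy of the example data (the name `reads` is hypothetical)
reads <- data.frame(ID = c(1, 1, 2, 2, 2),
                    Minutes = c(25, 30, 20, 25, 30))

pieces <- split(reads, reads$ID)   # named list: one data.frame per ID
print(names(pieces))               # "1" "2"
print(sapply(pieces, nrow))        # rows per ID: 2 and 3

# One counter per piece, running from 1 to that piece's row count
counters <- lapply(pieces, function(x) seq_len(nrow(x)))
print(unname(unlist(counters)))    # 1 2 1 2 3
```

Note that split() returns the pieces in order of ID, so the recombined data frame comes back grouped by ID even if the input rows were interleaved.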

Data

df <- data.frame(ID = c(1,1,2,2,2),
                 Minutes = c(25,30,20,25,30),
                 Comprehension = c(1,1,2,2,1))