Repeating rows until a specific value is seen or reached

I am doing survival analysis in R and need to repeat rows until a new value is seen. Here is my data frame:

df <- data.frame(province = c(10,10,10,10,10,10,10,10,12,12,12,12,12,12,12,12),
                 year = c(2000,2000,2001,2001,2001,2002,2002,2002,2000,2000,2000,2001,2001,2002,2002,2002),
                 residence = c(1,1,1,1,2,1,1,2,1,2,1,1,2,1,2,1),
                 edu = c(1,2,1,2,3,1,2,3,2,1,3,2,1,2,1,3),
                 pro = c(0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0))

What I want is to repeat rows, grouped by province, residence and edu, until pro reaches 1. For groups where pro never reaches 1, the row should be repeated for all years (in my case, 2000 to 2002). It seems I could do this with a while loop, but I do not know the procedure. My expected output would look like this:

    province residence   edu   pro  year
      <dbl>     <dbl> <dbl> <dbl> <dbl>
 1       10         1     1     0  2000
 2       10         1     1     0  2001
 3       10         1     1     0  2002
 4       10         1     2     0  2000
 5       10         1     2     0  2001
 6       10         1     2     1  2002
 7       10         2     3     1  2001
 8       12         1     2     1  2000
 9       12         2     1     0  2000
10       12         2     1     0  2001
11       12         2     1     1  2002
12       12         1     3     0  2000
13       12         1     3     0  2001
14       12         1     3     0  2002

Thank you in advance.

CodePudding user response:

Perhaps I'm misinterpreting. If your first frame with 16 rows is truly the original data, and you're trying to get to the second frame with 14 rows, then this method works.

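library(dplyr)

df %>%
  select(-pro) %>%
  group_by(province, residence, edu) %>%
  # years missing within each group's observed range
  # (dplyr >= 1.1.0 warns that multi-row summarize() is deprecated)
  summarize(year = setdiff(min(year):max(year), year)) %>%
  bind_rows(df) %>%
  arrange(province, residence, edu, year) %>%
  # carry the previous pro value into the added rows
  tidyr::fill(pro) %>%
  # within each group, drop every row after the first pro == 1
  filter(!cumany(lag(pro == 1, default = FALSE))) %>%
  ungroup()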
# # A tibble: 14 x 5
#    province residence   edu  year   pro
#       <dbl>     <dbl> <dbl> <dbl> <dbl>
#  1       10         1     1  2000     0
#  2       10         1     1  2001     0
#  3       10         1     1  2002     0
#  4       10         1     2  2000     0
#  5       10         1     2  2001     0
#  6       10         1     2  2002     1
#  7       10         2     3  2001     1
#  8       12         1     2  2000     1
#  9       12         1     3  2000     0
# 10       12         1     3  2001     0
# 11       12         1     3  2002     0
# 12       12         2     1  2000     0
# 13       12         2     1  2001     0
# 14       12         2     1  2002     1

Data

df <- structure(list(province = c(10, 10, 10, 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 12), year = c(2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000, 2000, 2001, 2001, 2002, 2002, 2002), residence = c(1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1, 1, 2, 1, 2, 1), edu = c(1, 2, 1, 2, 3, 1, 2, 3, 2, 1, 3, 2, 1, 2, 1, 3), pro = c(0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0)), class = "data.frame", row.names = c(NA, -16L))
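
If you are on dplyr 1.1.0 or later, the same idea can probably be written with reframe(), which is intended for results that have zero or several rows per group. This is only a sketch under that assumption: reframe() returns an ungrouped tibble, so the grouping is re-applied before the fill and filter steps. The names missing_years and result are just illustrative.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
df <- data.frame(
  province  = c(10,10,10,10,10,10,10,10,12,12,12,12,12,12,12,12),
  year      = c(2000,2000,2001,2001,2001,2002,2002,2002,2000,2000,2000,2001,2001,2002,2002,2002),
  residence = c(1,1,1,1,2,1,1,2,1,2,1,1,2,1,2,1),
  edu       = c(1,2,1,2,3,1,2,3,2,1,3,2,1,2,1,3),
  pro       = c(0,0,0,0,1,0,1,0,1,0,0,0,0,1,1,0)
)

# Years missing inside each group's observed range
# (in this data only province 12 / residence 1 / edu 3 lacks a year, 2001)
missing_years <- df %>%
  group_by(province, residence, edu) %>%
  reframe(year = setdiff(min(year):max(year), year))

result <- bind_rows(df, missing_years) %>%
  arrange(province, residence, edu, year) %>%
  # reframe() dropped the grouping, so group again
  group_by(province, residence, edu) %>%
  # carry the previous pro value into the added rows
  fill(pro) %>%
  # within each group, drop every row after the first pro == 1
  filter(!cumany(lag(pro == 1, default = FALSE))) %>%
  ungroup()

print(result)
```

The explicit second group_by() is the important part: without it, fill() and the cumany() filter would run over the whole frame instead of per province/residence/edu combination.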