Home > OS >  Compute cummulative occurences of value (string) until a new string appears and years are continuous
Compute cummulative occurences of value (string) until a new string appears and years are continuous

Time:01-15

I have data like this:

dataset <- data.frame(year = c(2001, 2002, 2003, 2005, 2006))
dataset$firm <- c("A", "A", "B","B","B" )

I would like to compute the number of consecutive years that firm appears in the dataset. The expected results is like this:

dataset <- data.frame(year = c(2001, 2002, 2003, 2005, 2006))
dataset$firm <- c("A", "A", "B","B","B" )
dataset$tenure <- c(1,2,1,1,2)

How can I get the tenure variable here? Thank you so much.

CodePudding user response:

With tidyverse you can do this. It checks if the years are one year apart and takes the cummulative sum of the logical result.

library(dplyr)
library(tidyr)

dataset %>% 
  group_by(firm) %>% 
  mutate(tenure=(year-1==lag(year))*1, 
    tenure=replace_na(tenure,1), 
    tenure=cumsum(tenure)) %>% 
  ungroup()
# A tibble: 9 × 3
   year firm  tenure
  <dbl> <chr>  <dbl>
1  2001 A          1
2  2002 A          2
3  2003 B          1
4  2005 B          1
5  2006 B          2
6  2007 B          3
7  2008 B          4
8  2010 B          4
9  2011 B          5

extended data

dataset <- structure(list(year = c(2001, 2002, 2003, 2005, 2006, 2007, 2008,
2010, 2011), firm = c("A", "A", "B", "B", "B", "B", "B", "B",
"B")), row.names = c(NA, -9L), class = "data.frame")

CodePudding user response:

You can use -

library(dplyr)

dataset %>%
  arrange(firm, year) %>%
  group_by(firm, temp = cumsum(c(TRUE, diff(year) > 1))) %>%
  mutate(tenure = row_number()) %>%
  ungroup %>%
  select(-temp)

#   year firm  tenure
#  <dbl> <chr>  <int>
#1  2001 A          1
#2  2002 A          2
#3  2003 B          1
#4  2005 B          1
#5  2006 B          2
  •  Tags:  
  • Related