I have data like this:
dataset <- data.frame(year = c(2001, 2002, 2003, 2005, 2006))
dataset$firm <- c("A", "A", "B","B","B" )
I would like to compute the number of consecutive years that firm appears in the dataset. The expected results is like this:
dataset <- data.frame(year = c(2001, 2002, 2003, 2005, 2006))
dataset$firm <- c("A", "A", "B","B","B" )
dataset$tenure <- c(1,2,1,1,2)
How can I get the tenure variable here? Thank you so much.
CodePudding user response:
With tidyverse you can do this. It checks if the years are one year apart and takes the cummulative sum of the logical result.
library(dplyr)
library(tidyr)
dataset %>%
group_by(firm) %>%
mutate(tenure=(year-1==lag(year))*1,
tenure=replace_na(tenure,1),
tenure=cumsum(tenure)) %>%
ungroup()
# A tibble: 9 × 3
year firm tenure
<dbl> <chr> <dbl>
1 2001 A 1
2 2002 A 2
3 2003 B 1
4 2005 B 1
5 2006 B 2
6 2007 B 3
7 2008 B 4
8 2010 B 4
9 2011 B 5
extended data
dataset <- structure(list(year = c(2001, 2002, 2003, 2005, 2006, 2007, 2008,
2010, 2011), firm = c("A", "A", "B", "B", "B", "B", "B", "B",
"B")), row.names = c(NA, -9L), class = "data.frame")
CodePudding user response:
You can use -
library(dplyr)
dataset %>%
arrange(firm, year) %>%
group_by(firm, temp = cumsum(c(TRUE, diff(year) > 1))) %>%
mutate(tenure = row_number()) %>%
ungroup %>%
select(-temp)
# year firm tenure
# <dbl> <chr> <int>
#1 2001 A 1
#2 2002 A 2
#3 2003 B 1
#4 2005 B 1
#5 2006 B 2
