I have the dataframe below and I would like to add 3 new columns. The first column will be named years_5 and it will contain values like "1951-1955","1955-1959" etc. After finding the minimum value of the column year the plus 5 years will start from this value. Here the minimum year is 1951. The second column will be named Scope_min and will contain the minimum scope value for the relative years_5 cell and the third column will be named Scope_max and will contain the maximum scope value for the relative years_5 cell. For example the first row will be like:
year Scope years_5 Scope_min Scope_max
1 1951 4 1951-1955 3 5
df<-structure(list(year = c(1951, 1954, 1955, 1957, 1958, 1960, 1961,
1962, 1963, 1964, 1965, 1967, 1968, 1969, 1970, 1971, 1972, 1973,
1974, 1975, 1976, 1977, 1979, 1980, 1981, 1982, 1983, 1984, 1985,
1986, 1988, 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998,
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019),
Scope = c(4, 3, 5, 27, 6, 13, 6, 6, 24, 1, 21, 24, 32, 57,
18, 1, 43, 39, 8, 58, 68, 78, 7, 20, 26, 4, 12, 19, 8, 37,
35, 51, 209, 478, 395, 355, 453, 457, 262, 148, 196, 180,
367, 283, 360, 414, 175, 330, 314, 446, 428, 163, 213, 165,
307, 231, 53, 189, 18, 104, 39)), class = "data.frame", row.names = c(NA,
-61L))
CodePudding user response:
Initially, I arranged the data by year. In your sample data it already is, but that may not be the case in your real data. Next, I cut the groups into five-year chunks. I left an exception for the last group (could be less than five years).
Then I created the new variables, fiveYears, Scope_min, and Scope_max. Lastly, unnesting, so that you have all the data and the group-based information.
library(tidyverse)
df1 <- df %>% arrange(year) %>%
mutate(five_years = cut(year, ceiling(nrow(df)/5),
include.lowest = T,
ordered_result = T)) %>%
group_by(five_years) %>%
nest()
df2 <- map_dfr(df1$data,
~.x %>%
mutate(fiveYears = paste0(min(year), "-", max(year)),
Scope_min = min(Scope),
Scope_max = max(Scope))
) %>% unnest(c(fiveYears, Scope_min, Scope_max))
head(df2, n = 10)
# # A tibble: 10 × 5
# year Scope fiveYears Scope_min Scope_max
# <dbl> <dbl> <chr> <dbl> <dbl>
# 1 1951 4 1951-1955 3 5
# 2 1954 3 1951-1955 3 5
# 3 1955 5 1951-1955 3 5
# 4 1957 27 1957-1961 6 27
# 5 1958 6 1957-1961 6 27
# 6 1960 13 1957-1961 6 27
# 7 1961 6 1957-1961 6 27
# 8 1962 6 1962-1965 1 24
# 9 1963 24 1962-1965 1 24
# 10 1964 1 1962-1965 1 24
