Home > Blockchain >  Find the minimum and maximum values of a numeric column based on the range of numbers in other numer
Find the minimum and maximum values of a numeric column based on the range of numbers in other numer

Time:02-03

I have the dataframe below and I would like to add 3 new columns. The first column will be named years_5 and it will contain values like "1951-1955","1955-1959" etc. After finding the minimum value of the column year the plus 5 years will start from this value. Here the minimum year is 1951. The second column will be named Scope_min and will contain the minimum scope value for the relative years_5 cell and the third column will be named Scope_max and will contain the maximum scope value for the relative years_5 cell. For example the first row will be like:

year Scope   years_5 Scope_min Scope_max
1 1951     4 1951-1955         3         5

df<-structure(list(year = c(1951, 1954, 1955, 1957, 1958, 1960, 1961, 
1962, 1963, 1964, 1965, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 
1974, 1975, 1976, 1977, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 
1986, 1988, 1989, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 
2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019), 
    Scope = c(4, 3, 5, 27, 6, 13, 6, 6, 24, 1, 21, 24, 32, 57, 
    18, 1, 43, 39, 8, 58, 68, 78, 7, 20, 26, 4, 12, 19, 8, 37, 
    35, 51, 209, 478, 395, 355, 453, 457, 262, 148, 196, 180, 
    367, 283, 360, 414, 175, 330, 314, 446, 428, 163, 213, 165, 
    307, 231, 53, 189, 18, 104, 39)), class = "data.frame", row.names = c(NA, 
-61L))

CodePudding user response:

Initially, I arranged the data by year. In your sample data it already is, but that may not be the case in your real data. Next, I cut the groups into five-year chunks. I left an exception for the last group (could be less than five years).

Then I created the new variables, fiveYears, Scope_min, and Scope_max. Lastly, unnesting, so that you have all the data and the group-based information.

library(tidyverse)

df1 <- df %>% arrange(year) %>% 
  mutate(five_years = cut(year, ceiling(nrow(df)/5),
                          include.lowest = T,
                          ordered_result = T)) %>% 
  group_by(five_years) %>% 
  nest() 

df2 <- map_dfr(df1$data,
               ~.x %>% 
                 mutate(fiveYears = paste0(min(year), "-", max(year)),
                        Scope_min = min(Scope),
                        Scope_max = max(Scope))
               ) %>% unnest(c(fiveYears, Scope_min, Scope_max))

head(df2, n = 10)
# # A tibble: 10 × 5
#     year Scope fiveYears Scope_min Scope_max
#    <dbl> <dbl> <chr>         <dbl>     <dbl>
#  1  1951     4 1951-1955         3         5
#  2  1954     3 1951-1955         3         5
#  3  1955     5 1951-1955         3         5
#  4  1957    27 1957-1961         6        27
#  5  1958     6 1957-1961         6        27
#  6  1960    13 1957-1961         6        27
#  7  1961     6 1957-1961         6        27
#  8  1962     6 1962-1965         1        24
#  9  1963    24 1962-1965         1        24
# 10  1964     1 1962-1965         1        24 
  •  Tags:  
  • Related