Home > Enterprise >  Adding a new year variable to a data frame (with all other variables being duplicated)
Adding a new year variable to a data frame (with all other variables being duplicated)

Time:01-10

I have a data frame containing a shape-file that I want to merge with another data-set that contains years. I'm interested in adding a variable with years to the former while all other variables remain the same for each year. I'm not sure how to do this.

As an example, say I have the following data-set:

a <- data.frame(code = c("aaa" , "bbb", "ccc") ,
            item = c("apples" , "bananas" , "carrots") ,
            id = c(1,2,3))

giving the following:

  code    item id
1  aaa  apples  1
2  bbb bananas  2
3  ccc carrots  3

I would like to add a new variable called year of length n that repeats all the same elements of the other variables for each year. For example, say I'd like to add the years 1990 to 1992 to an existing object like this:

  code    item id year
1  aaa  apples  1 1990
2  aaa  apples  1 1991
3  aaa  apples  1 1992
4  bbb bananas  2 1990
5  bbb bananas  2 1991
6  bbb bananas  2 1992
7  ccc carrots  3 1990
8  ccc carrots  3 1991
9  ccc carrots  3 1992

Is there a way of doing this (for existing data frames)? For this example I used this code;

b <- data.frame(code = rep(c("aaa" , "bbb", "ccc") , each = 3) ,
                item = rep(c("apples" , "bananas" , "carrots") , each = 3) ,
                id = rep(c(1,2,3) , each = 3) ,
                year = rep(c(1990:1992) , times = 3))

but this would not work (or is extremely inefficient) when the data-set is already there or extremely large. Is there a better way of doing this?

CodePudding user response:

We can create a list column grouped by 'code' or 'id' and then unnest the list

library(dplyr)
library(tidyr)
a %>% 
  group_by(id) %>% 
  mutate(year = list(1990:1992)) %>% 
  ungroup %>%
  unnest(year)

-output

# A tibble: 9 × 4
  code  item       id  year
  <chr> <chr>   <dbl> <int>
1 aaa   apples      1  1990
2 aaa   apples      1  1991
3 aaa   apples      1  1992
4 bbb   bananas     2  1990
5 bbb   bananas     2  1991
6 bbb   bananas     2  1992
7 ccc   carrots     3  1990
8 ccc   carrots     3  1991
9 ccc   carrots     3  1992

Or use crossing

 crossing(a, year = 1990:1992)
# A tibble: 9 × 4
  code  item       id  year
  <chr> <chr>   <dbl> <int>
1 aaa   apples      1  1990
2 aaa   apples      1  1991
3 aaa   apples      1  1992
4 bbb   bananas     2  1990
5 bbb   bananas     2  1991
6 bbb   bananas     2  1992
7 ccc   carrots     3  1990
8 ccc   carrots     3  1991
9 ccc   carrots     3  1992

CodePudding user response:

Base R:

b <- data.frame(year = 1990:1992)
merge(a, b, by = NULL)
#   code    item id year
# 1  aaa  apples  1 1990
# 2  bbb bananas  2 1990
# 3  ccc carrots  3 1990
# 4  aaa  apples  1 1991
# 5  bbb bananas  2 1991
# 6  ccc carrots  3 1991
# 7  aaa  apples  1 1992
# 8  bbb bananas  2 1992
# 9  ccc carrots  3 1992

Data

a <- structure(list(code = c("aaa", "bbb", "ccc"), item = c("apples", "bananas", "carrots"), id = c(1, 2, 3)), class = "data.frame", row.names = c(NA, -3L))

CodePudding user response:

Another possible solution, based on tidyr::separate_rows:

library(tidyverse)

a <- data.frame(code = c("aaa" , "bbb", "ccc") ,
                item = c("apples" , "bananas" , "carrots") ,
                id = c(1,2,3))

a %>% 
  mutate(year = list(1990:1992)) %>% 
  separate_rows(year, sep=",", convert = T)

#> # A tibble: 9 × 4
#>   code  item       id  year
#>   <chr> <chr>   <dbl> <int>
#> 1 aaa   apples      1  1990
#> 2 aaa   apples      1  1991
#> 3 aaa   apples      1  1992
#> 4 bbb   bananas     2  1990
#> 5 bbb   bananas     2  1991
#> 6 bbb   bananas     2  1992
#> 7 ccc   carrots     3  1990
#> 8 ccc   carrots     3  1991
#> 9 ccc   carrots     3  1992

CodePudding user response:

We could replicate the rows N times and then mutate the year column:

library(dplyr)

N <- 3
a %>% 
  group_by(code) %>% 
  slice(rep(1:n(), each = N)) %>% 
  mutate(year= 1990:1992)
  code  item       id  year
  <chr> <chr>   <dbl> <int>
1 aaa   apples      1  1990
2 aaa   apples      1  1991
3 aaa   apples      1  1992
4 bbb   bananas     2  1990
5 bbb   bananas     2  1991
6 bbb   bananas     2  1992
7 ccc   carrots     3  1990
8 ccc   carrots     3  1991
9 ccc   carrots     3  1992

CodePudding user response:

Using sqldf:

library(sqldf)

a <- data.frame(
  code = c("aaa" , "bbb", "ccc") ,
  item = c("apples" , "bananas" , "carrots") ,
  id = c(1, 2, 3)
)
y <- data.frame(year = 1990:1992)

sqldf("SELECT * FROM a
      CROSS JOIN y")

Output

  code    item id year
1  aaa  apples  1 1990
2  aaa  apples  1 1991
3  aaa  apples  1 1992
4  bbb bananas  2 1990
5  bbb bananas  2 1991
6  bbb bananas  2 1992
7  ccc carrots  3 1990
8  ccc carrots  3 1991
9  ccc carrots  3 1992
  •  Tags:  
  • Related