I have a data frame containing a shape-file that I want to merge with another data-set that contains years. I'm interested in adding a variable with years to the former while all other variables remain the same for each year. I'm not sure how to do this.
As an example, say I have the following data-set:
a <- data.frame(code = c("aaa" , "bbb", "ccc") ,
item = c("apples" , "bananas" , "carrots") ,
id = c(1,2,3))
giving the following:
code item id
1 aaa apples 1
2 bbb bananas 2
3 ccc carrots 3
I would like to add a new variable called year of length n that repeats all the same elements of the other variables for each year. For example, say I'd like to add the years 1990 to 1992 to an existing object like this:
code item id year
1 aaa apples 1 1990
2 aaa apples 1 1991
3 aaa apples 1 1992
4 bbb bananas 2 1990
5 bbb bananas 2 1991
6 bbb bananas 2 1992
7 ccc carrots 3 1990
8 ccc carrots 3 1991
9 ccc carrots 3 1992
Is there a way of doing this (for existing data frames)? For this example I used this code;
b <- data.frame(code = rep(c("aaa" , "bbb", "ccc") , each = 3) ,
item = rep(c("apples" , "bananas" , "carrots") , each = 3) ,
id = rep(c(1,2,3) , each = 3) ,
year = rep(c(1990:1992) , times = 3))
but this would not work (or is extremely inefficient) when the data-set is already there or extremely large. Is there a better way of doing this?
CodePudding user response:
We can create a list column grouped by 'code' or 'id' and then unnest the list
library(dplyr)
library(tidyr)
a %>%
group_by(id) %>%
mutate(year = list(1990:1992)) %>%
ungroup %>%
unnest(year)
-output
# A tibble: 9 × 4
code item id year
<chr> <chr> <dbl> <int>
1 aaa apples 1 1990
2 aaa apples 1 1991
3 aaa apples 1 1992
4 bbb bananas 2 1990
5 bbb bananas 2 1991
6 bbb bananas 2 1992
7 ccc carrots 3 1990
8 ccc carrots 3 1991
9 ccc carrots 3 1992
Or use crossing
crossing(a, year = 1990:1992)
# A tibble: 9 × 4
code item id year
<chr> <chr> <dbl> <int>
1 aaa apples 1 1990
2 aaa apples 1 1991
3 aaa apples 1 1992
4 bbb bananas 2 1990
5 bbb bananas 2 1991
6 bbb bananas 2 1992
7 ccc carrots 3 1990
8 ccc carrots 3 1991
9 ccc carrots 3 1992
CodePudding user response:
Base R:
b <- data.frame(year = 1990:1992)
merge(a, b, by = NULL)
# code item id year
# 1 aaa apples 1 1990
# 2 bbb bananas 2 1990
# 3 ccc carrots 3 1990
# 4 aaa apples 1 1991
# 5 bbb bananas 2 1991
# 6 ccc carrots 3 1991
# 7 aaa apples 1 1992
# 8 bbb bananas 2 1992
# 9 ccc carrots 3 1992
Data
a <- structure(list(code = c("aaa", "bbb", "ccc"), item = c("apples", "bananas", "carrots"), id = c(1, 2, 3)), class = "data.frame", row.names = c(NA, -3L))
CodePudding user response:
Another possible solution, based on tidyr::separate_rows:
library(tidyverse)
a <- data.frame(code = c("aaa" , "bbb", "ccc") ,
item = c("apples" , "bananas" , "carrots") ,
id = c(1,2,3))
a %>%
mutate(year = list(1990:1992)) %>%
separate_rows(year, sep=",", convert = T)
#> # A tibble: 9 × 4
#> code item id year
#> <chr> <chr> <dbl> <int>
#> 1 aaa apples 1 1990
#> 2 aaa apples 1 1991
#> 3 aaa apples 1 1992
#> 4 bbb bananas 2 1990
#> 5 bbb bananas 2 1991
#> 6 bbb bananas 2 1992
#> 7 ccc carrots 3 1990
#> 8 ccc carrots 3 1991
#> 9 ccc carrots 3 1992
CodePudding user response:
We could replicate the rows N times and then mutate the year column:
library(dplyr)
N <- 3
a %>%
group_by(code) %>%
slice(rep(1:n(), each = N)) %>%
mutate(year= 1990:1992)
code item id year
<chr> <chr> <dbl> <int>
1 aaa apples 1 1990
2 aaa apples 1 1991
3 aaa apples 1 1992
4 bbb bananas 2 1990
5 bbb bananas 2 1991
6 bbb bananas 2 1992
7 ccc carrots 3 1990
8 ccc carrots 3 1991
9 ccc carrots 3 1992
CodePudding user response:
Using sqldf:
library(sqldf)
a <- data.frame(
code = c("aaa" , "bbb", "ccc") ,
item = c("apples" , "bananas" , "carrots") ,
id = c(1, 2, 3)
)
y <- data.frame(year = 1990:1992)
sqldf("SELECT * FROM a
CROSS JOIN y")
Output
code item id year
1 aaa apples 1 1990
2 aaa apples 1 1991
3 aaa apples 1 1992
4 bbb bananas 2 1990
5 bbb bananas 2 1991
6 bbb bananas 2 1992
7 ccc carrots 3 1990
8 ccc carrots 3 1991
9 ccc carrots 3 1992
