Suppose I have a list, lis, of data frames, one per city:
chicago = data.frame('city' = rep('chicago'), 'year' = c(2018,2019,2020), 'population' = c(100, 105, 110))
paris = data.frame('city' = rep('paris'), 'year' = c(2018,2019,2020), 'population' = c(200, 205, 210))
berlin = data.frame('city' = rep('berlin'), 'year' = c(2018,2019,2020), 'population' = c(300, 305, 310))
bangalore = data.frame('city' = rep('bangalore'), 'year' = c(2018,2019,2020), 'population' = c(400, 405, 410))
lis = list(chicago = chicago, paris = paris, berlin = berlin, bangalore = bangalore)
Now I have a new data frame, df, containing the latest data for each city:
df = data.frame('city' = c('chicago', 'paris', 'berlin', 'bangalore'), 'year' = rep(2021), 'population' = c(115, 215, 315, 415))
I want to append each row of df to the matching element of lis, based on city.
I currently do it like this:
# convert to a data frame
lis = dplyr::bind_rows(lis)
# rbind
lis = rbind(lis, df)
# convert back to a list
lis = split(lis, lis$city)
which is inefficient for large datasets. Is there a more efficient alternative for large datasets?
Thank you.
Edit
Unit: seconds
 expr      min       lq     mean   median       uq      max neval
 ac() 22.43719 23.17452 27.85401 24.80335 25.62127 43.23373     5
The list contains 2239 data frames, each of dimension 310x15. Each of these data frames grows daily.
Answer:
We can use imap to loop over the list together with its names, filter df down to the row whose city matches each name, and append that row to the corresponding list element:
library(dplyr)
library(purrr)
lis2 <- imap(lis, ~ .x %>%
    bind_rows(df %>%
        filter(city == .y)))
-output
> lis2
$chicago
     city year population
1 chicago 2018        100
2 chicago 2019        105
3 chicago 2020        110
4 chicago 2021        115

$paris
   city year population
1 paris 2018        200
2 paris 2019        205
3 paris 2020        210
4 paris 2021        215

$berlin
    city year population
1 berlin 2018        300
2 berlin 2019        305
3 berlin 2020        310
4 berlin 2021        315

$bangalore
       city year population
1 bangalore 2018        400
2 bangalore 2019        405
3 bangalore 2020        410
4 bangalore 2021        415
Or use base R with Map and rbind, subsetting df by each list name:
Map(function(x, nm) rbind(x, df[df$city == nm,]), lis, names(lis))
Or use rbindlist from data.table to stack everything and split again by city:
library(data.table)
rbindlist(c(lis, list(df)))[, .(split(.SD, city))]$V1
Or, slightly more efficiently, split df by city once, reorder the pieces to match the names of lis, and rbind them pairwise with Map:
Map(rbind, lis, split(df, df$city)[names(lis)])
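As a rough, self-contained check of the Map/split approach, here is a minimal sketch on toy data. The helper make_city and the variable new_rows are names made up for this illustration, and berlin is deliberately left out of df to show what happens when a city has no new row.

```r
# Hypothetical helper for this sketch: build a small per-city data frame
make_city <- function(name, pop) {
  data.frame(city = name, year = c(2018, 2019, 2020), population = pop,
             stringsAsFactors = FALSE)
}

lis <- list(
  chicago = make_city("chicago", c(100, 105, 110)),
  paris   = make_city("paris",   c(200, 205, 210)),
  berlin  = make_city("berlin",  c(300, 305, 310))
)

# New rows; berlin is intentionally missing (an assumption for illustration)
df <- data.frame(city = c("paris", "chicago"), year = 2021,
                 population = c(215, 115), stringsAsFactors = FALSE)

# Split df once, then index by names(lis) so the pieces line up with lis.
# A city absent from df yields NULL, which rbind() simply ignores.
new_rows <- split(df, df$city)[names(lis)]

# Append pairwise; Map keeps the names of its first list argument
lis2 <- Map(rbind, lis, new_rows)

# Row counts per city after the append
print(vapply(lis2, nrow, integer(1)))
```

Because df is split only once, each list element is touched a single time instead of df being filtered again for every city. Whether this beats the bind-then-split version on 2239 data frames is worth timing on the real data rather than assuming.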
