I have a dataset that's columns are like this:
Forecaster CountryName ProductName X2022 X2023 file_month
Allianz USA Inflation 6.1 2.5 7
EIU USA Inflation 7.7 3.7 7
Oxford USA Inflation 6.5 2.9 7
Allianz USA Inflation NA 3.3 8
EIU USA Inflation 8.3 3.7 8
Data is in long format, file_month shows the month that the data was made. Now i need to create new column that refers to updated values by Forecaster, CountryName, ProductName and either X2022 or X2023. So in this new column it should be checked if Forecaster revised it's forecast compared to one month before and if so, the new revised value should be written in the column.
Also, as there is more than one ProductName, CountryName and Forecaster in whole dataset, I would like to know how could i make this work for every of these groups.
New data would need to look something like this
Forecaster CountryName ProductName X2022 X2023 file_month update_2022 update_2023
Allianz USA Inflation 6.1 2.5 7
EIU USA Inflation 7.7 3.7 7
Oxford USA Inflation 6.5 2.9 7
Allianz USA Inflation NA 3.3 8 NA 3.3
EIU USA Inflation 8.3 3.7 8 8.3 NA
For code reproducability i add dput() output.
data <- structure(list(Forecaster = c("Moody's Analytics", "Oxford Economics",
"Julius Baer", "Fitch Solutions", "SEB", "Allianz", "EIU", "FrontierView",
"Euromonitor Int.", "Fitch Ratings", "Scope Ratings", "Allianz",
"EIU", "Euromonitor Int.", "Fitch Ratings", "Fitch Solutions",
"FrontierView", "Julius Baer", "Moody's Analytics", "Oxford Economics",
"Scope Ratings", "SEB", "Allianz", "EIU", "Euromonitor Int.",
"Fitch Ratings", "Fitch Solutions", "FrontierView", "Julius Baer",
"Moody's Analytics", "Oxford Economics", "Scope Ratings", "SEB",
"Allianz", "EIU", "Euromonitor Int.", "Fitch Ratings", "Fitch Solutions",
"FrontierView", "Julius Baer", "Moody's Analytics", "Oxford Economics",
"Scope Ratings", "SEB", "Allianz", "EIU", "Euromonitor Int.",
"Fitch Ratings", "Fitch Solutions", "FrontierView", "Julius Baer",
"Moody's Analytics", "Oxford Economics", "Scope Ratings"),
CountryName = c("USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA",
"USA", "USA", "USA", "USA", "USA", "USA", "USA", "USA"),
ProductName = c("Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation", "Inflation", "Inflation",
"Inflation", "Inflation", "Inflation"),
X2022 = c(6.58180388987726,
7.39615, 5.71530516215215, 6.5, NA, 6.1, 7.2, 6.2, 7.31, 6.9,
NA, 6.1, 7.7, 7.49, 6.9, 6.5, 6.2, 7.01535416655227, 6.94177167480419,
7.40344, NA, 7.5, 6.1, 7.7, 7.49, 6.9, 6.5, 6.2, 7.01535416655227,
6.94177167480419, 7.40344, NA, 7.5, NA, 8.3, 7.99, 7.8313, 7.3,
8.4, 7.45242025035952, 7.51681072158563, 8.04052, NA, 7.5, 8.2,
8.1, 8.01, 7.8313, 7.7, 8.6, 7.94553624147252, 7.82785941572661,
8.11193, NA),
X2023 = c(2.62716826359282, 1.73992, 2.32590118928029,
2.5, NA, 2.5, 2.8, 3.3, 4.02, 2.7, NA, 2.5, 3.7, 4.01, 2.7, 2.5,
3.3, 2.96909559451206, 2.70488419782626, 2.01984, NA, 4.2, 2.5,
3.7, 4.01, 2.7, 2.5, 3.3, 2.96909559451206, 2.70488419782626,
2.01984, NA, 4.2, NA, 3.7, 4.21, 3.7199, 3.5, 4.9, 3.02315479834998,
3.10232274684858, 3.17401, NA, 4.2, 3.2, 3.8, 4, 3.7199, 3.7,
5.7, 3.48906158848119, 3.36435085973137, 3.27724, NA),
file_month = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L,
9L, 9L, 9L, 9L, 9L)), class = "data.frame", row.names = c(NA,-54L))
CodePudding user response:
Here are some notes about how things work here:
- First we group the data set on Forecaster variable. For this purpose we use
dplyr::group_splitfunction that creates a list where each elements is your data set with a unique Forecaster value. - Second we use
purrr::map_dfrto apply a function on each elements of the previous list. In a way that we create a custom function and apply it on each data frame.dfrsuffice implies that in the end themapbinds all the modified data frames together by row. - In this case we could consider a rolling function. There are various packages that you could use for this purpose such as
zoo,runner,sliderand etc. I used slider because it works well withtidyversefunctions. - I used
slider::slide_dblfunction where it takes a given function and apply it on a sliding window of the size that we choose. Here we would like to compare each value corresponding to a month to its previous value so the size of our sliding window is 2 as specified by.beforeargument. - Then I wrote a custom function which is in fact a series of
if,elseclause to apply what you have in mind. - I then set
.completeargument inslider_dbltoTRUE, so that the function is only applied on a complete window. It means for the first month that appears in the data set there is no output value cause there is no previous month. - In the end I would like to apply the entire
slider_dblfunction on 2 columns of each data frame so I used it insidedplyer::across.
library(tidyverse)
library(slider)
data %>%
group_split(Forecaster) %>%
map_dfr(~ .x %>%
mutate(across(c(X2022, X2023), ~ slide_dbl(., .f = ~ if(is.na(.x[2])) {
.x[2]
} else if(is.na(.x[1]) & !is.na(.x[2])){
.x[2]
} else if((.x[1] - .x[2]) != 0) {
.x[2]
} else {
NA_real_
}, .before = 1, .complete = TRUE), .names = 'Updated_{.col}')))
# A tibble: 54 × 8
Forecaster CountryName ProductName X2022 X2023 file_month Updated_X2022 Updated_X2023
<chr> <chr> <chr> <dbl> <dbl> <int> <dbl> <dbl>
1 Allianz USA Inflation 6.1 2.5 5 NA NA
2 Allianz USA Inflation 6.1 2.5 6 NA NA
3 Allianz USA Inflation 6.1 2.5 7 NA NA
4 Allianz USA Inflation NA NA 8 NA NA
5 Allianz USA Inflation 8.2 3.2 9 8.2 3.2
6 EIU USA Inflation 7.2 2.8 5 NA NA
7 EIU USA Inflation 7.7 3.7 6 7.7 3.7
8 EIU USA Inflation 7.7 3.7 7 NA NA
9 EIU USA Inflation 8.3 3.7 8 8.3 NA
10 EIU USA Inflation 8.1 3.8 9 8.1 3.8
# … with 44 more rows
I suggest you run this code line by line to check the output and get a sense of what is going on here.
