R aggregate by label difference-CodePudding

I have a large data frame containing import and export data of 28 EU countries, among other details. I want to calculate the trade balance (exports minus imports) of each country.

I managed to aggregate imports and exports for each country using this code:

x = aggregate( value_in_euros ~ as_factor(flow)   as_factor(declarant), data, sum)

And the resulting data set looks like this:

    as_factor(flow) as_factor(declarant) value_in_euros
1          Imports                   FR   1.019259e 12
2          Exports                   FR   8.539884e 11
3          Imports                   BE   6.936985e 11
4          Exports                   BE   7.350930e 11
5          Imports                   NL   1.043084e 12
6          Exports                   NL   1.180904e 12
7          Imports                   DE   2.048670e 12
8          Exports                   DE   2.415090e 12
9          Imports                   IT   7.399388e 11
10         Exports                   IT   8.671186e 11
11         Imports                   GB   9.705866e 10
12         Exports                   GB   6.669265e 10
13         Imports                   IE   1.725745e 11
14         Exports                   IE   3.150999e 11
15         Imports                   DK   1.707441e 11
16         Exports                   DK   1.897421e 11

Now I need an aggregate way to calculate the trade balance (Exports minus Imports) for each country. Can anyone provide me with some hint as to how to do it? Thanks.

CodePudding user response：

We may use split to split the dataframe by declarant and then compute the difference between flows.

set.seed(1)
df <- data.frame(flow=rep(c('Imports','Exports'), 8),
                 declarant=rep(c('FR','BE','NL','DE','IT','GB','IE','DK'), each=2),
                 value_in_euros=rnorm(n=16L,mean=1e12,sd=1e4))
out <- sapply(split(df, df$declarant), function(x) diff(x$value_in_euros))
# ------------------------------------------
> out[1:4]
        BE         DE         DK         FR 
 24309.094   2508.957 -11698.645   8100.971

CodePudding user response：

Consider reshaping after aggregation. Then, run the difference.

agg_df <- aggregate(
    value_in_euros ~ flow   declarant, data=data, FUN=sum
) |> reshape(
  idvar = "declarant",
  v.names = "value_in_euros",
  timevar = "flow",
  direction = "wide",
  sep = "_"
) |> transform(
  balance = value_in_euros_Exports - value_in_euros_Imports
)

agg_df
   declarant value_in_euros_Imports value_in_euros_Exports       balance
1         FR           1.019259e 12           8.539884e 11 -165270600000
3         BE           6.936985e 11           7.350930e 11   41394500000
5         NL           1.043084e 12           1.180904e 12  137820000000
7         DE           2.048670e 12           2.415090e 12  366420000000
9         IT           7.399388e 11           8.671186e 11  127179800000
11        GB           9.705866e 10           6.669265e 10  -30366010000
13        IE           1.725745e 11           3.150999e 11  142525400000
15        DK           1.707441e 11           1.897421e 11   18998000000

data

txt = '       flow              declarant value_in_euros
1          Imports                   FR   1.019259e 12
2          Exports                   FR   8.539884e 11
3          Imports                   BE   6.936985e 11
4          Exports                   BE   7.350930e 11
5          Imports                   NL   1.043084e 12
6          Exports                   NL   1.180904e 12
7          Imports                   DE   2.048670e 12
8          Exports                   DE   2.415090e 12
9          Imports                   IT   7.399388e 11
10         Exports                   IT   8.671186e 11
11         Imports                   GB   9.705866e 10
12         Exports                   GB   6.669265e 10
13         Imports                   IE   1.725745e 11
14         Exports                   IE   3.150999e 11
15         Imports                   DK   1.707441e 11
16         Exports                   DK   1.897421e 11'

agg_df <- read.table(text=txt)

CodePudding user response：

You can apply this directly on your data instead of doing aggregation.

library(dplyr)
library(tidyr)

data %>%
  select(flow, declarant, value_in_euros) %>%
  pivot_wider(names_from = flow, 
              values_from = value_in_euros, values_fn = sum) %>%
  mutate(difference = Exports - Imports)