My two data frames look like this:
> dput(head(df1,25))
structure(list(Date = structure(c(16644, 16645, 16646, 16647,
16648, 16649, 16650, 16651, 16652, 16653, 16654, 16655, 16656,
16657, 16658, 16659, 16660, 16661, 16662, 16663, 16664, 16665,
16666, 16667, 16668), class = "Date"), AU = c(0.241392906920806,
0.257591745069017, 0.263305712230276, NaN, 0.252892547032525,
0.251771180928526, 0.249211746794207, 0.257289083109259, 0.205017582640463,
0.20072274573488, 0.210154167590338, 0.207384553271337, 0.193725450540089,
0.199282601988984, 0.216267134143314, 0.217052471451736, NaN,
0.220703029531909, 0.2164619798534, 0.223442036108148, 0.22061326758891,
NaN, 0.277777461504811, NaN, 0.200839628485262)), row.names = c(NA,
-25L), class = c("tbl_df", "tbl", "data.frame"))
> dput(head(df2,25))
structure(list(UF1 = c(0.2559, 0.2565, 0.257, 0.2577, 0.2583,
0.259, 0.2596, 0.2603, 0.2611, 0.2618, 0.2625, 0.2633, 0.2641,
0.2649, 0.2657, 0.2665, 0.2674, 0.2682, 0.2691, 0.27, 0.2709,
0.2718, 0.2727, 0.2736, 0.2745), UF2 = c(0.2597, 0.2602, 0.2608,
0.2614, 0.2621, 0.2627, 0.2634, 0.2641, 0.2648, 0.2655, 0.2663,
0.267, 0.2678, 0.2686, 0.2694, 0.2702, 0.2711, 0.2719, 0.2728,
0.2737, 0.2745, 0.2754, 0.2763, 0.2773, 0.2782), UF3 = c(0.2912,
0.2915, 0.2918, 0.2922, 0.2926, 0.293, 0.2934, 0.2938, 0.2943,
0.2947, 0.2952, 0.2957, 0.2962, 0.2968, 0.2973, 0.2979, 0.2985,
0.2991, 0.2997, 0.3003, 0.3009, 0.3016, 0.3022, 0.3029, 0.3035
), Date = structure(c(16644, 16645, 16646, 16647, 16648, 16649,
16650, 16651, 16652, 16653, 16654, 16655, 16656, 16657, 16658,
16659, 16660, 16661, 16662, 16663, 16664, 16665, 16666, 16667,
16668), class = "Date")), row.names = c(NA, 25L), class = "data.frame")
I want to compute the mean of the difference between df1$AU and each UF column of df2, i.e. mean(df1$AU - df2$UF) for every UF column. The closest I got to a solution is the following:
data.frame(mean = colMeans(df1$AU, na.rm = TRUE) - colMeans(df2$UF))
but I got this error:
Error in colMeans(df1$mAU, na.rm = TRUE) :
'x' must be an array of at least two dimensions
I managed to run the same code only for data frames with one column each, but since I have 3 or more columns per data frame that I want to calculate against df1$AU, I need something more efficient.
Any help will be much appreciated. Thank you.
CodePudding user response:
Assuming you want the mean of df1$AU minus the mean of each (numeric) column in df2, this can be done like this:
mean(df1$AU, na.rm = TRUE) - colMeans(df2[, 1:3], na.rm = TRUE)
This outputs one value per column of df2:
       UF1        UF2        UF3
-0.0367389 -0.0404509 -0.0688949
I hope this is helpful.
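For context on the error in the question: colMeans() expects a matrix or data frame, while df1$AU is a plain numeric vector, so mean() is the function to use for that side. Below is a minimal sketch of this; the toy data frames d1 and d2 and their values are made up for illustration and are not the question's data.

```r
# Toy stand-ins for df1 and df2 (hypothetical values, for illustration only)
d1 <- data.frame(AU = c(0.2, NaN, 0.4))
d2 <- data.frame(UF1 = c(0.1, 0.2, 0.3), UF2 = c(0.3, 0.4, 0.5))

# d1$AU is a vector with no dimensions, so colMeans() raises an error;
# tryCatch() captures the message instead of stopping the script
res <- tryCatch(
  colMeans(d1$AU, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)

# mean() handles the vector; colMeans() handles the multi-column data frame.
# The single mean is recycled against the named vector of column means.
diff_of_means <- mean(d1$AU, na.rm = TRUE) - colMeans(d2, na.rm = TRUE)
print(diff_of_means)
```

Note that na.rm = TRUE also drops NaN values, because is.na(NaN) is TRUE in R.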
CodePudding user response:
Here are two base R functions to compute the mean of the differences. The 2nd is faster.
meanDiffs1 <- function(x, y, na.rm = TRUE){
  z <- if(na.rm) na.omit(cbind(x, -1*y)) else cbind(x, -1*y)
  mean(rowSums(z))
}
meanDiffs2 <- function(x, y, na.rm = TRUE){
  if(na.rm){
    i <- is.na(x)
    j <- is.na(y)
    mean(x[!i & !j] - y[!i & !j])
  } else {
    mean(x - y)
  }
}
meanDiffs1(df1$AU, df2$UF1)
#[1] -0.0361429
meanDiffs2(df1$AU, df2$UF1)
#[1] -0.0361429
To compute all mean differences between df1$AU and the df2$UF* columns, use sapply (the \(y) lambda shorthand requires R 4.1 or later).
sapply(df2[1:3], \(y) meanDiffs2(df1$AU, y))
# UF1 UF2 UF3
#-0.03614290 -0.03986195 -0.06848576
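These values differ slightly from a plain difference of means, most likely because the functions above drop a row from both vectors whenever either value is missing, whereas mean(x, na.rm = TRUE) - mean(y) still averages y over every row. The sketch below illustrates this and also shows a possible vectorized alternative to sapply; the names x and Y and their values are hypothetical toy data, not the question's.

```r
# Toy data (hypothetical values): x has one missing entry
x <- c(0.2, NaN, 0.4)
Y <- data.frame(UF1 = c(0.1, 0.5, 0.3), UF2 = c(0.3, 0.1, 0.5))

# Mean of differences: x is recycled down each column of the matrix,
# and rows where the difference is NaN are dropped per column
mean_of_diffs <- colMeans(x - as.matrix(Y), na.rm = TRUE)
print(mean_of_diffs)

# Difference of means: the row where x is NaN still counts toward colMeans(Y)
diff_of_means <- mean(x, na.rm = TRUE) - colMeans(Y)
print(diff_of_means)
```

With no missing values the two approaches agree; they diverge only when rows are dropped from one side and not the other.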
