I tried to calculate the Lp norm for all pairs of values in one column. The answer is just not right and I don't know why.
Here is my sample code.
a <- c(23,41,32,58,26,77,45,67,23,78,22,9,20)
lp_norm = function(x, y, p){
return(sum((abs(x-y))^p)^(1/p))
}
i = 1
while (i <= 13) {
for(j in i:12){
lp1 <- lp_norm(a[i], a[j + 1], p = 1)
}
i = i + 1
print(lp1)
}
I also have a data frame with 10 columns that needs the same calculation. How can I apply this to all columns?
CodePudding user response:
Here is one way to calculate this for different combinations of columns in a dataframe.
library(tidyverse)
lp_norm <- function(data, x, y, p){
data |>
select(v1:= !!sym(x), v2:= !!sym(y))|>
summarise(lp_norm = sum((abs(v1-v2))^p)^(1/p)) |>
pull(lp_norm)
}
calc_lp_norm <- function(data, vars, p){
combn(vars, 2) |>
t() |>
`colnames<-`(c("var1", "var2")) |>
as_tibble() |>
mutate(lp_norm = map2_dbl(var1, var2, ~lp_norm(x = .x, y = .y, data = data, p = p)))
}
#few columns
calc_lp_norm(mtcars, c("mpg", "cyl", "hp", "wt"), p = 1)
#> # A tibble: 6 x 3
#> var1 var2 lp_norm
#> <chr> <chr> <dbl>
#> 1 mpg cyl 445.
#> 2 mpg hp 4051.
#> 3 mpg wt 540.
#> 4 cyl hp 4496
#> 5 cyl wt 95.0
#> 6 hp wt 4591.
#all columns
calc_lp_norm(mtcars, colnames(mtcars), p = 1)
#> # A tibble: 55 x 3
#> var1 var2 lp_norm
#> <chr> <chr> <dbl>
#> 1 mpg cyl 445.
#> 2 mpg disp 6740.
#> 3 mpg hp 4051.
#> 4 mpg drat 528.
#> 5 mpg wt 540.
#> 6 mpg qsec 136.
#> 7 mpg vs 629.
#> 8 mpg am 630.
#> 9 mpg gear 525.
#> 10 mpg carb 553.
#> # ... with 45 more rows
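For comparison, the same column-pair calculation can be sketched in base R without the tidyverse. This is only a rough equivalent of the function above, not a drop-in replacement: the name calc_lp_norm_base is hypothetical, and the result is a plain data.frame rather than a tibble.

```r
# Lp norm of the difference between two equal-length numeric vectors
lp_norm <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

# Hypothetical base R counterpart of calc_lp_norm()
calc_lp_norm_base <- function(data, vars, p) {
  # One row per unordered pair of column names
  pairs <- t(combn(vars, 2))
  data.frame(
    var1 = pairs[, 1],
    var2 = pairs[, 2],
    lp_norm = apply(pairs, 1, function(v) lp_norm(data[[v[1]]], data[[v[2]]], p))
  )
}

res <- calc_lp_norm_base(mtcars, c("mpg", "cyl", "hp", "wt"), p = 1)
res
```

The lp_norm column should agree with the tibble output shown above, up to print rounding.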
CodePudding user response:
We could use combn from base R, which returns each unordered pair only once. Loop over the columns of the data.frame 'dat', take the pairwise combinations of each column's elements (assuming all are unique; otherwise use combn(unique(u), 2)), and apply the lp_norm function:
lapply(dat, \(u) combn(u, 2, FUN = \(x) lp_norm(x[1], x[2], p = 1)))
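For a single vector, combn visits the same pairs as a nested loop in which j starts at i + 1 and every result is stored rather than overwritten. A minimal sketch using the vector a from the question; the loop bounds are my reading of what the question intended, and loop_res and combn_res are names made up for this example.

```r
a <- c(23, 41, 32, 58, 26, 77, 45, 67, 23, 78, 22, 9, 20)
lp_norm <- function(x, y, p) sum(abs(x - y)^p)^(1 / p)

n <- length(a)
loop_res <- numeric(0)
for (i in seq_len(n - 1)) {
  # j starts after i, so each unordered pair is visited exactly once
  for (j in (i + 1):n) {
    loop_res <- c(loop_res, lp_norm(a[i], a[j], p = 1))
  }
}

# Same pairs, in the same order, via combn
combn_res <- combn(a, 2, FUN = function(x) lp_norm(x[1], x[2], p = 1))

length(loop_res)                # choose(13, 2) pairs
all.equal(loop_res, combn_res)
```

Note that for two single numbers the Lp norm reduces to abs(x - y) for any p, so p only matters when whole vectors are compared.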
Or, if we need the output as a matrix that also includes the mirrored and self pairs (i.e. 1 vs 2, 2 vs 1, and 1 vs 1):
lapply(dat, \(u) outer(u, u, FUN = Vectorize(\(x, y) lp_norm(x, y, p = 1))))
But as this is a symmetric distance function, outer calculates every distance twice and also calculates the (zero) distance between each element and itself.
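One way to avoid the duplicated work is stats::dist, which computes each pairwise distance once and has a "minkowski" method that takes the power p. The sketch below is only an illustration: dat is not defined in this thread, so a small made-up data frame stands in for it, and pairwise_lp is a hypothetical helper name.

```r
# Made-up example data; 'dat' in the answer above is not shown in the thread
dat <- data.frame(
  a = c(23, 41, 32, 58),
  b = c(5, 9, 2, 7)
)

# dist() treats each row as a point, so a column passed as a one-column
# matrix gives the pairwise distances between its elements
pairwise_lp <- function(u, p = 1) {
  stats::dist(as.matrix(u), method = "minkowski", p = p)
}

res <- lapply(dat, pairwise_lp, p = 1)
res$a             # lower triangle only, each pair once
as.matrix(res$a)  # full symmetric matrix, if that shape is needed
```

The dist object stores only the lower triangle, and as.matrix expands it to the full symmetric form with zeros on the diagonal.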
