Match every row whose `region_ID=0` with the rows whose `region_ID=1` and calculate a certain distance

I have a dataset that looks like the following:

library(data.table)
data <- data.table(
  X = c(36, 37, 38, 39, 40, 41, 1, 2, 3, 4, 5, 6),
  Y = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  region_ID = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)
)

I want to match every row whose region_ID=0 with the rows whose region_ID=1 and calculate

dist_to_r1 = sqrt((X - i.X)^2 + (Y - i.Y)^2)

where the i. prefix refers to the latter rows (those with region_ID=1). I want to do this using data.table syntax.

I have been trying to do this with left joins, but couldn't make it work.

Answer:

So you want a cross join, in which each of the six rows in region 0 is joined to each of the six rows in region 1?

In that case, you can add a constant id column to join on and set allow.cartesian = TRUE:

data[, id := 1][region_ID == 0][data[region_ID == 1], on = "id", allow.cartesian = TRUE][, dist_to_r1 := sqrt((X - i.X)^2 + (Y - i.Y)^2)][]
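For reference, here is a self-contained sketch of the same cross join, split into steps. It assumes the example data from the question is rebuilt with data.table() and called data; the name pairs is only illustrative.

```r
library(data.table)

# Rebuild the example data from the question (assumed to be called `data`)
data <- data.table(
  X = c(36, 37, 38, 39, 40, 41, 1, 2, 3, 4, 5, 6),
  Y = rep(1, 12),
  region_ID = rep(c(0, 1), each = 6)
)

# A constant key makes every region-0 row match every region-1 row
data[, id := 1]
pairs <- data[region_ID == 0][data[region_ID == 1],
                              on = "id", allow.cartesian = TRUE]

# Clashing columns from the region-1 table (the `i` argument) get the i. prefix
pairs[, dist_to_r1 := sqrt((X - i.X)^2 + (Y - i.Y)^2)]

nrow(pairs)  # 6 x 6 = 36 pairs
```

Without allow.cartesian = TRUE the join should stop with an error, because the result has more rows than the two inputs combined.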

Edit: OP clarified that only the minimum distance to a point in region 0 is required. In this case, we can do something like this:

data[,id:=1]
region0 = data[region_ID==0]

# function that gets the minimum distance between two regions
get_min_dist <- function(region_a, region_b) {
  region_a[region_b, on = "id", allow.cartesian = TRUE][, min(sqrt((X - i.X)^2 + (Y - i.Y)^2))]
}

# apply the function above to every region

data[,
     .(min_dist_to_zero = get_min_dist(
       region_a = region0,
       region_b = data[region_ID == .BY$region_ID]
       )),
  by = region_ID]

Output:

   region_ID min_dist_to_zero
1:         0                0
2:         1               30
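
If what is needed is instead the nearest region-1 distance for each individual row in region 0, a sketch along the same lines groups the cross join by the region-0 row. This is an assumption about the goal rather than something stated in the question, and row_id, r0, r1 and nearest are hypothetical names introduced only for illustration.

```r
library(data.table)

# Example data from the question, rebuilt so the snippet runs on its own
data <- data.table(
  X = c(36, 37, 38, 39, 40, 41, 1, 2, 3, 4, 5, 6),
  Y = rep(1, 12),
  region_ID = rep(c(0, 1), each = 6)
)

# Hypothetical helper columns: a constant join key and a row identifier
data[, `:=`(id = 1, row_id = .I)]

r0 <- data[region_ID == 0]
r1 <- data[region_ID == 1]

# Cross join, then keep the smallest distance for each region-0 row
nearest <- r0[r1, on = "id", allow.cartesian = TRUE][
  , .(dist_to_r1 = min(sqrt((X - i.X)^2 + (Y - i.Y)^2))),
  by = .(row_id, X, Y)
]
```

The result has one row per region-0 point. The cross join builds every pair in memory, so this is only reasonable when the regions are small.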