I have a dataset that looks like the following:
structure(list(X = c(36, 37, 38, 39, 40, 41, 1, 2, 3, 4, 5, 6
), Y = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), region_ID = c(0,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -12L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x7fb8fc819ae0>)
I want to match every row whose region_ID=0 with the rows whose region_ID=1 and calculate
dist_to_r1 = sqrt((X - i.X)^2 + (Y - i.Y)^2)
where the i. prefix refers to the latter rows. I want to do this using data.table syntax.
I have been trying to do this with left joins, but couldn't make it work.
CodePudding user response:
You want a full (cross) join, such that each of the six rows in region 0 is joined to each of the six rows in region 1?
In that case, you can join on a constant id column and set allow.cartesian = TRUE:
data[, id := 1][region_ID == 0][data[region_ID == 1], on = "id", allow.cartesian = TRUE][, dist_to_r1 := sqrt((X - i.X)^2 + (Y - i.Y)^2)][]
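The dput output in the question cannot be pasted back into R as-is because of the `<pointer: ...>` entry, so here is a minimal sketch that rebuilds the same values by hand and runs the cross join step by step. The name `pairs` is only illustrative and does not appear in the original post.

```r
library(data.table)

# Rebuild the sample data from the question (same values as the dput output)
data <- data.table(
  X = c(36:41, 1:6),
  Y = rep(1, 12),
  region_ID = rep(c(0, 1), each = 6)
)

# Constant key so that every region-0 row matches every region-1 row
data[, id := 1]

# x = region 0 (columns X, Y), i = region 1 (columns i.X, i.Y after the join)
pairs <- data[region_ID == 0][data[region_ID == 1], on = "id", allow.cartesian = TRUE]
pairs[, dist_to_r1 := sqrt((X - i.X)^2 + (Y - i.Y)^2)]

nrow(pairs)              # 36 pairs (6 x 6)
range(pairs$dist_to_r1)  # 30 40
```

Without allow.cartesian = TRUE the join stops with an error here, because the 36 result rows exceed nrow(x) + nrow(i) = 12.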
Edit: OP clarified that only the minimum distance to a point in region 0 is required. In this case, we can do something like this:
data[,id:=1]
region0 = data[region_ID==0]
# function that gets the minimum distance between two regions
get_min_dist <- function(region_a, region_b) {
  region_a[region_b, on = "id", allow.cartesian = TRUE][, min(sqrt((X - i.X)^2 + (Y - i.Y)^2))]
}
# apply the function above to every region
data[,
  .(min_dist_to_zero = get_min_dist(
    region_a = region0,
    region_b = data[region_ID == .BY]
  )),
  by = region_ID]
Output:
region_ID min_dist_to_zero
1: 0 0
2: 1 30
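
As a sanity check on the value 30, the same minimum can be computed without any join. This is only a sketch using base R's outer() on the sample data rebuilt by hand; the names r0, r1 and d are illustrative and not from the original answer.

```r
library(data.table)

# Sample data rebuilt by hand from the question
data <- data.table(
  X = c(36:41, 1:6),
  Y = rep(1, 12),
  region_ID = rep(c(0, 1), each = 6)
)

r0 <- data[region_ID == 0]
r1 <- data[region_ID == 1]

# 6 x 6 matrix of pairwise distances: rows = region 0 points, columns = region 1 points
d <- sqrt(outer(r0$X, r1$X, "-")^2 + outer(r0$Y, r1$Y, "-")^2)

min(d)  # 30, the gap between X = 36 (region 0) and X = 6 (region 1)
```

For larger regions the full distance matrix grows as the product of the two region sizes, the same as the cartesian join does.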
