For each point in one dataframe, I need to find the closest point in a second dataframe.
Dataframe biz contains individual businesses, including their coordinates:
biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"),
lon = c(-3.276435,-4.175388,-4.181740,-3.821941),
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))
biz
name lon lat
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277
Dataframe city contains market cities, including their geocoordinates:
city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"),
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861),
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))
city
name lon lat
1 cityA -4.758804 10.64002
2 cityB -3.243278 10.95790
3 cityC -3.062628 13.06950
4 cityD -2.356686 13.20363
For each business in biz, I need to identify which market city is closest, and list the name of that market city in a new column:
biz
name lon lat city
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277
I know that I can use packages like geosphere to measure the distance between bizA and cityA coordinates. I'm struggling with: how to compare bizA to each city, minimize the distance, and then list that closest city in dataframe biz.
Any thoughts are much appreciated!
CodePudding user response:
You can use st_nearest_feature from sf:
library(sf)

cbind(
  biz,
  nearest_city = city[
    st_nearest_feature(
      st_as_sf(biz, coords = c("lon", "lat"), crs = 4326),
      st_as_sf(city, coords = c("lon", "lat"), crs = 4326)
    ),
  ]$name
)
although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
name lon lat nearest_city
1 bizA -3.276435 11.96748 cityB
2 bizB -4.175388 12.19885 cityC
3 bizC -4.181740 13.04638 cityC
4 bizD -3.821941 11.84277 cityB
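Since the question mentions geosphere, here is a minimal sketch of the same idea with that package, assuming it is installed. It builds a business-by-city distance matrix with distm and picks the column with the smallest distance in each row. The names d, nearest, biz_geo, and dist_km are just illustrative.

```r
library(geosphere)  # assumption: geosphere is installed

biz <- data.frame(
  name = c("bizA", "bizB", "bizC", "bizD"),
  lon = c(-3.276435, -4.175388, -4.181740, -3.821941),
  lat = c(11.96748, 12.19885, 13.04638, 11.84277),
  stringsAsFactors = FALSE
)
city <- data.frame(
  name = c("cityA", "cityB", "cityC", "cityD"),
  lon = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
  lat = c(10.64002, 10.95790, 13.06950, 13.20363),
  stringsAsFactors = FALSE
)

# distm() expects longitude first, then latitude.
# Result: one row per business, one column per city, distances in metres.
d <- distm(
  as.matrix(biz[, c("lon", "lat")]),
  as.matrix(city[, c("lon", "lat")]),
  fun = distHaversine
)

# Index of the closest city for each business (row-wise minimum)
nearest <- apply(d, 1, which.min)

# Attach the closest city's name and its distance in kilometres
biz_geo <- cbind(
  biz,
  city = city$name[nearest],
  dist_km = d[cbind(seq_len(nrow(biz)), nearest)] / 1000
)
biz_geo
```

For the sample data this should assign the same cities as the sf output above. The full distance matrix grows as nrow(biz) * nrow(city), so for very large inputs a nearest-neighbour function such as st_nearest_feature is probably the better choice.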
CodePudding user response:
I guess there are multiple ways to do this.
Here is one that starts by creating all combinations of rows from the two data frames using the dfcombos function from here.
(I think there are some alternatives in packages on CRAN.)
In the code below the distance is just a random number from runif, standing in for a real distance calculation.
The closest city for each business is then selected by sorting on distance with order and keeping only the first row per business with !duplicated.
There are alternatives to this approach as well, but it seemed simple.
source('dfcombos.R')
biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"),
lon = c(-3.276435,-4.175388,-4.181740,-3.821941),
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))
city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"),
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861),
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))
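# All pairwise combinations of biz rows and city rows
comb <- dfcombos(biz, city)
# Placeholder distance; replace with a real distance calculation
comb$dist <- runif(nrow(comb))
# Sort by distance, then keep the first (closest) row per business
# (this assumes comb$name holds the business name)
comb <- comb[order(comb$dist), ]
closest <- comb[!duplicated(comb$name), ]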
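If you don't have dfcombos at hand, the same order/duplicated idea can be sketched in base R only. This is not the dfcombos function, just a stand-in: expand.grid builds the row-index combinations, and a hand-written haversine_km helper (my own name, using a mean Earth radius of 6371 km as an assumption) replaces the random distance.

```r
biz <- data.frame(
  name = c("bizA", "bizB", "bizC", "bizD"),
  lon = c(-3.276435, -4.175388, -4.181740, -3.821941),
  lat = c(11.96748, 12.19885, 13.04638, 11.84277),
  stringsAsFactors = FALSE
)
city <- data.frame(
  name = c("cityA", "cityB", "cityC", "cityD"),
  lon = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
  lat = c(10.64002, 10.95790, 13.06950, 13.20363),
  stringsAsFactors = FALSE
)

# Great-circle distance in km (haversine formula); r is an assumed mean Earth radius
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Every (business row, city row) index pair
idx <- expand.grid(b = seq_len(nrow(biz)), c = seq_len(nrow(city)))

# Long table: one row per business/city pair with its distance
comb <- data.frame(
  biz = biz$name[idx$b],
  city = city$name[idx$c],
  dist_km = haversine_km(biz$lon[idx$b], biz$lat[idx$b],
                         city$lon[idx$c], city$lat[idx$c]),
  stringsAsFactors = FALSE
)

# Sort by distance, then keep the first (closest) row for each business
comb <- comb[order(comb$dist_km), ]
closest <- comb[!duplicated(comb$biz), ]

# Write the closest city back into biz, in biz's original row order
biz$city <- closest$city[match(biz$name, closest$biz)]
biz
```

The match step matters because closest is sorted by distance, not in the row order of biz.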
