GeoPandas: Find nearest point in other dataframe

by RedM   Last Updated November 08, 2018 21:22 PM

I've got 2 geodataframes:

import geopandas as gpd
from shapely.geometry import Point
gpd1 = gpd.GeoDataFrame([['John',1,Point(1,1)],['Smith',1,Point(2,2)],['Soap',1,Point(0,2)]],columns=['Name','ID','geometry'])
gpd2 = gpd.GeoDataFrame([['Work',Point(0,1.1)],['Shops',Point(2.5,2)],['Home',Point(1,1.1)]],columns=['Place','geometry'])

and I want to find the name of the nearest point in gpd2 for each row in gpd1:

desired_output = 

    Name  ID     geometry  Nearest
0   John   1  POINT (1 1)     Home
1  Smith   1  POINT (2 2)    Shops
2   Soap   1  POINT (0 2)     Work

I've been trying to get this working using a lambda function:

gpd1['Nearest'] = gpd1.apply(lambda row: min_dist(row.geometry,gpd2)['Place'] , axis=1)

with

def min_dist(point, gpd2):

    geoseries = some_function()
    return geoseries


Answers 3


Figured it out:

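def min_dist(point, gpd2):
    # Distance from `point` to every geometry in gpd2 (this overwrites a 'Dist' column on gpd2)
    gpd2['Dist'] = gpd2.geometry.distance(point)
    # .values.argmin() is positional, which is what iloc expects
    geoseries = gpd2.iloc[gpd2['Dist'].values.argmin()]
    return geoseries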

Of course some criticism is welcome. I'm not a fan of recalculating gpd2['Dist'] for every row of gpd1...
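
Stripped of GeoPandas, the brute-force idea above is an argmin over Euclidean distances. Here is a minimal sketch on plain coordinate tuples, which may help when checking results by hand. The names `people`, `places` and `nearest_place` are illustrative only and do not come from the question; the coordinates are copied from the sample data.

```python
import math

# Coordinates copied from the question's sample frames (gpd1 and gpd2).
people = {"John": (1, 1), "Smith": (2, 2), "Soap": (0, 2)}
places = {"Work": (0, 1.1), "Shops": (2.5, 2), "Home": (1, 1.1)}

def nearest_place(xy, candidates):
    """Return the key in `candidates` whose point is closest to `xy`."""
    def dist_to(name):
        cx, cy = candidates[name]
        return math.hypot(xy[0] - cx, xy[1] - cy)
    return min(candidates, key=dist_to)

# One pass over `places` per person: O(len(people) * len(places)) overall.
nearest = {name: nearest_place(xy, places) for name, xy in people.items()}
print(nearest)
```

This does the same amount of work as the apply-based version, so it scales with the product of the two frame sizes.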

RedM
December 22, 2016 09:15 AM

You can directly use the Shapely function nearest_points (the geometries of the GeoSeries are Shapely geometries):

from shapely.ops import nearest_points
# unary union of the gpd2 geometries
pts3 = gpd2.geometry.unary_union
def near(point, pts=pts3):
    # find the nearest point and return the corresponding Place value
    nearest = gpd2.geometry == nearest_points(point, pts)[1]
    # Series.get_values() was removed in pandas 1.0; .values is the replacement
    return gpd2[nearest].Place.values[0]
gpd1['Nearest'] = gpd1.apply(lambda row: near(row.geometry), axis=1)
gpd1
    Name  ID     geometry  Nearest
0   John   1  POINT (1 1)     Home
1  Smith   1  POINT (2 2)    Shops
2   Soap   1  POINT (0 2)     Work

Explanation

for i, row in gpd1.iterrows():
    print(nearest_points(row.geometry, pts3)[0], nearest_points(row.geometry, pts3)[1])
 POINT (1 1) POINT (1 1.1)
 POINT (2 2) POINT (2.5 2)
 POINT (0 2) POINT (0 1.1)
gene
December 22, 2016 20:16 PM

If you have large dataframes, I've found that the .query method of scipy's cKDTree spatial index returns very fast results for nearest-neighbor searches. Because it uses a spatial index, it's orders of magnitude faster than looping through the dataframe and then finding the minimum of all distances. It is also faster than using shapely's nearest_points with RTree (the spatial index method available via geopandas), because cKDTree lets you vectorize the search whereas the other method does not.

Here is a helper function that will return, for each point in gpd1, the distance to its nearest neighbor in gpd2 and the value of a chosen gpd2 column (here 'Place') for that neighbor. It assumes both gdfs have a geometry column (of points).

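import numpy as np
import pandas as pd
from scipy.spatial import cKDTree
def ckdnearest(gdA, gdB, bcol):
    # (x, y) coordinate arrays for both sets of points
    nA = np.array(list(zip(gdA.geometry.x, gdA.geometry.y)))
    nB = np.array(list(zip(gdB.geometry.x, gdB.geometry.y)))
    btree = cKDTree(nB)
    # idx holds row positions in gdB, so look the values up with iloc
    dist, idx = btree.query(nA, k=1)
    # keep distances as floats; casting to int would truncate 0.1 to 0
    df = pd.DataFrame({'distance': dist,
                       bcol: gdB[bcol].iloc[idx].values})
    return df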

For your sample dataframes and desired result you'd run:

ckdnearest(gpd1, gpd2, 'Place')

It returns a dataframe with the distance and the matched gpd2 value for each row of gpd1, which you can insert back into gpd1.
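
To see what the tree query returns before any DataFrame wrapping, here is a small sketch on raw NumPy arrays. It assumes NumPy and SciPy are installed; the variable names (`a_xy`, `b_xy`, `places`, `names`) are illustrative, and the coordinates are copied from the question's sample data.

```python
import numpy as np
from scipy.spatial import cKDTree

# Coordinates copied from the question's sample frames.
names = ["John", "Smith", "Soap"]                       # rows of gpd1
a_xy = np.array([(1, 1), (2, 2), (0, 2)], dtype=float)
places = np.array(["Work", "Shops", "Home"])            # rows of gpd2
b_xy = np.array([(0, 1.1), (2.5, 2), (1, 1.1)], dtype=float)

# Build the index once on the points being searched ...
tree = cKDTree(b_xy)
# ... then query all of gpd1's points in a single vectorized call.
# dist[i] is the Euclidean distance and idx[i] the row position in b_xy
# of the nearest neighbor of a_xy[i].
dist, idx = tree.query(a_xy, k=1)

nearest = places[idx]
for name, place, d in zip(names, nearest, dist):
    print(f"{name}: {place} ({d:.2f})")
```

Since `dist` and `idx` come back in the same order as the query array, they line up row by row with gpd1, so assigning them as new columns is positional.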

JHuw
November 08, 2018 20:31 PM
