I have a pandas DataFrame like this
name loc_x loc_y grp_name
a1 1.0 2.0 set1
a2 2.0 3.0 set1
a3 3.2 4.1 set2
a4 7.9 4.2 set2
I want to build a GeoDataFrame with one polygon per grp_name, constructed from loc_x and loc_y, plus a name column that holds that group's name values from my original data frame joined with |. The result should look like this:
name geometry
set1 a1|a2 POLYGON ((1.0, 2.0)...)
set2 a3|a4 POLYGON ((3.2, 4.1)...)
The following gives me the geometry column, but how do I also get an additional column with name concatenated from my base data frame?
gdf = gpd.GeoDataFrame(geometry=df.groupby('grp_name').apply(
    lambda g: Polygon(gpd.points_from_xy(g['loc_x'], g['loc_y']))))
CodePudding user response:
- Your test data needed a modification: a polygon requires at least three points, so each group below has a third point added.
- The rest comes down to knowing pandas.
groupby().apply() passes the sub-DataFrame for each group to your function. It is then simple to construct the two outputs you want per group and return them together as a pd.Series:
import pandas as pd
import geopandas as gpd
import shapely.geometry
import io
df = pd.read_csv(io.StringIO("""name loc_x loc_y grp_name
a1 1.0 2.0 set1
a2 2.0 3.0 set1
a2.5 3.0 4.0 set1
a3 3.2 4.1 set2
a4 7.9 4.2 set2
a4.5 8.1 4.3 set2"""), sep=r"\s+")
gpd.GeoDataFrame(
    df.groupby("grp_name").apply(
        lambda d: pd.Series(
            {
                # names in this group, joined with "|"
                "name": "|".join(d["name"].tolist()),
                # polygon built from this group's (loc_x, loc_y) pairs
                "geometry": shapely.geometry.Polygon(
                    d.loc[:, ["loc_x", "loc_y"]].values
                ),
            }
        )
    )
)
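The groupby().apply() mechanism used above can be checked with pandas alone. This is a minimal sketch, assuming the same six-row test data. The helper name summarise and the coords column are hypothetical: coords just collects the (loc_x, loc_y) pairs that the answer passes to Polygon, so no geopandas or shapely is needed to see the shape of the result. The columns are selected explicitly before apply() so that the grouping column is not handed to the function, which newer pandas versions warn about.

```python
import pandas as pd

# Same test data as above, built directly instead of parsed from text.
df = pd.DataFrame(
    {
        "name": ["a1", "a2", "a2.5", "a3", "a4", "a4.5"],
        "loc_x": [1.0, 2.0, 3.0, 3.2, 7.9, 8.1],
        "loc_y": [2.0, 3.0, 4.0, 4.1, 4.2, 4.3],
        "grp_name": ["set1"] * 3 + ["set2"] * 3,
    }
)


def summarise(group: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: build one output row for a single group."""
    return pd.Series(
        {
            # names in this group, joined with "|"
            "name": "|".join(group["name"]),
            # coordinate pairs that would be passed to Polygon
            "coords": list(zip(group["loc_x"], group["loc_y"])),
        }
    )


# apply() calls summarise once per group; because every call returns a
# Series with the same labels, pandas assembles one row per group.
result = df.groupby("grp_name")[["name", "loc_x", "loc_y"]].apply(summarise)
print(result)
```

Each key of the returned Series becomes a column and each group becomes a row indexed by grp_name, which is why wrapping the same result in gpd.GeoDataFrame works once one of those columns is named geometry and holds shapely objects.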
