I have 2 dataframes, with df1 containing the coordinates of points and df2 containing the start and end coordinates of lines. Such as:
df1
| point | lon | lat |
|---|---|---|
| P01 | 3 | 48 |
| P02 | 9 | 44 |
df2
| line | lon.start | lat.start | lon.end | lat.end |
|---|---|---|---|---|
| L01 | 3 | 48 | 3 | 49 |
I would like to replace the line variable in df2 with point.start and point.end, conditionally creating the column point.start based on lon.start and lat.start, and the same for point.end. Such that it becomes:
| point.start | point.end | lon.start | lat.start | lon.end | lat.end |
|---|---|---|---|---|---|
| P01 | P05 | 3 | 48 | 3 | 49 |
How could I do this?
CodePudding user response:
Using merge may help you here.
# generate sample data
df1 = pd.DataFrame([['P01', 3, 48], ['P02', 9, 44], ['P03', 3, 49]], columns=('point', 'lon', 'lat'))
df2 = pd.DataFrame([['L01', 3, 48, 3, 49], ['L02', 4, 48, 4, 49]], columns=('line', 'lon.start', 'lat.start', 'lon.end', 'lat.end'))
# handle start points
dfx = pd.merge(df1.rename(columns={'lon': 'lon.start', 'lat': 'lat.start', 'point': 'point.start'}),
df2[['line', 'lon.start', 'lat.start']],
on=['lon.start', 'lat.start'])
# handle end points
dfy = pd.merge(df1.rename(columns={'lon': 'lon.end', 'lat': 'lat.end', 'point': 'point.end'}),
df2[['line', 'lon.end', 'lat.end']],
on=['lon.end', 'lat.end'])
# merge both based on lines
df = pd.merge(dfx, dfy, on='line')
The provided answer will discard lines which have missing points. If you'd rather keep them and not change the shape of df2, replace the last merge by
df = pd.merge(dfx, dfy, on='line', how='outer')
CodePudding user response:
You could do:
df3 = df2.drop(columns="line")
for suffix in ("start", "end"):
df3 = df3.merge(
df1.rename(columns={col: f"{col}.{suffix}" for col in df1.columns}),
on=[f"{col}.{suffix}" for col in ("lon", "lat")],
how="left"
)
If the order of the columns is important then you have reorder them afterwards:
df3 = df3[[f"{col}.{suffix}" for col in df1.columns for suffix in ("start", "end")]]
