I have a DataFrame that has missing values in one or more columns, and I am trying to lookup the missing values based on the values in two other columns for the same record (i.e., fill city_name and city_id columns based on the coordinates in lat and long columns.)
| city_name | city_id | lat | long |
|---|---|---|---|
| NaN | NaN | -121.77 | 37.24 |
| NaN | NaN | -122.77 | 38.24 |
| NaN | NaN | -123.77 | 39.24 |
| new york | c1 | -121.77 | 37.24 |
| paris | c2 | -122.77 | 38.24 |
| london | c3 | -123.77 | 39.24 |
How can I do this?
CodePudding user response:
Try with groupby and fillna:
df = df.fillna(df.groupby(["lat", "long"]).transform("first"))
>>> df
city_id city_name lat long
0 c1 new york -121.77 37.24
1 c2 paris -122.77 38.24
2 c3 london -123.77 39.24
3 c1 new york -121.77 37.24
4 c2 paris -122.77 38.24
5 c3 london -123.77 39.24
