I have the following DataFrame in pandas:
| code | town | village | city |
|---|---|---|---|
| 01 | Brunete | NaN | NaN |
| 02 | NaN | Cabrera d'Anoia | NaN |
| 03 | NaN | NaN | Barcelona |
| 04 | Zarzalejo | NaN | Madrid |
| 07 | Melilla | NaN | City of Melilla |
| 08 | Cartagena | Galifa | Region of Murcia |
I want to merge the columns town, city and village into one. In case more than 1 column contains a value other than NaN, the resulting column will be assigned the value in this order of preference: village > town > city. Example:
| code | merged_column |
|---|---|
| 01 | Brunete |
| 02 | Cabrera d'Anoia |
| 03 | Barcelona |
| 04 | Zarzalejo |
| 07 | Melilla |
| 08 | Galifa |
CodePudding user response:
Let us define the order of columns then select those columns and use backfill along columns axis
order = ['village', 'town', 'city']
df['merged'] = df[order].bfill(axis=1).iloc[:, 0]
code town village city merged
0 1 Brunete NaN NaN Brunete
1 2 NaN Cabrera d'Anoia NaN Cabrera d'Anoia
2 3 NaN NaN Barcelona Barcelona
3 4 Zarzalejo NaN Madrid Zarzalejo
4 7 Melilla NaN Melilla Melilla
5 8 Cartagena Galifa Region of Murcia Galifa
CodePudding user response:
What you want is a coalesce pandas version. Here you can find a related question to this.
An alternative method is the combine_first method, which can be translated to the sql-coalesce function:
df = pd.DataFrame(
{"code": ["01", "02", "03", "04", "07", "08"],
"town": ["Brunete", pd.NA, pd.NA, "Zarzalejo", "Melilla", "Cartagena"],
"village": [pd.NA, "Cabrera d'Anoia", pd.NA, pd.NA, pd.NA, "Galifa"],
"city": [pd.NA, pd.NA, "Barcelona", "Madrid", "Melilla", "Region of Murcia"]}
)
df["village"].combine_first(df["town"]).combine_first(df["city"])
----------------------------
0 Brunete
1 Cabrera d'Anoia
2 Barcelona
3 Zarzalejo
4 Melilla
5 Galifa
Name: village, dtype: object
-----------------------------
