Lets say I have a DataFrame df with a multi index ['siec', 'geo'] (shown in italic):
| siec | geo | value |
|---|---|---|
| a | DE | 1 |
| a | FR | 2 |
and a mapping DataFrame mapping_df from geo to id_region with a single index ['geo']:
| geo | id_region |
|---|---|
| DE | 10 |
| FR | 20 |
=> How can I join/merge/replace the index column 'geo' of df with the values of the column 'id_region' from mapping_df?
Expected result with new multi index ['siec', 'id_region']:
| siec | id_region | value |
|---|---|---|
| a | 10 | 1 |
| a | 20 | 2 |
I tried following code:
import pandas as pd
df = pd.DataFrame([{'siec': 'a', 'geo': 'DE', 'value': 1}, {'siec': 'a', 'geo': 'FR', 'value': 1}])
df.set_index(['siec', 'geo'], inplace=True)
mapping_df = pd.DataFrame([{'geo': 'DE', 'id_region': 10}, {'geo': 'FR', 'id_region': 20}])
mapping_df.set_index(['geo'], inplace=True)
joined_data = df.join(mapping_df)
merged_data = df.merge(mapping_df, left_index=True, right_index=True)
but it does not do what I want. It adds an additional column and keeps the old index.
| siec | geo | value | id_region |
|---|---|---|---|
| a | DE | 1 | 10 |
| a | FR | 2 | 20 |
=> Is there a convenient method for my use case or would I need to manually correct the index after a joining step?
As a workaround, I could reindex() the DataFrames, do some joining manipulations and then reintroduce some multi index.
However, I would like to avoid switching back and forth between the indexed and non-indexed versions of the DataFrames if possible (?).
CodePudding user response:
Try as follows.
- Use
MultiIndex.get_level_valuesto select only level1(or:geo) and applyIndex.mapwithmapping_df['id_region']as mapper. - Wrap the result inside
MultiIndex.set_levelsto overwrite level1. - Finally, chain
Index.set_namesto rename the level (or useMultiIndex.rename).
df.index = df.index.set_levels(
df.index.get_level_values(1).map(mapping_df['id_region']), level=1)\
.set_names('id_region', level=1)
print(df)
value
siec id_region
a 10 1
20 2
