I have this huge dataset in which I have to replace each country's name with the corresponding ISO code. I have stored the ISO code of each country into another df. e.g.
df1:
| TERRITORY |
|---|
| France, Italy |
| Italy |
| Spain, France |
| France, Spain, Italy |
df2:
| COUNTRY | ISO CODE |
|---|---|
| France | FR |
| Italy | IT |
| Spain | ES |
Expected output:
| TERRITORY |
|---|
| FR, IT |
| IT |
| ES, FR |
| FR, ES, IT |
My last try was to convert the element into tuples and then replace, but it doesn't work (and I don't know if it makes sense, but here's my code anyway):
country = tuple(list(df2['COUNTRY']))
iso = tuple(list(df2['ISO CODE']))
z = zip(x, y)
for x, y in z:
if x in df1['TERRITORY']:
df1['TERRITORY'].str.replace(x, y)
But it doesn't change anything.
CodePudding user response:
You can use a combination of .str.split .explode, then .replace .set_index, and finally .groupby(level=0) agg(list) .str.join:
df1['TERRITORY'] = df1['TERRITORY'].str.split(', ').explode().replace(df2.set_index('COUNTRY')['ISO CODE']).groupby(level=0).agg(list).str.join(', ')
Output:
>>> df1
TERRITORY
0 FR, IT
1 IT
2 ES, FR
3 FR, ES, IT
CodePudding user response:
If you have all the mapping you can also just use replace():
df1.replace(df2.set_index('COUNTRY').squeeze(), regex=True)
Output:
Territory
0 FR, IT
1 IT
2 ES, FR
3 FR, ES, IT
