Home > OS >  How to change values in a given column based on a condition and put those new values in a new column
How to change values in a given column based on a condition and put those new values in a new column

Time:01-21

I have a pandas dataframe called trade_lanes that shows an output below:

Departure Country Arrival Country
Malaysia Poland
Germany USA
Germany Cameroon
Argentina Vietnam
Algeria Slovakia
China Vietnam
Denmark Singapore

What I would like to do is create 2 new columns depending on the country: "Departure Region" and "Arrival Region" in the dataframe trade_lanes such that it looks like this as the output:

Departure Region Departure Country Arrival Region Arrival Country
APAC Malaysia NECE Poland
NECE Germany AMERICAS USA
NECE Germany WEMEA Cameroon
AMERICAS Argentina APAC Viet Nam
WEMEA Algeria NECE Slovakia
APAC China APAC Vietnam
NECE Denmark APAC Japan
Others Tonga APAC Indonesia

I have been working around for loops to go about this but I think I'm getting the flow processes wrong and because there's a condition where a country is tagged to region, it makes it even more complicated. Also note that, I believe I need to use a For-if-elif-else loop because there are countries in which I would label them as "Others" if the countries do not fall in the scope of regions.

I was thinking of duplicating "Departure Country" and "Arrival Country" columns and then replace it manually but I am pretty sure there's an easier way to do it using for loop.

This was what I attempted:

for elements in range(len(trade_lane)):
    
    apac = {"AUSTRALIA": "APAC", "BANGLADESH": "APAC", 
                  "CHINA": "APAC", "HONG KONG": "APAC",
                  "INDIA": "APAC", "INDONESIA": "APAC",
                  "JAPAN": "APAC", "MALAYSIA": "APAC",
                  "MALAYSIA": "APAC", "NEW ZEALAND": "APAC",
                     "SINGAPORE": "APAC", "KOREA": "APAC",
                    "TAIWAN": "APAC", "THAILAND": "APAC", "VIET NAM": "APAC"}

    nece = {"BELGIUM": "NECE", "CZECH REPUBLIC": "NECE",
                  "DENMARK": "NECE", "GERMANY": "NECE", "HUNGARY": "NECE",
                  "LUXEMBOURG": "NECE", "NETHERLANDS": "NECE",
                  "NORWAY": "NECE", "POLAND": "NECE", "ROMANIA": "NECE",
                  "SLOVAKIA": "NECE", "SWEDEN": "NECE", "TURKEY": "NECE"}
    
    wemea = {"ALGERIA": "WEMEA", "BAHRAIN": "WEMEA", 
                   "CAMEROON": "WEMEA", "CHAD": "WEMEA", "FRANCE": "WEMEA",
                   "GREECE": "WEMEA", "IRISH REPUBLIC": "WEMEA",
                   "ITALY": "WEMEA", "MOROCCO": "WEMEA",
                   "PORTUGAL": "WEMEA", "QATAR": "WEMEA",
                   "SAUDI ARABIA": "WEMEA", "SOUTH AFRICA": "WEMEA",
                   "SPAIN": "WEMEA", "TUNISIA": "WEMEA", "UGANDA": "WEMEA",
                   "UNITED ARAB EMIRATES": "WEMEA", "UNITED KINDGOM":"WEMEA"}
    
    americas = {"ARGENTINA": "AMERICAS", "BRAZIL": "AMERICAS",
                      "CANADA": "AMERICAS", "CHILE": "AMERICAS",
                      "COLOMBIA": "AMERICAS", "MEXICO": "AMERICAS",
                      "PERU": "AMERICAS", "UNITED STATES": "AMERICAS"}
    
    for x,y in apac.items():
        trade_lane["Departure Region"].values = trade_lane["Departure Country"].values[elements].replace(x,y)
        
trade_lane

However I get a KeyError: 'Departure Region'

CodePudding user response:

Say your dataframe trade lanes is called df:

all_regions = {**apac, **nece, **wemea, **americas} # merge your dictionaries into one
df['Departure Region'] = df['Departure Country'].map(all_regions) #map countries to regions
df['Departure Region'] = df['Departure Region'].fillna('Others') #If any country not found in the map

You can do the same process for 'Arrival region'.

A for-loop version of this could be:

all_regions = {**apac, **nece, **wemea, **americas} # merge your dictionaries into one

temp = []
for i,row in df.iterrows():
    if row['Departure Country'] in all_regions:
        temp.append(all_regions[row['Departure Country']])
   #else if: # add here corner cases.
   #    do something 
    else:
        temp.append('Others')

df['Departure Region'] = temp

  •  Tags:  
  • Related