Home > Enterprise >  Calculated column based on secondary key in Pandas Dataframe
Calculated column based on secondary key in Pandas Dataframe

Time:01-06

My data looks somewhat like this:

key city currentCityKey
1 Boston NaN
2 New York 1
3 Concord 2

I'd like to create a new column with the current city name, which needs to be found by finding the city that corresponds to the currentCityKey value, returning this:

key city currentCityKey currentCity
1 Boston NaN NaN
2 New York 1 Boston
3 Concord 2 New York

I've tried a lot of things, but I run into problems trying to return the value from the city column. This should just be a simple if currentCityKey is notnull then city where currentCityKey == key, but I just can't seem to get from this logic to actual code.

Raw input:

df = pd.DataFrame({'key': [1, 2, 3],
 'city': ['Boston', 'New York', 'Concord'],
 'currentCityKey': [nan, 1.0, 2.0]})

CodePudding user response:

Create a mapper with the 'key' and 'city' columns and use map on 'currentCityKey' column to obtain 'currentCity' column:

df['currentCity'] = df['currentCityKey'].map(df.set_index('key')['city'])

Output:

   key      city  currentCityKey currentCity
0    1    Boston             NaN         NaN
1    2  New York             1.0      Boston
2    3   Concord             2.0    New York
  •  Tags:  
  • Related