I am having trouble with pandas replace-function. Let's say we have an example dataframe like this:
df = pd.DataFrame({'State': ['Georgia', 'Alabama', 'Tennessee'],
'Cities': [['Atlanta', 'Albany'], ['Montgomery', 'Huntsville', 'Birmingham'], ['Nashville', 'Knoxville']]})
>>> df
State Cities
0 Georgia [Atlanta, Albany]
1 Alabama [Montgomery, Huntsville, Birmingham]
2 Tennessee [Nashville, Knoxville]
Now I want to replace the state names and city names all by abbreviations. I have two dictionaries that define the replacement values:
state_abbrv = {'Alabama': 'AL', 'Georgia': 'GA', 'Tennessee': 'TN'}
city_abbrv = {'Albany': 'Alb.', 'Atlanta': 'Atl.', 'Birmingham': 'Birm.',
'Huntsville': 'Htsv.', 'Knoxville': 'Kxv.',
'Montgomery': 'Mont.', 'Nashville': 'Nhv.'}
When using pd.DataFrame.replace() on the "States" column (which only contains one value per row) it works as expected and replaces all state names:
>>> df.replace({'State': state_abbrv})
State Cities
0 GA [Atlanta, Albany]
1 AL [Montgomery, Huntsville, Birmingham]
2 TN [Nashville, Knoxville]
I was hoping that it would also individually replace all matching names within the lists of the "Cities" column, but unfortunately it does not seem to work as all cities remain unabbreviated:
>>> df.replace({'Cities': city_abbrv})
State Cities
0 Georgia [Atlanta, Albany]
1 Alabama [Montgomery, Huntsville, Birmingham]
2 Tennessee [Nashville, Knoxville]
How do I get the pd.DataFrame.replace() function to individually circle through all list elements in the column per row and replace accordingly?
CodePudding user response:
Try:
explodeto split the list into individual rowsreplaceeach column using the relevant dictionarygroupbyandaggto get back the original structure
>>> output = df.explode("Cities").replace({"State": state_abbrv, "Cities": city_abbrv}).groupby("State", as_index=False)["Cities"].agg(list)
State Cities
0 AL [Mont., Htsv., Birm.]
1 GA [Atl., Alb.]
2 TN [Nhv., Kxv.]
