I have the following dataframe that lists information on fast food road stops.
Input
first_stop second_stop third_stop
mcdonalds burger king popeyes
mcdonalds N/A N/A
wendys kfc N/A
taco bell kfc wendys
popeyes kfc panda express
I want to create a new column summary that summarizes the stops like so:
Expected Output
first_stop second_stop third_stop summary
mcdonalds burger king popeyes mcdonalds -> burger king -> popeyes
mcdonalds N/A N/A mcdonalds
wendys kfc N/A wendys -> kfc
taco bell kfc wendys taco bell -> kfc -> wendys
popeyes kfc panda express popeyes -> kfc -> panda express
I cannot simply concatenate the three stop columns because some have N/A values if the stop did not exist. How can i do this in pandas?
I've tried this, but obviously it won't give me what i want:
df['summary'] = df['first_stop'] '->' df['second_stop'] '->' df['third_stop']
CodePudding user response:
Use stack to flatten your dataframe. stack drop NaN values by default then groupby index level 0 and finally join strings.
df['summary'] = df.stack().groupby(level=0).apply(lambda x: ' -> '.join(x))
print(df)
# Output
first_stop second_stop third_stop summary
0 mcdonalds burger king popeyes mcdonalds -> burger king -> popeyes
1 mcdonalds NaN NaN mcdonalds
2 wendys kfc NaN wendys -> kfc
3 taco bell kfc wendys taco bell -> kfc -> wendys
4 popeyes kfc panda express popeyes -> kfc -> panda express
CodePudding user response:
If you have a large dataset you could use a classical loop that will be faster than stack groupby:
df['summary'] = df.apply(lambda s: ' -> '.join(e for e in s if not pd.isna(e)),
axis=1)
or to stop on the first NA:
from itertools import takewhile
df['summary'] = df.apply(lambda s: ' -> '.join(
takewhile(lambda x: not pd.isna(x), s)
), axis=1)
Or, for this particular ' -> ' separator where the characters are not expected to be found in the words:
df['summary'] = df.fillna('').apply(' -> '.join, axis=1).str.rstrip('>- ')
NB. this is a trick, doesn't work on all separators
output:
first_stop second_stop third_stop summary
0 mcdonalds burger king popeyes mcdonalds -> burger king -> popeyes
1 mcdonalds NaN NaN mcdonalds
2 wendys kfc NaN wendys -> kfc
3 taco bell kfc wendys taco bell -> kfc -> wendys
4 popeyes kfc panda express popeyes -> kfc -> panda express
