I have df that contains purchase data. Each row has a different item in the item column, but naturally the purchase id's in purchaseId stays the same across the same purchase. How can I insert a row for each purchaseId that contains the value in the age column as seen in df2 but in the item column?
df
purchaseId item age
22 apples 35
22 coffee 35
22 wipes 35
53 tomatoes 23
53 sugar 23
53 tea 23
df2
purchaseId item age
22 apples 35
22 coffee 35
22 wipes 35
22 35 35
53 tomatoes 23
53 sugar 23
53 tea 23
53 23 23
CodePudding user response:
Use concat with new DataFrame created by DataFrame.drop_duplicates and last sorting index by DataFrame.sort_index:
#necesary default index
#df = df.reset_index(drop=True)
df2 = (pd.concat([df,
df.drop_duplicates('purchaseId', keep='last')
.assign(item = lambda x: x['age'])])
.sort_index(kind='mergesort', ignore_index=True))
print (df2)
purchaseId item age
0 22 apples 35
1 22 coffee 35
2 22 wipes 35
3 22 35 35
4 53 tomatoes 23
5 53 sugar 23
6 53 tea 23
7 53 23 23
CodePudding user response:
import pandas as pd
df = pd.DataFrame({'purchaseID':[22,22,22,53,53,53],
'item':['apple','coffe','wipes',
'tomatoes','sugar','tea'],
'age':[35,35,35,23,23,23]})
df = df.append(df.groupby(['purchaseID','age'], as_index=False)['age'].max(), ignore_index=True).sort_values(by='purchaseID')
df['item'].fillna(df['age'], inplace=True)
