I have customer transaction dataset like this:
| ID | Date | Amount |
|---|---|---|
| 1 | 1-1-21 | 5 |
| 2 | 2-1-21 | 8 |
| 1 | 2-1-21 | 6 |
| 1 | 3-1-21 | 5 |
| 2 | 3-1-21 | 9 |
| 2 | 3-1-21 | 10 |
I have to groupby and aggregate the data on customer level like this:
| ID | Total Amount | Number of days active |
|---|---|---|
| 1 | 16 | 3 |
| 2 | 27 | 2 |
Total Amount = sum of all Amount column
Number of days active = Number of days customer made 1 or more transactions
How do I calculate my column Number of days active? So far I have tried:
df= df.groupby('ID').agg({'Amount': lambda price: price.sum(),
'Date': lambda date: len(date).days})
My Total Amount column is fine but I cannot find the Number of days active
CodePudding user response:
Let us do groupby with agg : nunique sum
out = df.groupby('ID').agg(Numberofdaysactive = ('Date','nunique'),TotalAmount = ('Amount','sum')).reset_index()
out
Out[384]:
ID Numberofdaysactive TotalAmount
0 1 3 16
1 2 2 27
CodePudding user response:
nunique should be what you need. That is, the aggregate df can be calculated by:
df_agg = df.groupby('ID').agg({"Amount":sum, "Date":pd.Series.nunique})
Note how you can pass function handles directly to agg.
