I'm still pretty new to pandas, and can't quite wrap my head around this one. I have a table that looks like this:
| Timestamp | Sport | Place |
|---|---|---|
| 10/1/2022 12:05:00 | basketball | place1 |
| 10/1/2022 4:05:00 | basketball | place1 |
| 10/2/2022 16:05 | basketball | place1 |
| 10/2/2022 9:00 | basketball | place2 |
| 10/2/2022 11:00 | football | place3 |
| 10/2/2022 11:15 | soccer | place1 |
| 10/3/2022 11:15 | baseball | place4 |
| 10/3/2022 11:20 | tennis | place5 |
| 10/3/2022 11:30 | tennis | place5 |
And I'd like to have a Python script that uses pandas to produce this:
| Date | Sport | Place |
|---|---|---|
| 1-Oct | 2x basketball | place1 |
| 2-Oct | 2x basketball, 1x soccer | place1 |
| | 1x basketball | place2 |
| | 1x football | place3 |
| 3-Oct | 1x baseball | place4 |
| | 2x tennis | place5 |
I can't quite figure out how to do the groupby correctly and then to concatenate the columns.
(And I'm going to want to do this for many similar tables, so I want to use a script.)
This gets me very close:
val_cnts = df.groupby(['date', 'Place'])['Sport'].value_counts()
But this returns a multi-indexed series of just the sport counts, so it looks like I need to turn some of the indices into columns, and then do some concatenations. When I do a .reset_index() on the Place and Sport indices though, that still doesn't work because the groupings don't seem right anymore.
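To make the shape of that result concrete, here is a minimal sketch. It assumes a small stand-in frame (the rows and the pre-computed `date` column are illustrative, not the real data) and a recent pandas version. Because the index already has a level called `Sport`, the counts need their own column name before every level can be turned into a regular column; `count` is an arbitrary name chosen here.

```python
import pandas as pd

# Hypothetical miniature of the table above; "date" is assumed to have
# been derived from Timestamp already.
df = pd.DataFrame(
    {
        "date": ["1-Oct", "1-Oct", "2-Oct", "2-Oct"],
        "Sport": ["basketball", "basketball", "basketball", "soccer"],
        "Place": ["place1", "place1", "place1", "place1"],
    }
)

val_cnts = df.groupby(["date", "Place"])["Sport"].value_counts()

# The result is a Series of counts indexed by date, Place and Sport.
print(val_cnts)

# Naming the counts column avoids a clash with the "Sport" index level,
# so all three levels can become ordinary columns in one step.
flat = val_cnts.reset_index(name="count")
print(flat)
```

From a flat frame like this, the counts and sport names can be combined into strings and grouped again by date and place.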
CodePudding user response:
Try:
import pandas as pd

df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df["Date"] = df["Timestamp"].dt.strftime("%-d-%b")  # "%-d" is platform-specific; on Windows use "%#d"
df = (
df.groupby(["Date", "Place"])["Sport"]
.value_counts()
.to_frame(name="x")
.reset_index(level=2)
)
df["Sport"] = df["x"].astype(str) + "x " + df["Sport"]
df = df.groupby(level=[0, 1])["Sport"].agg(", ".join).reset_index()
print(df)
Prints:
    Date   Place                     Sport
0  1-Oct  place1             2x basketball
1  2-Oct  place1  1x basketball, 1x soccer
2  2-Oct  place2             1x basketball
3  2-Oct  place3               1x football
4  3-Oct  place4               1x baseball
5  3-Oct  place5                 2x tennis
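Since the question mentions running this over many similar tables, the same steps can be wrapped in a function. The following is only a sketch under a few assumptions: the name `summarize_sports` is hypothetical, the input columns are assumed to be called `Timestamp`, `Sport` and `Place`, and the sample frame is typed in from the question. It differs from the snippet above in three ways: timestamps are parsed one value at a time so the mixed `12:05:00` / `16:05` styles are accepted, grouping is done on real dates so that days sort chronologically (as strings, "10-Oct" would sort before "2-Oct"), and the label is built without `%-d`.

```python
import pandas as pd


def summarize_sports(df: pd.DataFrame) -> pd.DataFrame:
    """Count sports per (day, place) and join them as 'Nx sport' strings.

    Hypothetical helper; expects columns Timestamp, Sport and Place.
    """
    # Parse each value separately so differing time formats are tolerated.
    ts = df["Timestamp"].apply(pd.to_datetime)

    counts = (
        df.assign(Day=ts.dt.normalize())  # midnight of each day, still a datetime
        .groupby(["Day", "Place"])["Sport"]
        .value_counts()
        .reset_index(name="n")
    )
    counts["Sport"] = counts["n"].astype(str) + "x " + counts["Sport"]

    out = counts.groupby(["Day", "Place"])["Sport"].agg(", ".join).reset_index()

    # Build labels such as "1-Oct" without the platform-specific "%-d".
    label = out["Day"].dt.day.astype(str) + "-" + out["Day"].dt.strftime("%b")
    out.insert(0, "Date", label)
    return out.drop(columns="Day")


# Sample data copied from the question.
sample = pd.DataFrame(
    {
        "Timestamp": [
            "10/1/2022 12:05:00", "10/1/2022 4:05:00", "10/2/2022 16:05",
            "10/2/2022 9:00", "10/2/2022 11:00", "10/2/2022 11:15",
            "10/3/2022 11:15", "10/3/2022 11:20", "10/3/2022 11:30",
        ],
        "Sport": [
            "basketball", "basketball", "basketball", "basketball",
            "football", "soccer", "baseball", "tennis", "tennis",
        ],
        "Place": [
            "place1", "place1", "place1", "place2", "place3",
            "place1", "place4", "place5", "place5",
        ],
    }
)

result = summarize_sports(sample)
print(result)

# Optional: blank out repeated dates to mimic the merged cells in the
# desired table.
pretty = result.assign(Date=result["Date"].mask(result["Date"].duplicated(), ""))
print(pretty)
```

The month abbreviation produced by `%b` depends on the active locale, so the labels read "Oct" only under an English locale.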
