I have two dataframes. The first one is called `music`:
| name | Date | Edition | Song_ID | Singer_ID |
|---|---|---|---|---|
| LA | 01.05.2009 | 1 | 1 | 1 |
| Second | 13.07.2009 | 1 | 2 | 2 |
| Mexico | 13.07.2009 | 1 | 3 | 1 |
| Let's go | 13.09.2009 | 1 | 4 | 3 |
| Hello | 18.09.2009 | 1 | 5 | (4,5) |
| Don't give up | 12.02.2010 | 2 | 6 | (5,6) |
| ZIC ZAC | 18.03.2010 | 2 | 7 | 7 |
| Blablabla | 14.04.2010 | 2 | 8 | 2 |
| Oh la la | 14.05.2011 | 3 | 9 | 4 |
| Food First | 14.05.2011 | 3 | 10 | 5 |
| La Vie est.. | 17.06.2011 | 3 | 11 | 8 |
| Jajajajajaja | 13.07.2011 | 3 | 12 | 9 |
The second one is called `singer`:
| Singer | nationality | Singer_ID |
|---|---|---|
| JT Watson | USA | 1 |
| Rafinha | Brazil | 2 |
| Juan Casa | Spain | 3 |
| Kidi | USA | 4 |
| Dede | USA | 5 |
| Briana | USA | 6 |
| Jay Ado | UK | 7 |
| Dani | Australia | 8 |
| Mike Rich | USA | 9 |
I would like to know which edition involves the most singers from the USA, but the information is split across two different dataframes.
What I have done so far is this:
```python
singer['nationality'].value_counts()['USA']
```
But this only shows that 5 singers are from the USA overall, not how many take part in each edition. Both dataframes share a column called `Singer_ID`.
Answer:
You need to merge the two dataframes on their shared key, `Singer_ID` (see https://pandas.pydata.org/docs/reference/api/pandas.merge.html), and then count the USA singers in each edition:
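```python
# Attach each singer's nationality to every song row via the shared key
merged = singer.merge(music, on="Singer_ID")
merged["nationality"].value_counts()["USA"]  # USA rows across all editions

editions = merged.groupby("Edition")
# or: print(merged.groupby(["Edition", "nationality"])["nationality"].count())

max_value = 0
best_edition = 0
for edition, df in editions:
    # Counts rows (song appearances); .get(..., 0) avoids a KeyError
    # for an edition that has no USA singer at all
    nbr_usa = df["nationality"].value_counts().get("USA", 0)
    if nbr_usa > max_value:
        best_edition = edition
        max_value = nbr_usa

print(best_edition, max_value)
```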

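The merge above matches `Singer_ID` values exactly, so rows whose `Singer_ID` holds several singers, such as `(4,5)` for "Hello" and `(5,6)` for "Don't give up", cannot match any single ID in `singer` and are left out of the count. A minimal sketch of one way around this, assuming those cells are stored as Python tuples (not strings), is to `explode` them into one row per singer before merging. It rebuilds only the columns it needs from the tables above and reports two readings of "most singers": USA appearances per edition (one per song and singer, like the loop above) and distinct USA singers per edition. The names `exploded`, `counts`, `usa_appearances` and `usa_distinct` are illustrative.

```python
import pandas as pd

# Rebuilt from the tables above (only the columns needed here).
# ASSUMPTION: multi-singer cells such as (4,5) are Python tuples, not strings.
music = pd.DataFrame({
    "Song_ID": list(range(1, 13)),
    "Edition": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Singer_ID": [1, 2, 1, 3, (4, 5), (5, 6), 7, 2, 4, 5, 8, 9],
})
singer = pd.DataFrame({
    "Singer": ["JT Watson", "Rafinha", "Juan Casa", "Kidi", "Dede",
               "Briana", "Jay Ado", "Dani", "Mike Rich"],
    "nationality": ["USA", "Brazil", "Spain", "USA", "USA",
                    "USA", "UK", "Australia", "USA"],
    "Singer_ID": list(range(1, 10)),
})

# One row per (song, singer) pair: the (4, 5) row becomes two rows
exploded = music.explode("Singer_ID", ignore_index=True)
exploded["Singer_ID"] = exploded["Singer_ID"].astype("int64")

merged = exploded.merge(singer, on="Singer_ID")

# Appearances: rows per (Edition, nationality); fill_value=0 writes 0
# (not NaN) where an edition has no singer of that nationality
counts = merged.groupby(["Edition", "nationality"]).size().unstack(fill_value=0)
usa_appearances = counts["USA"]

# Distinct people: each USA singer counted once per edition
usa_distinct = (
    merged[merged["nationality"] == "USA"].groupby("Edition")["Singer_ID"].nunique()
)

print(usa_appearances.to_dict())
print(usa_distinct.to_dict())
print("Most USA appearances:", usa_appearances.idxmax())
```

On this data, edition 1 has the most USA appearances (4), while editions 1 and 3 tie at 3 distinct USA singers. Since `idxmax()` returns only the first maximum, it is worth checking for ties before reporting a single edition.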