How to find the desired values between two dataframes in Python

I have two dataframes.

One is music.

name	Date	Edition	Song_ID	Singer_ID
LA	01.05.2009	1	1	1
Second	13.07.2009	1	2	2
Mexico	13.07.2009	1	3	1
Let's go	13.09.2009	1	4	3
Hello	18.09.2009	1	5	(4,5)
Don't give up	12.02.2010	2	6	(5,6)
ZIC ZAC	18.03.2010	2	7	7
Blablabla	14.04.2010	2	8	2
Oh la la	14.05.2011	3	9	4
Food First	14.05.2011	3	10	5
La Vie est..	17.06.2011	3	11	8
Jajajajajaja	13.07.2011	3	12	9

The other dataframe is called singer.

Singer	nationality	Singer_ID
JT Watson	USA	1
Rafinha	Brazil	2
Juan Casa	Spain	3
Kidi	USA	4
Dede	USA	5
Briana	USA	6
Jay Ado	UK	7
Dani	Australia	8
Mike Rich	USA	9

I would like to know which Edition has the most singers from the USA involved, but the information is spread across two different dataframes.

What I have done so far is:

singer['nationality'].value_counts()['USA']

But this only shows that 5 singers are from the USA. Both dataframes share a column called Singer_ID.

User response:

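You need to merge the two dataframes on the key they share, Singer_ID. See https://pandas.pydata.org/docs/reference/api/pandas.merge.html

Note that rows such as (4,5) hold several IDs in one cell, so they will not match any single Singer_ID unless they are first split into one row per singer.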

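# One row per matching (singer, song) pair, with the nationality attached
merged = singer.merge(music, on="Singer_ID")
# Number of merged rows whose singer is from the USA
merged['nationality'].value_counts()['USA']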
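
Before merging, the multi-singer cells such as (4,5) probably need to be split so that each row carries exactly one Singer_ID. The following is a minimal sketch on a small excerpt of the question's data. It assumes the Singer_ID column in music is stored as text (the question does not say how it is stored), and the name music_long is only illustrative.

```python
import pandas as pd

# Small excerpt of the question's data.
# Assumption: Singer_ID in music is stored as text such as "1" or "(4,5)".
music = pd.DataFrame({
    "name": ["LA", "Hello", "Don't give up"],
    "Edition": [1, 1, 2],
    "Song_ID": [1, 5, 6],
    "Singer_ID": ["1", "(4,5)", "(5,6)"],
})
singer = pd.DataFrame({
    "Singer": ["JT Watson", "Kidi", "Dede", "Briana"],
    "nationality": ["USA", "USA", "USA", "USA"],
    "Singer_ID": [1, 4, 5, 6],
})

# Strip the parentheses, split on commas, then give each ID its own row.
music_long = (
    music.assign(Singer_ID=music["Singer_ID"].str.strip("()").str.split(","))
    .explode("Singer_ID", ignore_index=True)
    .astype({"Singer_ID": "int64"})  # match the integer dtype used in singer
)

# Every (song, singer) pair now finds its singer in the merge.
merged = singer.merge(music_long, on="Singer_ID")
print(merged[["name", "Edition", "Singer", "nationality"]])
```

With this layout a song performed by two singers contributes two rows, so counts taken on merged are counts of singer appearances rather than of distinct singers.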



editions = merged.groupby("Edition")
# or print(merged.groupby(["Edition", "nationality"])["nationality"].count())
max_value = 0
best_edition = 0
for edition, df in editions:
    # Count USA rows directly: value_counts()["USA"] raises a KeyError
    # for an edition that has no USA singers at all
    nbr_usa = (df["nationality"] == "USA").sum()
    if nbr_usa > max_value:
        best_edition = edition
        max_value = nbr_usa
print(best_edition, max_value)
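
The loop above can also be expressed without explicit iteration. This is a sketch on a made-up merged frame (the rows below are illustrative, not the question's full data): it flags the USA rows, sums the flags per Edition, and takes the edition with the largest sum. The name usa_per_edition is hypothetical.

```python
import pandas as pd

# Hypothetical merged frame: one row per (song, singer) pair.
merged = pd.DataFrame({
    "Edition": [1, 1, 1, 2, 2, 3, 3],
    "nationality": ["USA", "Brazil", "USA", "UK", "Brazil", "USA", "Australia"],
})

# True/False per row, summed within each Edition (True counts as 1).
usa_per_edition = merged["nationality"].eq("USA").groupby(merged["Edition"]).sum()

best_edition = usa_per_edition.idxmax()  # Edition label with the highest count
max_value = usa_per_edition.max()        # the count itself
print(usa_per_edition)
print(best_edition, max_value)
```

Editions without any USA singer simply get a count of 0 here, so no special case is needed. If several editions tie, idxmax returns the first one; comparing usa_per_edition against its maximum would list all of them.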