Pandas Return values based on a range of data with condition-CodePudding

I am working using python and pandas with the dataset below

Name	Subject	Grade
Alex	Science	A
Bob	Maths	B
Bob	Maths	C
Cynthia	Science	C
Dylan	Geography	A
Dylan	Geography	A

I would like to find names and subjects that have been repeated twice and display them exactly like the table below. The grades will only get concatenated by a comma if required.

Name	Subject	Grade
Bob	Maths	B,C
Dylan	Geography	A

Any help would be greatly appreciated

CodePudding user response：

You could find your duplicated rows based on Name and Subject columns using duplicated, and then use groupby with a custom join.

set is used to ensure a single Grade is returned when Grades are the same.

key_cols = ['Name','Subject']
df[df.duplicated(key_cols, keep=False)].groupby(key_cols,as_index=False).agg({'Grade':lambda x: ','.join(set(x))})

prints:

    Name    Subject Grade
0    Bob      Maths   C,B
1  Dylan  Geography     A