I am working using python and pandas with the dataset below
| Name | Subject | Grade |
|---|---|---|
| Alex | Science | A |
| Bob | Maths | B |
| Bob | Maths | C |
| Cynthia | Science | C |
| Dylan | Geography | A |
| Dylan | Geography | A |
I would like to find names and subjects that have been repeated twice and display them exactly like the table below. The grades will only get concatenated by a comma if required.
| Name | Subject | Grade |
|---|---|---|
| Bob | Maths | B,C |
| Dylan | Geography | A |
Any help would be greatly appreciated
CodePudding user response:
You could find your duplicated rows based on Name and Subject columns using duplicated, and then use groupby with a custom join.
set is used to ensure a single Grade is returned when Grades are the same.
key_cols = ['Name','Subject']
df[df.duplicated(key_cols, keep=False)].groupby(key_cols,as_index=False).agg({'Grade':lambda x: ','.join(set(x))})
prints:
Name Subject Grade
0 Bob Maths C,B
1 Dylan Geography A
