Say I have the 2 Dataframes below; one with a list of students and test scores, and different student sessions that made up of the students. Say I want to add a new column, "Sum", to df with the sum of the scores for each session and a new column for the number of years passed since the most recent year that either student took the test, "Years Elapsed". What is the best way to accomplish this? I can make the students a class and make each student an object but then I am stuck on how to link the object to their name in the dataframe.
data1 = {'Student': ['John','Kim','Adam','Sonia'],
'Score': [92,100,76,82],
'Year': [2015,2013,2016,2018]}
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])
data2 = {'Session': [1,2,3,4],
'Student1': ['Sonia','Kim','John','Adam'],
'Student2': ['Adam','Sonia','Kim','John']}
df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])
The desired outcome:
outcome = {'Session': [1,2,3,4],
'Student1': ['Sonia','Kim','John','Adam'],
'Student2': ['Adam','Sonia','Kim','John'],
'Sum': [158, 182, 192, 168],
'Years Elapsed': [4,4,7,6]}
df_outcome = pd.DataFrame(outcome, columns=['Session','Student1','Student2','Sum','Years Elasped'])
I have made a class called Student and made each student an object but after this is where I am stuck.
df_students.columns = df_students.columns.str.lower()
class Student:
def __init__(self, s, sc, yr):
self.student = s
self.score = sc
self.year = yr
students = [Student(row.student, row.score, row.year) for index, row in df_students.iterrows()]
#check to see if list of objects was created correctly
s1 = students[1]
s1.__dict__
Thanks in advance!
CodePudding user response:
Using apply method:
import pandas as pd
data1 = {'Student': ['John','Kim','Adam','Sonia'],
'Score': [92,100,76,82],
'Year': [2015,2013,2016,2018]}
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])
data2 = {'Session': [1,2,3,4],
'Student1': ['Sonia','Kim','John','Adam'],
'Student2': ['Adam','Sonia','Kim','John']}
df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])
# SOLUTION
def sum_scores(student1, student2):
_score_s1 = df_students.loc[(df_students['Student']==student1)]['Score'].values[0]
_score_s2 = df_students.loc[(df_students['Student']==student2)]['Score'].values[0]
return _score_s1 _score_s2
def years_elapsed(student1, student2):
_year = pd.to_datetime("today").year
_year_s1 = df_students.loc[(df_students['Student']==student1)]['Year'].values[0]
_year_s2 = df_students.loc[(df_students['Student']==student2)]['Year'].values[0]
return _year - max(_year_s1, _year_s2)
df['sum_score'] = df.apply(lambda row: sum_scores(row['Student1'], row['Student2']), axis=1)
df['years_elapsed'] = df.apply(lambda row: years_elapsed(row['Student1'], row['Student2']), axis=1)
df
CodePudding user response:
You can try this:
df2 = pd.merge(df, df_students, left_on="Student1", right_on="Student")
df3 = pd.merge(df2, df_students, left_on="Student2", right_on="Student")
df3['Sum'] = df3[['Score_x','Score_y']].sum(axis=1)
df3['Years Elapsed'] = 2022 - df3[['Year_x', 'Year_y']].max(axis=1)
df3 = df3[['Session', 'Student1', 'Student2', 'Sum', 'Years Elapsed']]
print(df3)
It gives:
Session Student1 Student2 Sum Years Elapsed
0 1 Sonia Adam 158 4
1 2 Kim Sonia 182 4
2 3 John Kim 192 7
3 4 Adam John 168 6

