Home > database >  How can I use a value in a dataframe to look up an attribute
How can I use a value in a dataframe to look up an attribute

Time:01-06

Say I have the 2 Dataframes below; one with a list of students and test scores, and different student sessions that made up of the students. Say I want to add a new column, "Sum", to df with the sum of the scores for each session and a new column for the number of years passed since the most recent year that either student took the test, "Years Elapsed". What is the best way to accomplish this? I can make the students a class and make each student an object but then I am stuck on how to link the object to their name in the dataframe.

data1 = {'Student': ['John','Kim','Adam','Sonia'],
         'Score': [92,100,76,82],
         'Year': [2015,2013,2016,2018]}
 
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])

data2 = {'Session': [1,2,3,4],
         'Student1': ['Sonia','Kim','John','Adam'],
         'Student2': ['Adam','Sonia','Kim','John']}

df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])

The desired outcome:

outcome = {'Session': [1,2,3,4],
           'Student1': ['Sonia','Kim','John','Adam'],
           'Student2': ['Adam','Sonia','Kim','John'],
           'Sum': [158, 182, 192, 168],
           'Years Elapsed': [4,4,7,6]}

df_outcome = pd.DataFrame(outcome, columns=['Session','Student1','Student2','Sum','Years Elasped'])

I have made a class called Student and made each student an object but after this is where I am stuck.

df_students.columns = df_students.columns.str.lower()

class Student:
    def __init__(self, s, sc, yr):
        self.student = s
        self.score = sc
        self.year = yr

students = [Student(row.student, row.score, row.year) for index, row in df_students.iterrows()]

#check to see if list of objects was created correctly
s1 = students[1] 
s1.__dict__

Thanks in advance!

CodePudding user response:

Using apply method:

import pandas as pd

data1 = {'Student': ['John','Kim','Adam','Sonia'],
         'Score': [92,100,76,82],
         'Year': [2015,2013,2016,2018]}
 
df_students = pd.DataFrame(data1, columns=['Student','Score','Year'])

data2 = {'Session': [1,2,3,4],
         'Student1': ['Sonia','Kim','John','Adam'],
         'Student2': ['Adam','Sonia','Kim','John']}

df = pd.DataFrame(data2, columns=['Session','Student1','Student2'])

# SOLUTION 
def sum_scores(student1, student2):
    _score_s1 = df_students.loc[(df_students['Student']==student1)]['Score'].values[0]
    _score_s2 = df_students.loc[(df_students['Student']==student2)]['Score'].values[0]
    return  _score_s1   _score_s2

def years_elapsed(student1, student2):
    _year = pd.to_datetime("today").year
    _year_s1 = df_students.loc[(df_students['Student']==student1)]['Year'].values[0]
    _year_s2 = df_students.loc[(df_students['Student']==student2)]['Year'].values[0]
    return _year - max(_year_s1, _year_s2)

df['sum_score'] = df.apply(lambda row: sum_scores(row['Student1'], row['Student2']), axis=1)
df['years_elapsed'] = df.apply(lambda row: years_elapsed(row['Student1'], row['Student2']), axis=1)

df

enter image description here

CodePudding user response:

You can try this:

df2 = pd.merge(df, df_students,  left_on="Student1", right_on="Student")
df3 = pd.merge(df2, df_students,  left_on="Student2", right_on="Student")
df3['Sum'] = df3[['Score_x','Score_y']].sum(axis=1)
df3['Years Elapsed'] = 2022 - df3[['Year_x', 'Year_y']].max(axis=1)
df3 = df3[['Session', 'Student1', 'Student2', 'Sum', 'Years Elapsed']]
print(df3)

It gives:

   Session Student1 Student2  Sum  Years Elapsed
0        1    Sonia     Adam  158              4
1        2      Kim    Sonia  182              4
2        3     John      Kim  192              7
3        4     Adam     John  168              6
  •  Tags:  
  • Related