Home > Software engineering >  An elegant way to update select cells based with column name
An elegant way to update select cells based with column name

Time:01-23

I am looking for an elegant way to achieve updating the dataframe as per below. It contains the answers of a test per student (rows). This data is acquired by reading in an XLSX. The way answers have been recorded changed and I need to revert back to the original format for BI purposes. Different year levels have a different number of columns/questions.

In this df, the correct answers are in the header row, i.e. the Q1 correct answer is "C" (column 3), Q2 column 4 etc.

I need to replace the ticks in the DF with the correct answer as shown in the header for each question/column. I have written a horrible set of loops to do do a quick and dirty but I expect there is a much more elegant way to achieve this with Pandas.

enter image description here

. This dataframe is imported

CodePudding user response:

Use pd.where to mask correct values and then replace what you do not need using columns converted to series

df.where(df != '✓', df.columns.to_series(), axis=1)

Outcome

    Name  A  B  D
0  Stud1  A  C  D
1  Stud2  A  B  D
2  Stud3  B  B  A
3  Stud4  C  B  C

CodePudding user response:

Convert columns names to pd.Series to use as a mapper and use replace to replace each tick mark '✓' with its corresponding column name:

df = df.replace('✓', pd.Series(df.columns, df.columns))

Example: For DataFrame df:

    Name  A  B  D
0  Stud1  ✓  C  ✓
1  Stud2  ✓  ✓  ✓
2  Stud3  B  ✓  A
3  Stud4  C  ✓  C

the above code produces:

    Name  A  B  D
0  Stud1  A  C  D
1  Stud2  A  B  D
2  Stud3  B  B  A
3  Stud4  C  B  C
  •  Tags:  
  • Related