Consider the Pandas DataFrame df1:
df1 = pd.DataFrame({"Name":["Kevin","Peter","James","Jose","Matthew","Pattrick","Alexander"],"Number":[1,2,3,4,5,6,7],"Total":[495.2,432.5,'-',395.5,485.8,415,418.7],"Average_old":[86.57,83.97,'-',96.59,84.67,83.10,83.84],"Grade_old":['A','A','A','A ','A','A','A'],"Total_old":[432.8,419.8,'-',482.9,423.3,415,418.7]})
I calculated the Average and Grade with the following formula
df1["Average"] = df1["Total"].apply(lambda x: x/5 0.1 if x != "-" else "-")
df1["Grade"] = df1["Average"].apply((lambda x:'A ' if x>90 else 'A'))
So df1 becomes
df1
Name Number Total Average_old Grade_old Total_old Average Grade
0 Kevin 1 495.2 86.57 A 432.8 99.14 A
1 Peter 2 432.5 83.97 A 419.8 86.60 A
2 James 3 - - A - - A
3 Jose 4 395.5 96.59 A 482.9 79.20 A
4 Matthew 5 485.8 84.67 A 423.3 97.26 A
5 Pattrick 6 415.0 83.10 A 415.0 83.10 A
6 Alexander 7 418.7 83.84 A 418.7 83.84 A
df1 has Total, Total_old, Grade, Grade_old, Average, Average_old. I am trying to check if any value of Total is modified with respect to Total_old, any value of Grade is modified with respect to Grade_old or any value of Average is modified with respect to Average_old. I am trying to create a new Dataframe dfmod that would give all the modified values of df1 with the following code
dfmod = pd.DataFrame()
columns =["Total","Average","Grade"]
for col in columns:
dfmod = pd.concat([dfmod,df1[["Name","Number",col '_old']][df1[col].ne(df1[col '_old'])].dropna()],sort=False)
dfmod.rename(columns={col '_old':col},inplace=True)
dfmod = dfmod.groupby('Name',as_index = False,sort = False).first()
And got the output as
dfmod
Name Number Total Average Grade
0 Kevin 1 432.8 86.57 A
1 Peter 2 419.8 83.97 None
2 Jose 4 482.9 96.59 A
3 Matthew 5 423.3 84.67 A
4 Alexander 7 NaN 83.84 None
Here no values of Pattrick was modified when comparing Total with Total_old, Average with Average_old, and Grade with Grade_old so Pattrick's entry was correctly dropped.
However if you observe Alexander's Average even though the Total, Average and Grade are same as Total_old,Average_old,Grade_old respectively , the modified value dataframe dfmod has the Average value wrongly added as a modified value. The reason why it happened is because floating point arithmetic wont work like integer arithmetic in programming languages as mentioned in the link below. https://www.geeksforgeeks.org/floating-point-error-in-python/
so I tried to implement np.isclose function as:
for col in columns:
if col is 'Grade':
dfmod = pd.concat([dfmod,df1[["Name","Number",col '_old']][df1[col].ne(df1[col '_old'])].dropna()],sort=False)
continue
dfmod = pd.concat([dfmod,df1[["Name","Number",col '_old']][~np.isclose(df1[col],df1[col '_old'])].dropna()],sort=False)
But it throws the error message as
`Exception has occurred: TypeError ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''`
The error seems to be because of the '-' character in the data how can I fix this issue, please do help, I am stuck at this issue for a while and tried all the resources I could get.
Expected output:
Name Number Total Average Grade
0 Kevin 1 432.8 86.57 A
1 Peter 2 419.8 83.97 None
3 Jose 4 482.9 96.59 A
4 Matthew 5 423.3 84.67 A
It should omit the values for James,Pattrick and Alexander as they don't have any change for Total - Total_old, Average - Average_old, Grade - Grade_old
CodePudding user response:
Please have a look if this is what you are looking for.
import pandas as pd
import numpy as np
def compute_grade(new_average, old_grade):
try:
grade = 'A ' if float(new_average) > 90 else 'A'
except:
grade = old_grade
return grade
df1 = pd.DataFrame({"Name":["Kevin","Peter","James","Jose","Matthew","Pattrick","Alexander"],"Number":[1,2,3,4,5,6,7],"Total":[495.2,432.5,'-',395.5,485.8,415,418.7],"Average_old":[86.57,83.97,'-',96.59,84.67,83.10,83.84],"Grade_old":['A','A','A','A ','A','A','A'],"Total_old":[432.8,419.8,'-',482.9,423.3,415,418.7]})
df1["Average"] = df1["Total"].apply(lambda x: round((x/5) 0.1, 2) if x != "-" else "-")
df1["Grade"] = df1.apply((lambda x: compute_grade(x['Average'], x['Grade_old'])), axis=1)
print(df1)
# import pdb; pdb.set_trace()
dfmod = df1[(df1['Total'] != df1['Total_old']) | (df1['Average'] != df1['Average_old']) | (df1['Grade'] != df1['Grade_old'])]
print(dfmod)
Output:
Name Number Total Average_old Grade_old Total_old Average Grade
0 Kevin 1 495.2 86.57 A 432.8 99.14 A
1 Peter 2 432.5 83.97 A 419.8 86.6 A
2 James 3 - - A - - A
3 Jose 4 395.5 96.59 A 482.9 79.2 A
4 Matthew 5 485.8 84.67 A 423.3 97.26 A
5 Pattrick 6 415 83.1 A 415 83.1 A
6 Alexander 7 418.7 83.84 A 418.7 83.84 A
Name Number Total Average_old Grade_old Total_old Average Grade
0 Kevin 1 495.2 86.57 A 432.8 99.14 A
1 Peter 2 432.5 83.97 A 419.8 86.6 A
3 Jose 4 395.5 96.59 A 482.9 79.2 A
4 Matthew 5 485.8 84.67 A 423.3 97.26 A
Rounding off the average we compute to 2 decimals was the key here.
Also when computing the Grade if we encounter non numeric value like '-', i was returning Grade_old.
CodePudding user response:
As far as I can tell, the "-" character is unnecessary – you can replace it with None in the columns where they show up, then make those columns numerical. This will make your preprocessing steps much cleaner, and avoid unnecessary conditional statements where we need to check whether certain values are "-".
df1 = df1.replace("-",None).astype({"Total":float, "Average_old":float, "Total_old":float})
Then you don't have to use .apply or check for whether a particular element is "-" when creating the new columns Average and Grade:
df1["Average"] = df1["Total"]/5 0.1
df1["Grade"] = ["A " if x>90 else "A" for x in df1["Average"]]
Then you can use np.isclose in your condition, drop any rows containing null, select the columns with "_old", and rename the columns:
condition = ~(np.isclose(df1['Average'], df1['Average_old']) & np.isclose(df1['Total'], df1['Total_old']) & (df1['Grade_old'] == df1['Grade']))
cols = ['Name','Number'] [col for col in df1.columns if "_old" in col]
df1.loc[condition, cols].dropna().rename(columns={'Total_old':'Total','Average_old':'Average','Grade_old':'Grade'})
Result:
Name Number Total Average Grade
0 Kevin 1 432.8 86.57 A
1 Peter 2 419.8 83.97 A
3 Jose 4 482.9 96.59 A
4 Matthew 5 423.3 84.67 A
