I got multiple excel file which needs to be need if OLD data is matching NEW data. Normally I use dataframe.equals but since the NEW data is containing additional columns this doesn't work anymore.
Very excel file contains two tabs with OLD and NEW data. I have to check if the OLD data is matching per record in NEW. The issue of NEW is that columns aren't in the same order, additional columns plus record aren't in the same order.
The code I normally use to check but it is giving
import os
import pandas as pd
TargetFolder = os.listdir('Dir')
for file in TargetFolder:
df = pd.ExcelFile('TargetFolder' file)
dfPrep = pd.read_excel(df, 'OLD')
dfCE = pd.read_excel(df, 'NEW')
Checkdf = dfPrep.equals(dfCE)
CodePudding user response:
IIUC, you can use pandas.DataFrame.loc to select/pick the exact OLD columns from the NEW ones then use pandas.DataFrame.sort_values to reorder the rows by the two columns Column4 and Column8.
Try this :
from pathlib import Path
import pandas as pd
a_directory= "path_to_the_folder_containing_the_excel_files"
for file in Path(a_directory).glob("*.xlsx"):
dfPrep = pd.read_excel(file, sheet_name="OLD")
dfCE = pd.read_excel(file, sheet_name="NEW")
dfCE_adapted= (
dfCE.loc[:, dfPrep.columns]
.sort_values(by=["Column4", "Column8"], ignore_index=True)
)
Checkdf= dfPrep.equals(dfCE_adapted)
if Checkdf:
print(file.stem)
else:
pass
If the two sheets OLD and NEW matches, then the Excel filename will be printed.
