Home > Enterprise >  Compare two excel files using python
Compare two excel files using python

Time:01-05

I have two xlsx files as follows:

value1        value2   
3900162750    10    
3900163003    19
2311009200    22 

value1        value2   
3900163003    5    
3900162750    9
2311009200    88

How do I match value1 from xlsx1 with xlsx2 and compare related value2?

for instance, match 3900163003 from xlsx1 with xlsx2 and find out value2 is decreasing.

CodePudding user response:

You can use pandas. First read the excel files as pandas.DataFrame objects. Then using pd.DataFrame.merge method, merge the two dataframes on 'value1' column. Then create another column 'Decreasing' by comparing the two 'value2' columns. By default, the column from the left (in this case, df1) gets suffix _x and the one from the right (in this case, df2) gets suffix _y:

import pandas as pd
df1 = pd.read_excel('first_excel_file.xlsx')
df2 = pd.read_excel('second_excel_file.xlsx')
df = df1.merge(df2, on='value1')
df['Decreasing'] = df['value2_x'] > df['value2_y']

Output:

       value1  value2_x  value2_y  Decreasing
0  3900162750        10         9        True
1  3900163003        19         5        True
2  2311009200        22        88       False

CodePudding user response:

Use merge :

df = df1.merge(df2, on='value1', how= 'outer', suffixes=('_df1', '_df2'))
df['decrease'] = df['value2_df2'] < df['value2_df1']

giving:

       value1  value2_df1  value2_df2  decrease
0  3900162750          10           9      True
1  3900163003          19           5      True
2  2311009200          22          88     False

Notes:

  • how parameter used to provide some output if df1 and df2 are not perfectly matching
  • suffixes parameter used to provide the origin of the data
  • it could be suitable to give a other name to the 'decrease' column to remember the way it is computed
  •  Tags:  
  • Related