I have two xlsx files as follows:
value1 value2
3900162750 10
3900163003 19
2311009200 22
value1 value2
3900163003 5
3900162750 9
2311009200 88
How do I match value1 from xlsx1 with xlsx2 and compare related value2?
for instance, match 3900163003 from xlsx1 with xlsx2 and find out value2 is decreasing.
CodePudding user response:
You can use pandas. First read the excel files as pandas.DataFrame objects. Then using pd.DataFrame.merge method, merge the two dataframes on 'value1' column. Then create another column 'Decreasing' by comparing the two 'value2' columns. By default, the column from the left (in this case, df1) gets suffix _x and the one from the right (in this case, df2) gets suffix _y:
import pandas as pd
df1 = pd.read_excel('first_excel_file.xlsx')
df2 = pd.read_excel('second_excel_file.xlsx')
df = df1.merge(df2, on='value1')
df['Decreasing'] = df['value2_x'] > df['value2_y']
Output:
value1 value2_x value2_y Decreasing
0 3900162750 10 9 True
1 3900163003 19 5 True
2 2311009200 22 88 False
CodePudding user response:
Use merge :
df = df1.merge(df2, on='value1', how= 'outer', suffixes=('_df1', '_df2'))
df['decrease'] = df['value2_df2'] < df['value2_df1']
giving:
value1 value2_df1 value2_df2 decrease
0 3900162750 10 9 True
1 3900163003 19 5 True
2 2311009200 22 88 False
Notes:
howparameter used to provide some output ifdf1anddf2are not perfectly matchingsuffixesparameter used to provide the origin of the data- it could be suitable to give a other name to the
'decrease'column to remember the way it is computed
