I have a two lists of panda dataframes as follows,
import pandas as pd
import numpy as np
list_one = [pd.DataFrame({'sent_a.1': [0, 3, 2, 1], 'sent_a.2': [0, 1, 4, 0], 'sent_b.3': [0, 6, 0, 8],'sent_b.4': [1, 1, 8, 6],'ID':['id_1','id_1','id_1','id_1']}),
pd.DataFrame({'sent_a.1': [0, 3], 'sent_a.2': [0, 2], 'sent_b.3': [0, 6],'sent_b.4': [1, 1],'ID':['id_2','id_2']})]
list_two = [pd.DataFrame({'sent_a.1': [0, 5], 'sent_a.2': [0, 1], 'sent_b.3': [0, 6],'sent_b.4': [1, 1],'ID':['id_2','id_2']}),
pd.DataFrame({'sent_a.1': [0, 5, 3, 1], 'sent_a.2': [0, 2, 3, 1], 'sent_b.3': [0, 6, 6, 8],'sent_b.4': [1, 5, 8, 5],'ID':['id_1','id_1','id_1','id_1']})]
I would like to compare the dataframes in these two lists and if the values are the same, I would like to replace the value with 'True' and if the values are different, I would like to set them to 'False' and save the result in a different list of panda dataframes. I have done the following,
for dfs in list_one:
for dfs2 in list_two:
g = np.where(dfs == dfs2, 'True', 'False')
print (g)
but I get the error,
ValueError: Can only compare identically-labeled DataFrame objects
how can I sort values in these two lists, based on the values from column 'ID'?
Edit I would like the dataframes that have the same value for column 'ID' to be compared. meaning that dataframes that have 'ID' == 'id_1' are to be compared with one another and dataframes that have 'ID' == 'id_2' to be compared with each other (not a cross comparison)
so the desired output is:
output = [ sent_a.1 sent_a.2 sent_b.3 sent_b.4 ID
0 True True True True id_1
1 False False True False id_1
2 False False False True id_1
3 False False True True id_1,
sent_a.1 sent_a.2 sent_b.3 sent_b.4 ID
0 True True True True id_2
1 True True False False id_2]
CodePudding user response:
Based on your current example
For your first question:
how can I sort values in these two lists, based on the values from column 'ID'?
list_one = sorted(list_one,key=lambda x: x['ID'].unique()[0][3:], reverse=False)
list_two =sorted(list_two,key=lambda x: x['ID'].unique()[0][3:], reverse=False)
ValueError: Can only compare identically-labeled DataFrame objects
- error due to different index values order in dataframes or dataframes are of different shapes
First way of comparison:
for dfs in list_one:
for dfs2 in list_two:
if dfs.shape == dfs2.shape:
g = np.where(dfs == dfs2, 'True', 'False')
print (g)
Second way:
I would like the dataframes that have the same value for column 'ID' to be compared
for dfs in list_one:
for dfs2 in list_two:
if (dfs['ID'].unique() == dfs2['ID'].unique()) and (dfs.shape == dfs2.shape):
g = np.where(dfs == dfs2, 'True', 'False')
print (g)
