How to compare different columns from two Dataframes in Python-CodePudding

I have 2 dataframes for which I want to do a comparison. Please find the information below and appreciate any help on this.

df 1 shows the relationship between the IDs

df1 = 

IDA   IDB   Relationship
A100  A200   Parent
A200  A500   Spouse
A111  A112   Child
A112  A111   Parent

df2 contains a list of IDs that I will check against df1 if there's any form of relationship between the ID's Party 1 and Party 2 and map the relationship over (the first instance in the event there's multiple relationships)

df2 = 

Sender      Receiver
[A900,A200] [A500,A220]
[A150,A100] [A400]
[A400,A112] [A500]
[A700,A112] [A111,A001]

Here's my expected output with explanations

Output =
 
Sender      Receiver     Relationship
[A900,A200] [A500,A220]  Spouse         #A200 and A500 
[A150,A100] [A400]       NAN            #No match
[A400,A112] [A500]       NAN            #No match
[A700,A112] [A111,A001]  Parent         #A112 and A111

CodePudding user response：

I wasn't able to test it as you did not provide a data sample but something like that should work:

Output = df2.copy()
detected_relations = []

for transaction in df2.iterrows:
    Receiver = transaction.Receiver
    Sender = transaction.Sender
    
    df = df1[(df1.IDA.isin(Sender) & df1.IDB.isin(Receiver)) | (df1.IDB.isin(Sender) & df1.IDA.isin(Receiver))]
    
    detected_relations = detected_relations   df.Relationship
    
Outpout["Relationship"] = detected_relations

CodePudding user response：

You could extract the information into native python data structures and then merge it back with your original DataFrames -

To do this - I would first make pairs out of the Sender and Receiver columns in df2 -

def make_pairs(row):
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    pairs = [(s, r) for s in senders for r in receivers]
    return pairs
send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()

Then map the combination of IDA and IDB from df1 into a dictionary:

rels = {(ida, idb): rel for ida, idb, rel in df1.values}

A dict comprehension (or even a simple for loop) can then be used to subset values of interest

rel_pairs = {key: rels[pair] for key, combination in send_receive_combinations.items() for pair in combination if pair in rels}

And finally, we can merge this dict with df2 -

df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
    Sender     Receiver relationship
#0  [A900,A200]  [A500,A220]       Spouse
#1  [A150,A100]       [A400]          NaN
#2  [A400,A112]       [A500]          NaN
#3  [A700,A112]  [A111,A001]       Parent