I have 2 dataframes for which I want to do a comparison. Please find the information below and appreciate any help on this.
df 1 shows the relationship between the IDs
df1 =
IDA IDB Relationship
A100 A200 Parent
A200 A500 Spouse
A111 A112 Child
A112 A111 Parent
df2 contains a list of IDs that I will check against df1 if there's any form of relationship between the ID's Party 1 and Party 2 and map the relationship over (the first instance in the event there's multiple relationships)
df2 =
Sender Receiver
[A900,A200] [A500,A220]
[A150,A100] [A400]
[A400,A112] [A500]
[A700,A112] [A111,A001]
Here's my expected output with explanations
Output =
Sender Receiver Relationship
[A900,A200] [A500,A220] Spouse #A200 and A500
[A150,A100] [A400] NAN #No match
[A400,A112] [A500] NAN #No match
[A700,A112] [A111,A001] Parent #A112 and A111
CodePudding user response:
I wasn't able to test it as you did not provide a data sample but something like that should work:
Output = df2.copy()
detected_relations = []
for transaction in df2.iterrows:
Receiver = transaction.Receiver
Sender = transaction.Sender
df = df1[(df1.IDA.isin(Sender) & df1.IDB.isin(Receiver)) | (df1.IDB.isin(Sender) & df1.IDA.isin(Receiver))]
detected_relations = detected_relations df.Relationship
Outpout["Relationship"] = detected_relations
CodePudding user response:
You could extract the information into native python data structures and then merge it back with your original DataFrames -
To do this - I would first make pairs out of the Sender and Receiver columns in df2 -
def make_pairs(row):
senders = row['Sender'].replace("[", "").replace("]", "").split(",")
receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
pairs = [(s, r) for s in senders for r in receivers]
return pairs
send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()
Then map the combination of IDA and IDB from df1 into a dictionary:
rels = {(ida, idb): rel for ida, idb, rel in df1.values}
A dict comprehension (or even a simple for loop) can then be used to subset values of interest
rel_pairs = {key: rels[pair] for key, combination in send_receive_combinations.items() for pair in combination if pair in rels}
And finally, we can merge this dict with df2 -
df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
Sender Receiver relationship
#0 [A900,A200] [A500,A220] Spouse
#1 [A150,A100] [A400] NaN
#2 [A400,A112] [A500] NaN
#3 [A700,A112] [A111,A001] Parent
