Check if each column values exist in another dataframe column where another column value is the colu-CodePudding

companies.xlsx

    company     To
1   amazon      [email protected]
2   google      [email protected] 
3   amazon      [email protected]
4   starbucks   [email protected]
5   greyhound   [email protected]

emails.xlsx

   [email protected]   [email protected]    [email protected]   ...
1  amazon       google          microsoft
2  starbucks    amazon          tesla
3  Grey Hound   greyhound       
4  ferrari

So i have the 2 excel sheets above and read both of em:

file1 = pd.ExcelFile('data/companies.xlsx')
file2 = pd.ExcelFile('data/emails.xlsx')

df_companies = file1.parse('sheet1')
df_emails = file2.parse('sheet1')

what i'm trying to accomplish is:

check if df_companies['To'] is an existing header in df_emails
if the header exists in df_emails, search the appropriate column of that header for df_companies['company']
if the company is found, add a column to df_companies and fill in '1', if not fill in '0'

e.g.: company amazon has the To email [email protected] in company.xlsx. in email.xlsx the header [email protected] exists and also amazon was found in the column - so its a '1'.

Anyone knows how to accomplish this?

CodePudding user response：

Here's one approach. Convert df_emails to a dictionary and map it to df_companies. Then, compare the mapped column with df_companies['company'].

df_companies['check'] = df_companies['To'].map(df_emails.to_dict(orient='list')).fillna('')
df_companies['check'] = df_companies.apply(lambda x: x['company'] in x['check'], axis=1).astype(int)

     company            To  check
1     amazon    [email protected]      1
2     google  [email protected]      1
3     amazon    [email protected]      0
4  starbucks    [email protected]      1
5  greyhound    [email protected]      0