Home > Enterprise >  Check if each column values exist in another dataframe column where another column value is the colu
Check if each column values exist in another dataframe column where another column value is the colu

Time:01-25

companies.xlsx

    company     To
1   amazon      [email protected]
2   google      [email protected] 
3   amazon      [email protected]
4   starbucks   [email protected]
5   greyhound   [email protected]

emails.xlsx

   [email protected]   [email protected]    [email protected]   ...
1  amazon       google          microsoft
2  starbucks    amazon          tesla
3  Grey Hound   greyhound       
4  ferrari

So i have the 2 excel sheets above and read both of em:

file1 = pd.ExcelFile('data/companies.xlsx')
file2 = pd.ExcelFile('data/emails.xlsx')

df_companies = file1.parse('sheet1')
df_emails = file2.parse('sheet1')

what i'm trying to accomplish is:

  1. check if df_companies['To'] is an existing header in df_emails
  2. if the header exists in df_emails, search the appropriate column of that header for df_companies['company']
  3. if the company is found, add a column to df_companies and fill in '1', if not fill in '0'

e.g.: company amazon has the To email [email protected] in company.xlsx. in email.xlsx the header [email protected] exists and also amazon was found in the column - so its a '1'.

Anyone knows how to accomplish this?

CodePudding user response:

Here's one approach. Convert df_emails to a dictionary and map it to df_companies. Then, compare the mapped column with df_companies['company'].

df_companies['check'] = df_companies['To'].map(df_emails.to_dict(orient='list')).fillna('')
df_companies['check'] = df_companies.apply(lambda x: x['company'] in x['check'], axis=1).astype(int)

     company            To  check
1     amazon    [email protected]      1
2     google  [email protected]      1
3     amazon    [email protected]      0
4  starbucks    [email protected]      1
5  greyhound    [email protected]      0
  •  Tags:  
  • Related