companies.xlsx
company To
1 amazon [email protected]
2 google [email protected]
3 amazon [email protected]
4 starbucks [email protected]
5 greyhound [email protected]
emails.xlsx
[email protected] [email protected] [email protected] ...
1 amazon google microsoft
2 starbucks amazon tesla
3 Grey Hound greyhound
4 ferrari
So i have the 2 excel sheets above and read both of em:
file1 = pd.ExcelFile('data/companies.xlsx')
file2 = pd.ExcelFile('data/emails.xlsx')
df_companies = file1.parse('sheet1')
df_emails = file2.parse('sheet1')
what i'm trying to accomplish is:
- check if df_companies['To'] is an existing header in df_emails
- if the header exists in df_emails, search the appropriate column of that header for df_companies['company']
- if the company is found, add a column to df_companies and fill in '1', if not fill in '0'
e.g.: company amazon has the To email [email protected] in company.xlsx. in email.xlsx the header [email protected] exists and also amazon was found in the column - so its a '1'.
Anyone knows how to accomplish this?
CodePudding user response:
Here's one approach. Convert df_emails to a dictionary and map it to df_companies. Then, compare the mapped column with df_companies['company'].
df_companies['check'] = df_companies['To'].map(df_emails.to_dict(orient='list')).fillna('')
df_companies['check'] = df_companies.apply(lambda x: x['company'] in x['check'], axis=1).astype(int)
company To check
1 amazon [email protected] 1
2 google [email protected] 1
3 amazon [email protected] 0
4 starbucks [email protected] 1
5 greyhound [email protected] 0
