I have a dataframe like as below
df1 = pd.DataFrame({'region': ['ASEAN','ANZ','INDIA','KOREA'],
'first_contact':['user1','user2','user3','user4'],'first_contact_email':['[email protected]','[email protected]','[email protected]','[email protected]'],
'second_contact':['user11','user21','user31','user41'],'second_contact_email':['[email protected]','[email protected]','[email protected]','[email protected]'],
'third_contact':['user111','user211','user311','user411'],'third_contact_email':['[email protected]','[email protected]','[email protected]','[email protected]'],
'fourth_contact':['user1111','user2111','user3111','user4111'],'fourth_contact_email':['[email protected]','[email protected]','[email protected]','[email protected]'],
'fifth_contact':['user11111','user21111','user31111','user41111'],'fifth_contact_email':['[email protected]','[email protected]',np.nan,np.nan],
'sixth_contact':['user111111','user211111','user311111','user411111'],'sixth_contact_email':['[email protected]','[email protected]',np.nan,np.nan]})
I would like to do the below
a) iterate through each row
b) get all email_id for that row
c) append it to a list
So, I tried the below
region_list = ['ASEAN','INDIA','KOREA','ANZ']
for region in region_list:
email_list=[]
temp_email_df = df1[df1['Region']==region]
email_list.append(temp_email_df.iloc[0,2:,2].to_string(header=False, index=False))
print(email_list)
--send email code follows here
for each region, I would like to get their list of email ids and use them to send an email.
But now the issue is in extracting all the emails.
How can I do to extract email-ids from all the email columns using iloc or positions etc
I expect my output to print email_list for each region
CodePudding user response:
You can use isin method to filter the regions based on the list you have, then filter the columns that has email in the name, finally stack it which will essentially remove NaN values, then call to_list() to create list out of it:
>>> df1[df1['region'].isin(region_list)].filter(like='email').stack().to_list()
['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
But if you need to get the email list by region, you can group the dataframe by regions after filtering the regions of interest then follow the rest of logic
(df1[df1['region'].isin(region_list)]
.set_index('region')
.filter(like='email')
.groupby(level=0)
.apply(lambda x: x.stack().tolist())
.reset_index())
region 0
0 ANZ [[email protected], [email protected], user211@gm...
1 ASEAN [[email protected], [email protected], user111@gm...
2 INDIA [[email protected], [email protected], user311@gm...
3 KOREA [[email protected], [email protected], user411@gm...
CodePudding user response:
Try as follows.
- Use
df.meltto unpivotdf1from wide to long format for allcolumnsthat contain the substring "email". - Use
df.dropnawithvalue(column that now contains all "emails") as the subset to get rid ofNaNvalues. - Finally, use
df.groupbyto group "emails" per region, and thenapply(list)to group the individual values in a list. - Result will be a
pd.Serieswith the regions asindex. So, to access the specific lists, you use, e.g.,emails['ANZ'], and then first email:emails['ANZ'][0](i.e. "[email protected]").
emails = df1.melt(id_vars='region',
value_vars=[col for col in df1.columns if 'email' in col]
).dropna(subset='value').groupby('region')['value'].apply(list)
print(emails)
region
ANZ [[email protected], [email protected], user211@gm...
ASEAN [[email protected], [email protected], user111@gm...
INDIA [[email protected], [email protected], user311@gm...
KOREA [[email protected], [email protected], user411@gm...
Name: value, dtype: object
# e.g. for `ANZ`:
print(emails['ANZ'])
['[email protected]', '[email protected]', '[email protected]',
'[email protected]', '[email protected]', '[email protected]']
