I have a dataframe and a list:
import numpy as np
import pandas as pd
df = pd.DataFrame(
{
'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
'col.A': ['Yes', np.nan, 'Yes', 'Yes', 'Yes', np.nan, np.nan]
}
)
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
met = 'yes'
else_met = 'no'
For every row in the dataframe, I want to set new_col to the value of met if its ID is in the list, and to else_met if it is not.
I tried the following code, but it's not working. What am I doing wrong?
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
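Run on its own, the attempt above does produce the expected column, which suggests the problem may lie outside the snippet shown. A minimal self-contained check, assuming only pandas and NumPy are installed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04'],
        'col.A': ['Yes', np.nan, 'Yes', 'Yes', 'Yes', np.nan, np.nan],
    }
)
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
met = 'yes'
else_met = 'no'

# apply calls the lambda once per value of the ID column;
# `x in ids` is an ordinary Python membership test on the list
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
print(df[new_col].tolist())
# ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
```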
Answer:
You can use loc and isin to assign met to the matching rows, then fill the remaining rows with else_met using fillna:
df.loc[df['ID'].isin(ids), new_col] = met
df[new_col] = df[new_col].fillna(else_met)
     ID col.A result
0  AB01   Yes    yes
1  AB02   NaN    yes
2  AB02   Yes    yes
3  AB03   Yes    yes
4  AB03   Yes    yes
5  AB03   NaN    yes
6  AB04   NaN     no
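The fillna step is needed because assigning through loc to a column that does not exist yet creates that column and leaves every non-matching row as NaN. A minimal sketch of that intermediate state, reusing the question's variable names:

```python
import pandas as pd

df = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04']})
ids = ['AB01', 'AB02', 'AB03']
new_col, met, else_met = 'result', 'yes', 'no'

# Step 1: loc creates the new column; only rows matching the mask get `met`
df.loc[df['ID'].isin(ids), new_col] = met
print(df[new_col].isna().tolist())
# [False, False, False, False, False, False, True]

# Step 2: replace the remaining NaN values with `else_met`
df[new_col] = df[new_col].fillna(else_met)
print(df[new_col].tolist())
# ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
```

Assigning the result of fillna back to the column, rather than calling fillna(..., inplace=True) on df[new_col], avoids chained assignment, which recent pandas versions warn about and which may not update df under Copy-on-Write.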
Answer:
Set every row to else_met first, then overwrite the rows whose ID is in ids:
df[new_col] = else_met
df.loc[df['ID'].isin(ids), new_col] = met
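The same default-then-overwrite idea can be wrapped in a small helper so the key column, column name and labels are passed in rather than hard-coded. `label_membership` below is a hypothetical name for illustration, not a pandas function:

```python
import pandas as pd


def label_membership(df, key, values, new_col, met, else_met):
    """Hypothetical helper: label each row by whether df[key] is in values."""
    out = df.copy()                                # leave the caller's frame untouched
    out[new_col] = else_met                        # default label for every row
    out.loc[out[key].isin(values), new_col] = met  # overwrite the matching rows
    return out


df = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB04']})
labelled = label_membership(df, 'ID', ['AB01', 'AB02'], 'result', 'yes', 'no')
print(labelled['result'].tolist())
# ['yes', 'yes', 'no']
```

The order matters: writing the default first and the matches second means every row ends up with exactly one of the two labels, so no fillna step is needed.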
Answer:
You can use numpy.where to pick the values based on a Boolean mask, then assign the result to the new column:
df[new_col] = np.where(df['ID'].isin(ids), met, else_met)
OUTPUT:
     ID col.A result
0  AB01   Yes    yes
1  AB02   NaN    yes
2  AB02   Yes    yes
3  AB03   Yes    yes
4  AB03   Yes    yes
5  AB03   NaN    yes
6  AB04   NaN     no
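To see the two steps separately: isin builds a Boolean Series, and np.where chooses element by element between the two labels. A minimal sketch on a shortened Series:

```python
import numpy as np
import pandas as pd

ids = ['AB01', 'AB02', 'AB03']
s = pd.Series(['AB01', 'AB02', 'AB04'])

mask = s.isin(ids)                    # Boolean Series: True where the ID is in ids
print(mask.tolist())
# [True, True, False]

labels = np.where(mask, 'yes', 'no')  # NumPy array: 'yes' where True, 'no' where False
print(labels.tolist())
# ['yes', 'yes', 'no']
```

Because np.where returns a plain NumPy array rather than a Series, it is assigned to the new column by position, so its length must match the number of rows in the dataframe.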
If the method above doesn't work for you (possibly because of a different library version), you can try the following alternative, although it is slower because apply calls the lambda once per row:
df[new_col] = df['ID'].apply(lambda x: met if x in ids else else_met)
Another NumPy approach, combined with a list comprehension:
values = [met if x else else_met for x in np.isin(df['ID'].values, ids)]
# ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
df[new_col] = values
Answer:
You can check, for every ID in df, whether it is in ids, using a list comprehension:
import numpy as np
import pandas as pd
df = pd.DataFrame(
{
'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03','AB03', 'AB04'],
'col.A': ["Yes",np.nan,'Yes','Yes',"Yes",np.nan, np.nan]
}
)
ids = ['AB01', 'AB02', 'AB03']
new_col = 'result'
df[new_col] = ['yes' if val in ids else 'no' for val in df['ID']]
# output:
     ID col.A result
0  AB01   Yes    yes
1  AB02   NaN    yes
2  AB02   Yes    yes
3  AB03   Yes    yes
4  AB03   Yes    yes
5  AB03   NaN    yes
6  AB04   NaN     no
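A quick way to confirm that the approaches in this thread agree on the sample data is to compute each one and compare the resulting labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['AB01', 'AB02', 'AB02', 'AB03', 'AB03', 'AB03', 'AB04']})
ids = ['AB01', 'AB02', 'AB03']
met, else_met = 'yes', 'no'

via_apply = df['ID'].apply(lambda x: met if x in ids else else_met).tolist()
via_where = np.where(df['ID'].isin(ids), met, else_met).tolist()
via_isin_list = [met if x else else_met for x in np.isin(df['ID'].to_numpy(), ids)]
via_listcomp = [met if val in ids else else_met for val in df['ID']]

# every approach should yield the same labels
assert via_apply == via_where == via_isin_list == via_listcomp
print(via_listcomp)
# ['yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'no']
```

If ids is large, converting it to a set first makes each `x in ids` check in the apply and list-comprehension versions constant-time on average; isin and np.isin already handle the lookup internally.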
