I need to add an indicator column to my dataframe that flags users with a promo code (1 if on promo, else 0). I need to look at two columns and see if a promo code exists under either col_promo_1 or col_promo_2. This is the code I'm using, but it returns NaN values:
df['promo_ind'] = df[['col_promo_1', 'col_promo_2']].apply(lambda x: 1 if x is not None else 0)
However, when I use the code with only one column, for example col_promo_1, the result is accurate. Any thoughts on how I can get this fixed?
CodePudding user response:
Make a new column:
df['promo_ind'] = 0
You can build a mask and use it to set the values in the correct places:
df.loc[df['col_promo_1'].notna() | df['col_promo_2'].notna(), 'promo_ind'] = 1
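As a self-contained sketch of the same mask idea, assuming a missing promo code is stored as NaN or None (the sample values below are made up for illustration), the boolean mask can also be cast straight to 1/0:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a missing promo code is stored as NaN/None
df = pd.DataFrame({
    'col_promo_1': ['SAVE10', np.nan, None, 'WELCOME'],
    'col_promo_2': [np.nan, np.nan, 'SPRING', 'VIP'],
})

# True where at least one of the two promo columns is populated
has_promo = df['col_promo_1'].notna() | df['col_promo_2'].notna()

# Casting the boolean mask to int gives the 1/0 indicator in one step
df['promo_ind'] = has_promo.astype(int)
```

If "no promo" is encoded some other way in your data (an empty string, for instance), the mask would need to test for that value instead of using notna().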
CodePudding user response:
Sticking to your approach, let's assume you have the example DataFrame (df) below, with two columns (promo1 and promo2). The goal is to indicate promo status in a third column: 1 if a user is on either promo1 or promo2, otherwise 0.
import pandas as pd
df = pd.DataFrame(data={'promo1': [0, 1, 0, 1], 'promo2': [0, 0, 1, 1]})
The line below creates a third column by checking the two existing columns row by row and computing the promo status for each row. (The issue with the posted code is that apply() passes the DataFrame's columns to the lambda one at a time, so "x" is a whole column rather than a row. The result is then indexed by column name instead of by row label, which is why assigning it to the new column produces NaN. The fix is to pass the argument axis=1 to apply().)
df['promo_ind'] = df[['promo1', 'promo2']].apply(lambda row: 0 if (row['promo1']==0 and row['promo2']==0) else 1, axis=1)
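If your columns hold actual promo codes with missing values rather than 0/1 flags, the same row-wise idea might look like the sketch below. The sample data is hypothetical and assumes that "no promo" is stored as None/NaN; it also shows what the call without axis=1 returns.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question
df = pd.DataFrame({
    'col_promo_1': ['SAVE10', None, None],
    'col_promo_2': [None, None, 'SPRING'],
})

# Without axis=1 the lambda receives each column as a Series, so the
# result has one entry per column, labelled by column name
per_column = df[['col_promo_1', 'col_promo_2']].apply(
    lambda x: 1 if x is not None else 0
)
# list(per_column.index) is ['col_promo_1', 'col_promo_2']

# With axis=1 the lambda receives one row at a time
df['promo_ind'] = df[['col_promo_1', 'col_promo_2']].apply(
    lambda row: 1 if row.notna().any() else 0, axis=1
)
```

On larger frames, df[['col_promo_1', 'col_promo_2']].notna().any(axis=1).astype(int) should give the same result without calling a Python function for every row.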
