I would like to add multiple columns programmatically to a dataframe using pre-defined rules. As an example, I would like to add 3 columns to the below dataframe, based on whether or not they satisfy the three rules indicated in code below:
import numpy as np
import pandas as pd

# define dataframe
df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})
#define "rules" for adding subsequent columns
rule_1 = (df1["time1"] == 1)
rule_2 = (df1["time2"] == 1)
rule_3 = (df1["time3"] == 1)
#add new columns based on whether or not above rules are satisfied
df1["rule_1"] = np.where(rule_1, 1, 0)
df1["rule_2"] = np.where(rule_2, 1, 0)
df1["rule_3"] = np.where(rule_3, 1, 0)
As you can see, my approach gets tedious when I need to add tens of columns, each based on a different "rule", to a test dataframe.
Is there a way to do this more easily, without defining each column manually along with its individual np.where clause? I tried something like the following, but pandas does not accept it:
rules = [rule_1, rule_2, rule_3]
for rule in rules:
    df1[rule] = np.where(rule, 1, 0)
Any ideas on how to make my approach more programmatically efficient?
CodePudding user response:
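The loop you tried doesn't work because df1[rule] uses the rule itself (a boolean Series) as the key, so pandas interprets it as a boolean row mask rather than as the name of a new column. Give each new column a string name instead. I would solve it like this: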
rules = [rule_1, rule_2, rule_3]
for i, rule in enumerate(rules):
    df1[f'rule_{i + 1}'] = np.where(rule, 1, 0)
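If you would rather choose the column names yourself than derive them from the loop index, one possible variation is to keep the rules in a dict keyed by column name. This is only a sketch under the assumption that each rule is a boolean Series aligned with df1; the names rule_1 to rule_3 are simply the ones from the question, and astype(int) is swapped in for np.where because a boolean mask converts to 1/0 directly.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time1": [0, 1, 1, 0, 0],
    "time2": [1, 0, 0, 0, 1],
    "time3": [0, 0, 0, 1, 0],
    "outcome": [1, 0, 0, 1, 0],
})

# Map each new column name to its boolean mask.
rules = {
    "rule_1": df1["time1"] == 1,
    "rule_2": df1["time2"] == 1,
    "rule_3": df1["time3"] == 1,
}

# True/False converts to 1/0, so np.where is not needed for this case.
for name, mask in rules.items():
    df1[name] = mask.astype(int)

print(df1)
```

Adding another rule is then a single new entry in the dict.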
CodePudding user response:
Leverage Python's f-strings in a for loop; they are well suited to this.
#Create a list by filtering the time columns
cols=list(df1.filter(regex='time', axis=1).columns)
#Iterate through the list of columns imposing your conditions using np.where
for col in cols:
    df1[f'{col}_new'] = np.where(df1[col] == 1, 1, 0)
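When every rule is the same test applied to a family of columns, the loop can probably be dropped altogether. The sketch below assumes the rule is "equals 1" for every column whose name starts with "time"; the _new suffix mirrors the naming above, and the anchored regex "^time" is my own choice to avoid matching unrelated columns.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time1": [0, 1, 1, 0, 0],
    "time2": [1, 0, 0, 0, 1],
    "time3": [0, 0, 0, 1, 0],
    "outcome": [1, 0, 0, 1, 0],
})

# Select the columns the rule applies to.
time_cols = df1.filter(regex="^time").columns

# Compare the whole block at once, convert True/False to 1/0,
# and rename the resulting columns with a suffix.
flags = df1[time_cols].eq(1).astype(int).add_suffix("_new")

# Attach the new columns alongside the originals.
df1 = df1.join(flags)

print(df1)
```

This only fits the case where the rules share one condition; for rules that differ per column, a loop over the rules is still the simpler option.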
CodePudding user response:
I might be oversimplifying your rules, but something like:
rules = [
    ('time1', 1),
    ('time2', 1),
    ('time3', 1),
]
for i, (col, val) in enumerate(rules):
    df1[f"rule_{i + 1}"] = np.where(df1[col] == val, 1, 0)
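If the real rules are more involved than "column equals value", the same loop could take a list of functions instead of (column, value) pairs. This is a rough sketch, not a drop-in solution: the second and third rules are invented purely for illustration, and each function is assumed to accept the dataframe and return a boolean Series.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time1": [0, 1, 1, 0, 0],
    "time2": [1, 0, 0, 0, 1],
    "time3": [0, 0, 0, 1, 0],
    "outcome": [1, 0, 0, 1, 0],
})

# Each rule takes the dataframe and returns a boolean Series.
# Rules 2 and 3 are hypothetical examples of compound conditions.
rules = [
    lambda d: d["time1"] == 1,
    lambda d: (d["time2"] == 1) & (d["outcome"] == 0),
    lambda d: d[["time1", "time3"]].sum(axis=1) > 0,
]

# enumerate(..., start=1) numbers the columns rule_1, rule_2, ...
for i, rule in enumerate(rules, start=1):
    df1[f"rule_{i}"] = rule(df1).astype(int)

print(df1)
```

Because each rule is evaluated only when the loop reaches it, a later rule can also refer to a column created by an earlier one.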
