Add multiple columns programmatically from individual criteria/rules

I would like to add multiple columns programmatically to a dataframe using pre-defined rules. As an example, I would like to add 3 columns to the below dataframe, based on whether or not they satisfy the three rules indicated in code below:

import numpy as np
import pandas as pd

# define dataframe
df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# define "rules" for adding subsequent columns
rule_1 = (df1["time1"] == 1)
rule_2 = (df1["time2"] == 1)
rule_3 = (df1["time3"] == 1)

# add new columns based on whether or not the above rules are satisfied
df1["rule_1"] = np.where(rule_1, 1, 0)
df1["rule_2"] = np.where(rule_2, 1, 0)
df1["rule_3"] = np.where(rule_3, 1, 0)

As you can see, my approach gets tedious when I need to add tens of columns, each based on a different "rule", to a test dataframe.

Is there a way to do this more easily, without defining each column manually along with its individual np.where clause? I tried something like this, but pandas does not accept it:

 rules = [rule_1, rule_2, rule_3]
 for rule in rules:
     df1[rule] = np.where(rule, 1, 0)

Any ideas on how to make my approach more programmatically efficient?

CodePudding user response：

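Your loop doesn't work because df1[rule] uses the rule itself (a boolean Series) as the key, so pandas tries to interpret it as a boolean mask rather than as a new column name. Generate a name for each new column instead. I would solve it like this: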

rules = [rule_1, rule_2, rule_3]
for i, rule in enumerate(rules):
    df1[f'rule_{i + 1}'] = np.where(rule, 1, 0)
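
A possible variation on the loop above, offered as a sketch rather than a definitive answer: keeping the rules in a dict keyed by the new column name means the names no longer depend on list position. The dict name and the use of astype(int) in place of np.where are choices made for this example, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Map each new column name to its boolean rule (the names are illustrative).
rules = {
    "rule_1": df1["time1"] == 1,
    "rule_2": df1["time2"] == 1,
    "rule_3": df1["time3"] == 1,
}

for name, rule in rules.items():
    # astype(int) converts True/False to 1/0, matching np.where(rule, 1, 0).
    df1[name] = rule.astype(int)
```

Adding another column then only requires one more entry in the dict.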

CodePudding user response：

Use Python's f-strings in a for loop to build the new column names; they are well suited to this.

#Create a list by filtering the time columns

cols=list(df1.filter(regex='time', axis=1).columns)

#Iterate through the list of columns imposing your conditions using np.where

for col in cols:
    # np.where works on the whole column at once, so no per-element apply is needed
    df1[f'{col}_new'] = np.where(df1[col] == 1, 1, 0)
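
If every rule is the same comparison applied to a group of columns, the loop can probably be dropped altogether. The following is a minimal sketch under that assumption; the variable name flags is made up for illustration, and the "_new" suffix simply mirrors the loop above.

```python
import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Select the time columns, compare them all to 1 at once,
# convert the booleans to 1/0, and rename the resulting columns.
flags = df1.filter(regex="time").eq(1).astype(int).add_suffix("_new")

# Attach the new columns to the original frame (indexes align row by row).
df1 = df1.join(flags)
```

This only fits when the rules are uniform; rules that differ per column still need a loop or a lookup structure.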

CodePudding user response：

I might be oversimplifying your rules, but something like:

rules = [
    ('time1', 1),
    ('time2', 1),
    ('time3', 1),
]

for i, (col, val) in enumerate(rules):
    df1[f"rule_{i + 1}"] = np.where(df1[col] == val, 1, 0)
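
If the real rules are not all equality checks, the tuples above could carry a comparison function as well. This is only a sketch: the third rule on "outcome" is a hypothetical example that does not appear in the question, and the standard-library operator module is one of several ways to express the comparison.

```python
import operator

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Each rule is (column, comparison function, value).
rules = [
    ("time1", operator.eq, 1),
    ("time2", operator.eq, 1),
    ("outcome", operator.ne, 1),  # hypothetical rule, for illustration only
]

for i, (col, op, val) in enumerate(rules, start=1):
    # op(df1[col], val) yields a boolean Series, e.g. df1[col] == val
    df1[f"rule_{i}"] = np.where(op(df1[col], val), 1, 0)
```

Using enumerate(rules, start=1) avoids the i + 1 arithmetic inside the f-string.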