Add multiple columns programmatically from individual criteria/rules

Time:01-27

I would like to add multiple columns programmatically to a dataframe using pre-defined rules. As an example, I would like to add 3 columns to the dataframe below, each indicating whether a row satisfies one of the three rules defined in the code below:

import numpy as np
import pandas as pd

# define dataframe
df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# define "rules" for adding subsequent columns
rule_1 = (df1["time1"] == 1)
rule_2 = (df1["time2"] == 1)
rule_3 = (df1["time3"] == 1)

# add new columns based on whether or not above rules are satisfied
df1["rule_1"] = np.where(rule_1, 1, 0)
df1["rule_2"] = np.where(rule_2, 1, 0)
df1["rule_3"] = np.where(rule_3, 1, 0)

As you can see, my approach gets tedious when I need to add tens of columns, each based on a different "rule", to a test dataframe.

Is there a way to do this more easily without defining each column manually along with its individual np.where clause? I tried doing something like this, but pandas does not accept this.

 rules = [rule_1, rule_2, rule_3]
 for rule in rules:
     df1[rule] = np.where(rule, 1, 0)

Any ideas on how to make my approach more programmatically efficient?

CodePudding user response:

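The loop you tried doesn't work because it uses the rule itself (a boolean Series) as the key in df1[rule], so pandas interprets it as an indexer rather than as the name of a new column. Give each new column a string name instead. I would solve it like this: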

rules = [rule_1, rule_2, rule_3]
for i, rule in enumerate(rules):
    df1[f'rule_{i + 1}'] = np.where(rule, 1, 0)
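
If the rules should carry their own names rather than depend on their position in a list, one option is a dict that maps each new column name to its boolean mask. The sketch below rebuilds the question's dataframe so that it runs on its own; the `rules` dict and its keys are illustrative names, not part of the original code.

```python
import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Hypothetical mapping: new column name -> boolean mask
rules = {
    "rule_1": df1["time1"] == 1,
    "rule_2": df1["time2"] == 1,
    "rule_3": df1["time3"] == 1,
}

for name, mask in rules.items():
    # Casting True/False to 1/0 gives the same values as np.where(mask, 1, 0)
    df1[name] = mask.astype(int)
```

Adding a rule then means adding one dict entry, and the column name stays next to the condition it describes.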

CodePudding user response:

Leverage Python's f-strings in a for loop. They are well suited to building the new column names.

#Create a list by filtering the time columns

cols=list(df1.filter(regex='time', axis=1).columns)

#Iterate through the list of columns imposing your conditions using np.where

for col in cols:
     df1[f'{col}_new'] = np.where(df1[col] == 1, 1, 0)
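
When every rule is the same comparison applied to a different column, as here, the loop can arguably be dropped altogether. A minimal sketch, assuming the question's dataframe and that all "time" columns should be compared against 1; `time_cols` and `flags` are names introduced only for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Select every column whose name contains "time"
time_cols = df1.filter(regex="time").columns

# Compare all selected columns to 1 at once, cast booleans to 1/0,
# and rename the results to time1_new, time2_new, ...
flags = df1[time_cols].eq(1).astype(int).add_suffix("_new")

# Attach the new columns to the original dataframe (aligned on the index)
df1 = df1.join(flags)
```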

CodePudding user response:

I might be oversimplifying your rules, but something like:

rules = [
    ('time1', 1),
    ('time2', 1),
    ('time3', 1),
]

for i, (col, val) in enumerate(rules):
    df1[f"rule_{i + 1}"] = np.where(df1[col] == val, 1, 0)
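
If the real rules are more involved than "column equals value", the same loop also works with callables. The sketch below is only an illustration: the three rules are made up, and each one is assumed to take the dataframe and return a boolean Series.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"time1": [0, 1, 1, 0, 0],
                    "time2": [1, 0, 0, 0, 1],
                    "time3": [0, 0, 0, 1, 0],
                    "outcome": [1, 0, 0, 1, 0]})

# Hypothetical rules: each function receives the dataframe
# and returns a boolean Series with one entry per row
rules = {
    "rule_1": lambda d: d["time1"] == 1,
    "rule_2": lambda d: (d["time2"] == 1) & (d["outcome"] == 0),
    "rule_3": lambda d: d[["time1", "time3"]].sum(axis=1) > 0,
}

for name, rule in rules.items():
    df1[name] = np.where(rule(df1), 1, 0)
```

Because each rule is evaluated only inside the loop, the same `rules` dict could be reused on another dataframe with the same columns.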