Let's say I have a set-up like this
import pandas as pd
def dummy(val1, val2):
    return val1 * val2 / 10
df = pd.DataFrame({'a': range(1, 3), 'b': range(2, 4), 'c': range(3, 5)})
d = {'a': 3, 'b': 10}
   a  b  c
0  1  2  3
1  2  3  4
Now I would like to apply dummy to the columns in df that exist as keys in d and add the results as new columns. Here val1 is the values of the respective column and val2 is the value in d for the corresponding key.
I could do it like this
for k, v in d.items():
    # d[k] is of course just v; it's just to show that k is required for both input values
    df[f'{k}_calc'] = dummy(df[k], d[k])
which gives me the desired outcome
   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0
Is there a more straightforward implementation available that avoids the loop?
CodePudding user response:
You can try something like this:
cols = pd.Index(['a','b'])
#Using @richardec idea of the dictionary keys
cols = pd.Index(d.keys())
df[cols + '_calc'] = df[cols].apply(lambda x: dummy(x, d[x.name]))
df
Output:
   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0
Details:
- Using pd.DataFrame.apply, you are passing each column of the dataframe into the lambda.
- x.name is the column header.
- x and d[x.name] are then passed as the two inputs to your custom function.
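To make the points above concrete, here is a minimal sketch that records what apply hands to the function. The names scale_column and seen are hypothetical helpers introduced only for this illustration; they are not part of the answer's code.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 3), 'b': range(2, 4), 'c': range(3, 5)})
d = {'a': 3, 'b': 10}

seen = {}  # hypothetical helper: maps each column name to the scalar looked up in d

def scale_column(col):
    # col is a pandas Series holding one column; col.name is its header
    seen[col.name] = d[col.name]
    return col * d[col.name] / 10

# Only the columns that are keys of d are selected, so 'c' is never passed in
result = df[list(d)].apply(scale_column)
print(seen)
print(result)
```

Since only the columns named in d are selected before apply runs, the lookup d[col.name] should never hit a missing key.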
CodePudding user response:
Not really. Calling a function for each row inevitably leads to row-by-row execution one way or another. But you can do some things to make it more pandas-esque:
I was thinking that it could be done with agg, but I don't think so, because agg can't pass different values to the aggregation function depending on the column. So, using apply:
tmp = df[d.keys()].apply(lambda col: dummy(col, d[col.name])).add_suffix('_calc')
df = pd.concat([df, tmp], axis=1)
Output:
>>> df
   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0
Note the use of add_suffix to add suffixes to the column names.
CodePudding user response:
I would convert the dictionary to Series to benefit from broadcasting and index alignment:
s = pd.Series(d)
df.join((df.reindex(columns=s.index)*s/10).add_suffix('_calc'))
output:
   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0
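A small sketch of the index alignment this relies on: when a DataFrame is multiplied by a Series, pandas matches the Series index labels against the DataFrame's column labels, not their positions. The Series below is deliberately built in a different order than df's columns to show that; the variable name scaled is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 3), 'b': range(2, 4), 'c': range(3, 5)})

# Keys deliberately listed in a different order than the columns of df
s = pd.Series({'b': 10, 'a': 3})

# reindex keeps only the columns named in s; the multiplication then
# pairs each column with the Series entry that has the same label
scaled = df.reindex(columns=s.index) * s / 10
print(scaled)
```

One thing to keep in mind with this approach: reindex does not raise on a key of d that is missing from df, it creates an all-NaN column for it instead.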
CodePudding user response:
Convert d to a Series, multiply it element-wise with df, drop the NaN-valued columns (which appear because not all column names in df have a corresponding key in d), and concatenate the result with df along axis=1:
out = pd.concat((df, (df*pd.Series(d)/10).dropna(axis='columns').add_suffix('_calc')), axis=1)
Output:
   a  b  c  a_calc  b_calc
0  1  2  3     0.3     2.0
1  2  3  4     0.6     3.0
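To see where the NaN column comes from, the following sketch splits the one-liner into steps. The intermediate names product, all_nan_cols and kept are assumptions introduced for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': range(1, 3), 'b': range(2, 4), 'c': range(3, 5)})
d = {'a': 3, 'b': 10}

# Column 'c' has no matching key in d, so alignment fills it with NaN
product = df * pd.Series(d) / 10
all_nan_cols = [c for c in product.columns if product[c].isna().all()]

# Dropping the NaN columns leaves only the columns that had a key in d
kept = product.dropna(axis='columns')
print(all_nan_cols)
print(kept)
```

Note that dropna(axis='columns') uses how='any' by default, so a column that already contains a NaN in the original data would be dropped as well. If that matters for your data, passing how='all' should restrict the drop to columns that are entirely NaN.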
