I am trying to find a way to apply a custom function between the same columns in two separate pandas dataframes, to get a new dataframe.
So, I have 2 dataframes likes so:
import pandas as pd
df1 = pd.DataFrame({'num':['NUM1','NUM2','NUM3','NUM4','NUM5'],'mean':[3,4,6,8,2],'count':[2,2,1,4,5]})
df2 = pd.DataFrame({'num':['NUM1','NUM4','NUM5','NUM6','NUM7'],'mean':[4,2,2,5,1],'count':[2,3,1,1,1]})
df1
num mean count
0 NUM1 3 2
1 NUM2 4 2
2 NUM3 6 1
3 NUM4 8 4
4 NUM5 2 5
df2
num mean count
0 NUM1 4 2
1 NUM4 2 3
2 NUM5 2 1
3 NUM6 5 1
4 NUM7 1 1
I also got a custom function that updates the mean of two separate samples:
def mean_updater(mean1,n1,mean2,n2):
return round(((mean1*n1) (mean2*n2))/(n1 n2),2)
And a function to update the count:
def count_updater(n1,n2):
return n1 n2
What I am trying to achieve is a new df that contains updated mean and count for matching records (present in both dfs), with records not present in both dfs appended with no changes.
So, the result should be:
new_df
num mean count
0 NUM1 3.5 4
1 NUM2 4 2
2 NUM3 6 1
3 NUM4 5.43 7
4 NUM5 2 6
5 NUM6 5 1
6 NUM7 1 1
CodePudding user response:
Not the fanciest solution, but might work for your use case:
# merge the data
common = pd.merge(df1, df2, on="num", how="outer")
common["mean"] = common[["mean_x", "mean_y"]].mean(axis=1)
common["count"] = common[["count_x", "count_y"]].sum(axis=1)
common[["num", "mean", "count"]]
CodePudding user response:
You can pass columns to your functions, then join them in concat and last replace missing values by original df11 and df22:
#changed to np.round for working with arrays
def mean_updater(mean1,n1,mean2,n2):
return np.round(((mean1*n1) (mean2*n2))/(n1 n2),2)
def count_updater(n1,n2):
return n1 n2
#for match by num columns
df11 = df1.set_index('num')
df22 = df2.set_index('num')
s1 = mean_updater(df11['mean'], df11['count'], df22['mean'], df22['count'])
s2 = count_updater(df11['count'], df22['count'])
df = (pd.concat([s1.rename('mean'), s2], axis=1)
.fillna(df11)
.fillna(df22)
.reset_index())
print (df)
num mean count
0 NUM1 3.50 4.0
1 NUM2 4.00 2.0
2 NUM3 6.00 1.0
3 NUM4 5.43 7.0
4 NUM5 2.00 6.0
5 NUM6 5.00 1.0
6 NUM7 1.00 1.0
