How to do COUNTIFS of values in a Pandas Dataframe and add result to a column?-CodePudding

I got two dataframes looking like this:

df1:

	omg	odds	outcome
0	2625	27	1
1	2625	22	1
2	2626	10	0
3	2628	15	1
4	2628	20	1

df2:

	omgangar
0	2625
1	2626
2	2627
3	2628
4	2629

In df2 I'd like to add a new column that counts the occurrences of df2[omgangar] in df1[omg] together with some other conditions (odds between two values and outcome equal to 1).

I've tried this code but it throws an error:

df2['outcome'] = 
sum((df1['omg'] == df2['omgangar']) & (df1['odds'].between(20,30)) & (df1['outcome'] == 1))

Error:

ValueError: Can only compare identically-labeled Series objects

The desired output of df2 is:

	omgangar	outcome
0	2625	2
1	2626	0
2	2627	0
3	2628	1
4	2629	0

CodePudding user response：

There is no need to use countifs, instead apply the filters first and then merge the dataframes.

First filter df1 with the two conditions, then groupby on omg and aggregate using count as follows:

s = df1.loc[df['odds'].between(20,30) & (df['outcome'] == 1)].groupby('omg')['odds'].count()

This will return a series s which looks like this:

omg
2625    2
2628    1
Name: odds, dtype: int64

Now, merge s with df2 and fill any resulting NaN values with 0:

df2.merge(df, left_on='omgangar', right_index=True, how='left').fillna(0.0)

Result:

    omgangar  odds
0       2625   2.0
1       2626   0.0
2       2627   0.0
3       2628   1.0
4       2629   0.0

CodePudding user response：

My suggestion would be to use merge and groupby for this task:

import pandas as pd

df1 = pd.DataFrame({
    'omg': [2625, 2625, 2626, 2628, 2628], 
    'odds': [27, 22, 10, 15, 20], 
    'outcome': [1, 1, 0, 1, 1]})
df2 = pd.DataFrame({'omgangar': list(range(2625, 2630))})

df = df2.merge(df1, left_on = 'omgangar', right_on = 'omg', how='left')
df['outcome'] = (df.outcome == 1) & (df.odds.between(20, 30))
df.groupby('omgangar')[['outcome']].agg('sum')

    outcome
omgangar    
2625    2
2626    0
2627    0
2628    1
2629    0

CodePudding user response：

Use DataFrame.loc for filter by mask with column omg and get counts by Series.value_counts, for new column use Series.map with replace missing not matched values to 0:

s = df1.loc[df1['odds'].between(20,30) & (df1['outcome'] == 1), 'omg'].value_counts()
df2['outcome'] = df2['omgangar'].map(s).fillna(0).astype(int)
print (df2)
   omgangar  outcome
0      2625        2
1      2626        0
2      2627        0
3      2628        1
4      2629        0