Create new column showing the occurrences of a column value in a range of others-CodePudding

I have a simple pandas DataFrame where I need to add a new column that shows the 'count' of occurrences for the 'current_price' in a range of other columns 'pricemonths', that match the current_price column:

import pandas as pd
import numpy as np

# my data
data = {'Item':['Bananas', 'Apples', 'Pears', 'Avocados','Grapes','Melons'],
        'Jan':[1,0.5,1.1,0.6,2,4],
        'Feb':[0.9,0.5,1,0.6,2,5],
        'Mar':[1,0.6,1,0.6,2.1,6],
        'Apr':[1,0.6,1,0.6,2,5],
        'May':[1,0.5,1.1,0.6,2,5],
        'Current_Price':[1,0.6,1,0.6,2,4]        
       }

# import my data
df = pd.DataFrame(data)

pricemonths=['Jan','Feb','Mar','Apr','May']

Thus, my final dataframe would contain another column ('times_found') with the values:

'times_found'
4
2
3
5
4
1

CodePudding user response：

One way of doing it is to transpose the prices part of df, then use eq to compare with "Current_Price" across indices (which creates a boolean DataFrame with True for matching prices and False otherwise) and find sum across rows:

df['times_found'] = df['Current_Price'].eq(df.loc[:,'Jan':'May'].T).sum(axis=0)

or use numpy broadcasting:

df['times_found'] = (df.loc[:,'Jan':'May'].to_numpy() == df[['Current_Price']].to_numpy()).sum(axis=1)

Excellent suggestion from @HenryEcker: DataFrame equals on an axis may be faster than transposing for larger DataFrames:

df['times_found'] = df.loc[:, 'Jan':'May'].eq(df['Current_Price'], axis=0).sum(axis=1)

Output:

       Item  Jan  Feb  Mar  Apr  May  Current_Price  times_found
0   Bananas  1.0  0.9  1.0  1.0  1.0            1.0            4
1    Apples  0.5  0.5  0.6  0.6  0.5            0.6            2
2     Pears  1.1  1.0  1.0  1.0  1.1            1.0            3
3  Avocados  0.6  0.6  0.6  0.6  0.6            0.6            5
4    Grapes  2.0  2.0  2.1  2.0  2.0            2.0            4
5    Melons  4.0  5.0  6.0  5.0  5.0            4.0            1