I have a simple pandas DataFrame where I need to add a new column that shows the 'count' of occurrences for the 'current_price' in a range of other columns 'pricemonths', that match the current_price column:
import pandas as pd
import numpy as np
# my data
data = {'Item':['Bananas', 'Apples', 'Pears', 'Avocados','Grapes','Melons'],
'Jan':[1,0.5,1.1,0.6,2,4],
'Feb':[0.9,0.5,1,0.6,2,5],
'Mar':[1,0.6,1,0.6,2.1,6],
'Apr':[1,0.6,1,0.6,2,5],
'May':[1,0.5,1.1,0.6,2,5],
'Current_Price':[1,0.6,1,0.6,2,4]
}
# import my data
df = pd.DataFrame(data)
pricemonths=['Jan','Feb','Mar','Apr','May']
Thus, my final dataframe would contain another column ('times_found') with the values:
'times_found'
4
2
3
5
4
1
CodePudding user response:
One way of doing it is to transpose the prices part of df, then use eq to compare with "Current_Price" across indices (which creates a boolean DataFrame with True for matching prices and False otherwise) and find sum across rows:
df['times_found'] = df['Current_Price'].eq(df.loc[:,'Jan':'May'].T).sum(axis=0)
or use numpy broadcasting:
df['times_found'] = (df.loc[:,'Jan':'May'].to_numpy() == df[['Current_Price']].to_numpy()).sum(axis=1)
Excellent suggestion from @HenryEcker: DataFrame equals on an axis may be faster than transposing for larger DataFrames:
df['times_found'] = df.loc[:, 'Jan':'May'].eq(df['Current_Price'], axis=0).sum(axis=1)
Output:
Item Jan Feb Mar Apr May Current_Price times_found
0 Bananas 1.0 0.9 1.0 1.0 1.0 1.0 4
1 Apples 0.5 0.5 0.6 0.6 0.5 0.6 2
2 Pears 1.1 1.0 1.0 1.0 1.1 1.0 3
3 Avocados 0.6 0.6 0.6 0.6 0.6 0.6 5
4 Grapes 2.0 2.0 2.1 2.0 2.0 2.0 4
5 Melons 4.0 5.0 6.0 5.0 5.0 4.0 1
