Count elements in defined groups in pandas dataframe

Time:01-19

Say I have a dataframe and I want to count how many times each element of a list, e.g. [1,5,2], occurs in a column (or in each column).

I could do something like

elem_list = [1,5,2]

for e in elem_list:
    print(e, (df["col1"] == e).sum())

but isn't there a better way, something like

elem_list = [1,5,2]
df["col1"].count_elements(elem_list)

#1 5    # 1 occurs 5 times
#5 3    # 5 occurs 3 times
#2 0    # 2 occurs 0 times

Note it should count all the elements in the list, and return "0" if an element in the list is not in the column.

CodePudding user response:

Pass the column to pd.Categorical with elem_list as the categories; value_counts then returns 0 for any category that does not occur in the column.

pd.Categorical(df['col1'],elem_list).value_counts()
Out[62]: 
1    3
5    0
2    1
dtype: int64
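
A self-contained sketch of the Categorical approach may make the behaviour easier to check. The sample data below is an assumption chosen for illustration, not the asker's frame. Values that are not in the categories (here 4 and 3) become NaN and are left out of the counts, while categories with no matches are still listed with 0.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the question)
df = pd.DataFrame({"col1": [1, 1, 5, 1, 5, 1, 1, 4, 3]})
elem_list = [1, 5, 2]

# Only the values in elem_list are treated as categories; everything else
# becomes NaN and is ignored by value_counts (dropna defaults to True).
cat = pd.Categorical(df["col1"], categories=elem_list)

# Counts are reported per category, in category order, including zeros.
counts = cat.value_counts()
print(counts)
```

The result is indexed by a CategoricalIndex; if a plain integer index is preferred, the reindex-based answers avoid that.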

CodePudding user response:

You could filter with isin and then count (note that elements absent from the column are simply omitted rather than reported as 0):

import numpy as np
import pandas as pd

df = pd.DataFrame({"col1":np.random.randint(0,10, 100)})
df[df["col1"].isin([0,1])].value_counts()

# col1
# 1       17
# 0       10
# dtype: int64

CodePudding user response:

First filter with Series.isin and DataFrame.loc, then use Series.value_counts, and finally add Series.reindex so that missing elements get 0 and the result follows the order of elem_list:

df.loc[df["col1"].isin(elem_list), 'col1'].value_counts().reindex(elem_list, fill_value=0)
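
Splitting that one-liner into its steps can help show what each call contributes. This is only a sketch on made-up sample data; the intermediate names (mask, filtered, raw_counts, result) are introduced here for illustration.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the question)
df = pd.DataFrame({"col1": [1, 1, 5, 1, 5, 1, 1, 4, 3]})
elem_list = [1, 5, 2]

# Step 1: boolean mask marking rows whose value is in elem_list
mask = df["col1"].isin(elem_list)

# Step 2: keep only those rows of the column
filtered = df.loc[mask, "col1"]

# Step 3: count what is left; 2 never occurs, so it is absent here
raw_counts = filtered.value_counts()

# Step 4: reindex to elem_list, filling the absent elements with 0
result = raw_counts.reindex(elem_list, fill_value=0)
print(result)
```

Without step 4 the result would have no row for 2, since value_counts only reports values that are present.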

CodePudding user response:

You can use value_counts and reindex:

df = pd.DataFrame({'col1': [1,1,5,1,5,1,1,4,3]})

elem_list = [1,5,2]
df['col1'].value_counts().reindex(elem_list, fill_value=0)

output:

1    5
5    2
2    0
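
Since the question also asks about counting in each column, one possible extension is to apply the same value_counts and reindex pattern column by column. The helper name count_elements is hypothetical (borrowed from the question's wish, it is not a pandas method), and the two-column frame is made-up sample data.

```python
import pandas as pd


def count_elements(frame, elems):
    """Hypothetical helper: count each value of `elems` in every column."""
    # apply() calls the lambda once per column; each call returns a Series
    # indexed by `elems`, so the results line up into a DataFrame.
    return frame.apply(
        lambda col: col.value_counts().reindex(elems, fill_value=0)
    )


# Illustrative data (an assumption, not taken from the question)
df = pd.DataFrame({"col1": [1, 1, 5, 1, 5], "col2": [2, 2, 1, 7, 7]})

per_column = count_elements(df, [1, 5, 2])
print(per_column)
```

Each row of the result corresponds to one element of the list and each column to one column of the original frame.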

benchmark (100k values):

# setup
df = pd.DataFrame({'col1': np.random.randint(0,10, size=100000)})

df['col1'].value_counts().reindex(elem_list, fill_value=0)
# 774 µs ± 10.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

pd.Categorical(df['col1'],elem_list).value_counts()
# 2.72 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

df.loc[df["col1"].isin(elem_list), 'col1'].value_counts().reindex(elem_list, fill_value=0)
# 2.98 ms ± 152 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
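
The timings above look like IPython %timeit output. For readers outside IPython, a rough equivalent with the standard timeit module might look like the sketch below. The seed, the repeat counts and the elem_list value are assumptions, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Setup: 100k random integers in [0, 10), seeded for repeatability
rng = np.random.default_rng(0)
df = pd.DataFrame({"col1": rng.integers(0, 10, size=100_000)})
elem_list = [1, 5, 2]

# The three approaches from the answers, wrapped as zero-argument callables
candidates = {
    "value_counts + reindex": lambda: df["col1"]
    .value_counts()
    .reindex(elem_list, fill_value=0),
    "Categorical": lambda: pd.Categorical(
        df["col1"], categories=elem_list
    ).value_counts(),
    "isin + value_counts + reindex": lambda: df.loc[
        df["col1"].isin(elem_list), "col1"
    ]
    .value_counts()
    .reindex(elem_list, fill_value=0),
}

# Sanity check: collect each approach's counts so they can be compared
results = {name: fn().tolist() for name, fn in candidates.items()}

# Time each approach: best of 3 repeats, 20 calls per repeat
for name, fn in candidates.items():
    seconds = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{name}: {seconds * 1e6:.0f} µs per call")
```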