Using value_counts() and filter elements based on number of instances

Time:01-05

I use the following code to build two arrays for a histogram: one for the counts (as percentages) and one for the values.

df = row.value_counts(normalize=True).mul(100).round(1)
counts = df                   # contains percentages
values = df.keys().tolist()

A typical output looks like this:

counts = 66.7, 8.3, 8.3, 8.3, 8.3
values = 1024, 356352, 73728, 16384, 4096
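
For reference, here is a small reproducible sketch. The original data is not shown, so the contents of row below are an assumption: a hypothetical Series with 1024 eight times and each of the other values once, which yields the percentages listed above.

```python
import pandas as pd

# Hypothetical data (not from the original post):
# 1024 appears 8 times, the other four values once each.
row = pd.Series([1024] * 8 + [356352, 73728, 16384, 4096])

df = row.value_counts(normalize=True).mul(100).round(1)
counts = df.tolist()          # percentages per distinct value
values = df.keys().tolist()   # distinct values, most frequent first
```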

The problem is that some values occur only once, and I would like to ignore them. In the example above, only 1024 is repeated; every other value appears exactly once. I could manually check the number of occurrences of each value in the row and remove the values that are not repeated:

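df = row.value_counts(normalize=True).mul(100).round(1)
counts = df                   # contains percentages
values = df.keys().tolist()
raw_counts = row.value_counts()   # absolute number of occurrences per value
for v in values:
    n = raw_counts[v]             # number of instances of v in row
    if n == 1:
        row = row[row != v]       # remove v from row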

Is there a better way to do this using pandas built-in functions?

CodePudding user response:

I have asked for some clarification on your question in the comments above.

If keys is a column and you want to retain only the values that are not duplicated, please try:

values = df.loc[~df['keys'].duplicated(keep=False), 'keys'].to_list()
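
Note that the duplicated() mask above keeps the values that occur exactly once. If the goal is the one stated in the question (ignore values that occur only once), one possible approach is to filter on the absolute counts from value_counts(). This is only a minimal sketch, assuming row is a pandas Series without missing values; the sample data and the names raw, repeated, filtered, and pct_original are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: 1024 appears 8 times, the rest once each.
row = pd.Series([1024] * 8 + [356352, 73728, 16384, 4096])

raw = row.value_counts()              # absolute counts per distinct value
repeated = raw[raw > 1]               # keep values that occur more than once

# Option A: drop the single-occurrence elements from the data itself,
# then recompute percentages relative to the filtered data.
filtered = row[row.isin(repeated.index)]
df = filtered.value_counts(normalize=True).mul(100).round(1)
counts = df.tolist()
values = df.keys().tolist()

# Option B: keep percentages relative to the original row length.
pct_original = repeated.div(len(row)).mul(100).round(1)
```

Which option fits depends on whether the histogram percentages should sum to 100 after filtering (Option A) or stay relative to the full row (Option B).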