I use the following code to create two arrays in a histogram, one for the counts (percentages) and the other for values.
df = row.value_counts(normalize=True).mul(100).round(1)
counts = df # contains percentages
values = df.keys().tolist()
So, an output looks like
counts = 66.7, 8.3, 8.3, 8.3, 8.3
values = 1024, 356352, 73728, 16384, 4096
Problem is that some values exist one time only and I would like to ignore them. In the example above, only 1024 repeated multiple times and others are there only once. I can manually check the number of occurrences in the row and see if they are not repeated multiple times and ignore them.
df = row.value_counts(normalize=True).mul(100).round(1)
counts = df # contains percentages
values = df.keys().tolist()
for v in values:
# N = get_number_of_instances in row
# if N == 1
# remove v in row
I would like to know if there are other ways for that using the built-in functions in Pandas.
CodePudding user response:
Some clarity requested on your question in comments above
If keys is a column and you want to retain non duplicates, please try
values=df.loc[~df['keys'].duplicated(keep=False), 'keys'].to_list()
