numpy.unique has an optional argument return_counts. From the docs:
return_counts : bool, optional
    If True, also return the number of times each unique item appears in ar.
    New in version 1.9.0.
This is straightforward for a 1-D array. However, I'm trying to get the unique values and counts for each row of a matrix. Here is a sample matrix:
m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])
When I apply np.unique:
np.unique(m_sample, axis=1, return_counts=True)
(array([[1, 1, 2],
        [2, 2, 2],
        [3, 3, 3],
        [1, 5, 4]]), array([1, 1, 1]))
I'm not really sure what the returned matrix represents here, much less the counts array. Is this perhaps a bug in numpy (or a case the developers did not consider)? Or am I misunderstanding how to use the parameters in this case?
CodePudding user response:
When you specify an axis, np.unique returns the unique subarrays indexed along that axis (whole rows for axis=0, whole columns for axis=1), not the unique values within each row. To see this better, assume that one of the rows repeats:
m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
    [1, 2, 1]
])
In this case, np.unique(m_sample, axis=0, return_counts=True) gives:
(array([[1, 2, 1],
        [1, 4, 5],
        [2, 2, 2],
        [3, 3, 3]]),
 array([2, 1, 1, 1]))
The first element of this tuple lists the unique rows of the array, and the second gives how many times each of those rows appears. In this example, the row [1, 2, 1] appears twice.
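The same reading explains the output in the question. With axis=1 the subarrays being compared are whole columns, so the result is the set of distinct columns (sorted lexicographically), and each count is 1 because no column repeats. A minimal sketch using the original four-row matrix; the names cols and col_counts are only illustrative:

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# axis=1 treats every column as a single item to be compared.
cols, col_counts = np.unique(m_sample, axis=1, return_counts=True)

# Each column of `cols` is one distinct column of m_sample:
# [1, 2, 3, 1], [1, 2, 3, 5] and [2, 2, 3, 4].
print(cols.T)
# All three columns are different, so every count is 1.
print(col_counts)
```

So the result is not a bug: the returned matrix holds the unique columns, and the counts say how often each column occurs.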
To get the unique values in each row, you can try, for example, the following:
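import numpy as np
m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5]
])
# Sort each row so that equal values end up next to each other.
s = np.sort(m_sample, axis=1)
# Keep the first entry of every row, plus each entry that differs from its left neighbor.
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
# s[mask] is flat, so split it at the cumulative per-row counts (the last piece is empty and is dropped).
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]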
It gives:
[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
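Since the question also asks for the counts, one simple option is to call np.unique once per row. This is only a sketch: it loops in Python, so it is likely slower than the mask-based approach on large matrices, and the name per_row is made up for illustration.

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# Rows can have different numbers of unique values, so the result
# is a list of (values, counts) pairs rather than a single array.
per_row = [np.unique(row, return_counts=True) for row in m_sample]

for values, counts in per_row:
    print(values, counts)
# [1 2] [2 1]
# [2] [3]
# [3] [3]
# [1 4 5] [1 1 1]
```

Each pair holds the sorted unique values of one row and how many times each of them occurs in that row.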
