Interpretation of counts for `numpy.unique` when applied on a matrix

Time:01-27

numpy.unique has an optional argument return_counts. From the docs:

return_counts : bool, optional
    If True, also return the number of times each unique item appears in ar.

New in version 1.9.0.
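For reference, here is a minimal sketch of the documented 1-D behavior. The array `a` is made-up sample data, not taken from the question:

```python
import numpy as np

a = np.array([1, 2, 1, 3, 1, 2])
# Unique values come back sorted; counts[i] is how often values[i] occurs in a
values, counts = np.unique(a, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [3 2 1]
```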

This is straightforward for a 1-D array. However, I'm trying to get the unique values and counts for each row of a matrix. Here is a sample matrix:

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

When I apply np.unique:

np.unique(m_sample, axis=1, return_counts=True)

(array([[1, 1, 2],
        [2, 2, 2],
        [3, 3, 3],
        [1, 5, 4]]),  array([1, 1, 1]))

I'm not sure what the returned matrix represents, let alone the counts array. Is this perhaps a bug in NumPy (or a case the developers did not consider)? Or am I misunderstanding how to use the parameters in this case?

Answer:

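When you specify an axis, np.unique treats each subarray along that axis as a single item and returns the unique subarrays: with axis=0 these are the rows, with axis=1 the columns. In your example all three columns are different, so each is counted once, hence array([1, 1, 1]). To see this better, assume that one of the rows repeats: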

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
    [1, 2, 1]
])

In that case, np.unique(m_sample, axis=0, return_counts=True) gives:

(array([[1, 2, 1],
        [1, 4, 5],
        [2, 2, 2],
        [3, 3, 3]]),
 array([2, 1, 1, 1]))

The first element of this tuple lists the unique rows of the array, and the second gives how many times each of those rows appears. In this example, the row [1, 2, 1] appears twice.
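Applied to the original matrix from the question with axis=1, the same logic works on columns. The unique subarrays also come back sorted lexicographically (first element first), which is why the columns [1, 2, 3, 5] and [2, 2, 3, 4] appear swapped in the question's output. A quick sanity check, relying only on standard np.unique behavior, is that the unique columns are the unique rows of the transposed matrix:

```python
import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

# axis=1: each column is treated as one item
cols, col_counts = np.unique(m_sample, axis=1, return_counts=True)

# The same result expressed as unique rows of the transpose
rows_t, row_counts_t = np.unique(m_sample.T, axis=0, return_counts=True)

assert np.array_equal(cols, rows_t.T)
assert np.array_equal(col_counts, row_counts_t)
print(col_counts)  # [1 1 1]: all three columns are distinct
```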

To get the unique values in each row, you can, for example, sort each row and keep only the entries that differ from their left neighbour:

import numpy as np

m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5]
])

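# Sort each row so that equal values become adjacent
s = np.sort(m_sample, axis=1)
# Mark the first occurrence of each value in a row: the first column is
# always kept, later columns only where they differ from their left neighbour
mask = np.full(m_sample.shape, True)
mask[:, 1:] = s[:, :-1] != s[:, 1:]
# Flatten the kept values and split them back into per-row pieces at the
# cumulative row lengths; the trailing piece is always empty, so drop it
np.split(s[mask], np.cumsum(mask.sum(axis=1)))[:-1]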

It gives:

[array([1, 2]), array([2]), array([3]), array([1, 4, 5])]
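This gives the unique values but not the counts the question also asked for. A minimal sketch that extends the same sort-and-mask idea: since the first column of every row is always marked, each run of equal values ends at the next marked position (or at the end of the array), so differencing the flat positions of the marked entries yields the counts. The helper name unique_rows_with_counts is hypothetical. For small matrices, a plain loop calling np.unique(row, return_counts=True) on each row gives the same result and may be easier to read; the sketch uses that loop as a cross-check.

```python
import numpy as np


def unique_rows_with_counts(m):
    """Hypothetical helper: per-row unique values and their counts."""
    m = np.asarray(m)
    # Sort each row so equal values become adjacent
    s = np.sort(m, axis=1)
    # True at the first occurrence of each value within its row
    mask = np.full(m.shape, True)
    mask[:, 1:] = s[:, :-1] != s[:, 1:]
    # Flat (row-major) positions where a run of equal values starts;
    # every row starts a new run, so runs never cross row boundaries
    starts = np.flatnonzero(mask)
    # Run length = distance to the next run start, or to the end of the array
    counts = np.diff(np.append(starts, mask.size))
    # Split the flat results back into one piece per row
    bounds = np.cumsum(mask.sum(axis=1))[:-1]
    return np.split(s[mask], bounds), np.split(counts, bounds)


m_sample = np.array([
    [1, 2, 1],
    [2, 2, 2],
    [3, 3, 3],
    [1, 4, 5],
])

values, counts = unique_rows_with_counts(m_sample)
for v, c in zip(values, counts):
    print(v, c)  # first line printed: [1 2] [2 1]

# Cross-check against calling np.unique on each row separately
for row, v, c in zip(m_sample, values, counts):
    ref_v, ref_c = np.unique(row, return_counts=True)
    assert np.array_equal(v, ref_v) and np.array_equal(c, ref_c)
```

For the sample matrix the counts are [2, 1], [3], [3] and [1, 1, 1], matching the values listed above.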