Differences in an array based on groups defined by another array

I have two arrays of the same size. One, call it A, contains a series of repeated numbers; the other, B, contains random numbers.

import numpy as np

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

I need to find the differences in B between the two extremes defined by the groups in A. More specifically, I need an output C such as

C = [2, -2, 2, 1]

where each term is the difference 3 - 1, 4 - 6, 9 - 7, and 11 - 10, i.e., the difference between the extremes in B identified by the groups of repeated numbers in A.

I tried using itertools.groupby to isolate the groups in the first array, but it is not clear to me how to use the resulting grouping to compute the differences in the second.

CodePudding user response：

Edit: C is now sorted the same way as in the question

C = []
_, idx = np.unique(A, return_index=True)
for i in A[np.sort(idx)]:
    bs = B[A==i]
    C.append(bs[-1] - bs[0])

print(C)  # [2, -2, 2, 1]

With return_index=True, np.unique also returns, for each unique value in A, the index of its first appearance.

i in A[np.sort(idx)] iterates over the unique values in the order in which they first appear in A.

B[A==i] extracts the values of B at the positions where A equals i.
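
To make the three steps concrete, here is a small sketch that prints nothing and only exposes the intermediate arrays for the data from the question. The names values, ordered_keys and group_two are introduced here for illustration and are not part of the answer above.

```python
import numpy as np

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])

# np.unique sorts the unique values, so idx is ordered by value, not by position.
values, idx = np.unique(A, return_index=True)
# values -> [0 1 2 3], idx -> [6 0 3 9]

# Sorting idx restores the order of first appearance in A.
ordered_keys = A[np.sort(idx)]
# ordered_keys -> [1 2 0 3]

# A boolean mask selects the entries of B that belong to one group.
group_two = B[A == 2]
# group_two -> [6 5 4], so the difference for this group is 4 - 6 = -2
```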

CodePudding user response：

This is easily achieved using pandas' groupby:

import numpy as np
import pandas as pd

A = np.array([1,1,1,2,2,2,0,0,0,3,3])
B = np.array([1,2,3,6,5,4,7,8,9,10,11])

# sort=False keeps the groups in the order of first appearance in A
pd.Series(B).groupby(A, sort=False).agg(lambda g: g.iloc[-1]-g.iloc[0]).to_numpy()

output: array([ 2, -2,  2,  1])

Using itertools.groupby (the := operator requires Python 3.8+):

from itertools import groupby

[(x:=list(g))[-1][1]-x[0][1] for k, g in groupby(zip(A,B), lambda x: x[0])]

output: [2, -2, 2, 1]

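NB. The two solutions behave differently when the same value of A appears in non-consecutive runs: pandas' groupby merges every occurrence of a value into a single group, whereas itertools.groupby starts a new group each time the value changes.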
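
A minimal sketch of that difference, using made-up arrays A2 and B2 (not from the question) in which the value 0 appears in two separate runs:

```python
from itertools import groupby

import numpy as np
import pandas as pd

# Hypothetical data: the key 0 occurs in two non-consecutive runs.
A2 = np.array([0, 0, 1, 1, 0, 0])
B2 = np.array([1, 2, 3, 5, 8, 13])

# pandas builds one group per key, so key 0 covers positions 0, 1, 4 and 5.
pandas_result = (
    pd.Series(B2)
    .groupby(A2, sort=False)
    .agg(lambda g: g.iloc[-1] - g.iloc[0])
    .tolist()
)
# pandas_result -> [12, 2], i.e. 13 - 1 and 5 - 3

# itertools.groupby builds one group per consecutive run of equal keys.
runs_result = [
    int(run[-1][1] - run[0][1])
    for _, grp in groupby(zip(A2, B2), key=lambda pair: pair[0])
    for run in [list(grp)]
]
# runs_result -> [1, 2, 5], i.e. 2 - 1, 5 - 3 and 13 - 8
```

Which behaviour is correct depends on whether a repeated value is meant to be the same group or a new one.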

CodePudding user response：

You could use np.unique to get the starting index and the size of each group.

_, idx, grp_len = np.unique(A, return_index=True, return_counts=True)
s_idx           = np.argsort(idx)   # order the groups by first appearance in A
idx             = idx[s_idx]
out             = B[idx + grp_len[s_idx] - 1] - B[idx]   # last minus first element of each group

print(out)
# [ 2 -2  2  1]

NB: This assumes that each value in A forms a single contiguous run. If A can look like [0, 0, 1, 1, 0, 0], where a value reappears after a different one, the above solution gives wrong results.
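
If repeated values should count as separate groups, one possible workaround is to detect run boundaries directly instead of relying on np.unique. The following is only a sketch under that assumption; the helper name run_extreme_diffs is hypothetical and does not come from any of the answers.

```python
import numpy as np

def run_extreme_diffs(a, b):
    """Last-minus-first value of b for each consecutive run of equal values in a."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.size == 0:
        return b[:0]
    # A run starts at index 0 and at every index whose value differs from its predecessor.
    starts = np.concatenate(([0], np.flatnonzero(a[1:] != a[:-1]) + 1))
    # A run ends right before the next one starts; the last run ends at the final index.
    ends = np.concatenate((starts[1:] - 1, [a.size - 1]))
    return b[ends] - b[starts]

A = np.array([1, 1, 1, 2, 2, 2, 0, 0, 0, 3, 3])
B = np.array([1, 2, 3, 6, 5, 4, 7, 8, 9, 10, 11])
out = run_extreme_diffs(A, B)
# out -> [ 2 -2  2  1]

# A value that reappears later is treated as a new run.
out_repeated = run_extreme_diffs([0, 0, 1, 1, 0, 0], [1, 2, 3, 5, 8, 13])
# out_repeated -> [1 2 5]
```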