Extract the rows with decreasing values of column of an ordered dataframe-CodePudding

I have a data frame that has information about buildings, their status and the time when the status is recorded.

For the sake of simplicity I encoded them into numbers in this example table:

id	time	status
101	1	1
101	2	2
101	3	3
102	1	2
102	2	1
102	3	2
102	4	3

I want to extract the IDs where their status is not strictly increasing. By that I mean that their status should always increase as time increases just like ID 101

As you can see ID 102 at time 1 has status 2, but at time 2 it has the status 1.

So if you feed this table to the function it should return a list of ID where it only contains 102.

Thank you in advance!

CodePudding user response：

Get differencies per id in one groupby in lambda function (for avoid double groupby) for check, if all values are greater like 0 and for list filter indices with inverted mask for values NOT matched condition:

#if necessary sorting per id and time
df = df.sort_values(['id','time'])

s = df.groupby('id')['status'].apply(lambda x: np.all(np.diff(x) > 0))

L = s.index[~s].tolist()
print (L)
[102]

If need rows use GroupBy.transform for possible filter in boolean indexing:

#if necessary sorting per id and time
df = df.sort_values(['id','time'])

df1 = df[~df.groupby('id')['status'].transform(lambda x: np.all(np.diff(x) > 0))]
print (df1)
    id  time  status
3  102     1       2
4  102     2       1
5  102     3       2
6  102     4       3

CodePudding user response：

Here's one approach: set_index to "id", then groupby it, find differences in consecutive "status" entries and see if any value is less than 0; this creates a boolean Series where any negative value is True. Then groupby "id" again and see if any "id" has negative value; this also creates a boolean Series. Filter the index of the Series with its values.

msk = df.set_index('id').groupby(level=0)['status'].diff().le(0).groupby(level=0).any()
out = msk[msk].index.tolist()

Output:

[102]