Home > Blockchain >  Extract the rows with decreasing values of column of an ordered dataframe
Extract the rows with decreasing values of column of an ordered dataframe

Time:01-29

I have a data frame that has information about buildings, their status and the time when the status is recorded.

For the sake of simplicity I encoded them into numbers in this example table:

id time status
101 1 1
101 2 2
101 3 3
102 1 2
102 2 1
102 3 2
102 4 3

I want to extract the IDs where their status is not strictly increasing. By that I mean that their status should always increase as time increases just like ID 101

As you can see ID 102 at time 1 has status 2, but at time 2 it has the status 1.

So if you feed this table to the function it should return a list of ID where it only contains 102.

Thank you in advance!

CodePudding user response:

Get differencies per id in one groupby in lambda function (for avoid double groupby) for check, if all values are greater like 0 and for list filter indices with inverted mask for values NOT matched condition:

#if necessary sorting per id and time
df = df.sort_values(['id','time'])

s = df.groupby('id')['status'].apply(lambda x: np.all(np.diff(x) > 0))

L = s.index[~s].tolist()
print (L)
[102]

If need rows use GroupBy.transform for possible filter in boolean indexing:

#if necessary sorting per id and time
df = df.sort_values(['id','time'])

df1 = df[~df.groupby('id')['status'].transform(lambda x: np.all(np.diff(x) > 0))]
print (df1)
    id  time  status
3  102     1       2
4  102     2       1
5  102     3       2
6  102     4       3

CodePudding user response:

Here's one approach: set_index to "id", then groupby it, find differences in consecutive "status" entries and see if any value is less than 0; this creates a boolean Series where any negative value is True. Then groupby "id" again and see if any "id" has negative value; this also creates a boolean Series. Filter the index of the Series with its values.

msk = df.set_index('id').groupby(level=0)['status'].diff().le(0).groupby(level=0).any()
out = msk[msk].index.tolist()

Output:

[102]
  •  Tags:  
  • Related