I have a data frame that has information about buildings, their status and the time when the status is recorded.
For the sake of simplicity I encoded them into numbers in this example table:
| id | time | status |
|---|---|---|
| 101 | 1 | 1 |
| 101 | 2 | 2 |
| 101 | 3 | 3 |
| 102 | 1 | 2 |
| 102 | 2 | 1 |
| 102 | 3 | 2 |
| 102 | 4 | 3 |
I want to extract the IDs where their status is not strictly increasing. By that I mean that their status should always increase as time increases just like ID 101
As you can see ID 102 at time 1 has status 2, but at time 2 it has the status 1.
So if you feed this table to the function it should return a list of ID where it only contains 102.
Thank you in advance!
CodePudding user response:
Get differencies per id in one groupby in lambda function (for avoid double groupby) for check, if all values are greater like 0 and for list filter indices with inverted mask for values NOT matched condition:
#if necessary sorting per id and time
df = df.sort_values(['id','time'])
s = df.groupby('id')['status'].apply(lambda x: np.all(np.diff(x) > 0))
L = s.index[~s].tolist()
print (L)
[102]
If need rows use GroupBy.transform for possible filter in boolean indexing:
#if necessary sorting per id and time
df = df.sort_values(['id','time'])
df1 = df[~df.groupby('id')['status'].transform(lambda x: np.all(np.diff(x) > 0))]
print (df1)
id time status
3 102 1 2
4 102 2 1
5 102 3 2
6 102 4 3
CodePudding user response:
Here's one approach: set_index to "id", then groupby it, find differences in consecutive "status" entries and see if any value is less than 0; this creates a boolean Series where any negative value is True. Then groupby "id" again and see if any "id" has negative value; this also creates a boolean Series. Filter the index of the Series with its values.
msk = df.set_index('id').groupby(level=0)['status'].diff().le(0).groupby(level=0).any()
out = msk[msk].index.tolist()
Output:
[102]
