I have the next dataframe with a lot of elements
| code | status | Month |
|---|---|---|
| a | Active | 1 |
| b | Inactive | 2 |
| c | Active | 3 |
| b | Active | 4 |
| a | Inactive | 7 |
| c | Active | 8 |
Is there anyway to create a dataframe with the next result
| code | status |
|---|---|
| a | change to inactive on June |
| b | change to active on april |
| c | no changes |
Or to have something like a report, It will be so helpful
CodePudding user response:
Try:
sort_valuesby the "code" and "Month" columns- Use
np.whereto assign actions when the status changes. drop_duplicatesto keep only final row for each "code"
df = df.sort_values(["code", "Month"])
df["status"] = (np.where((df["code"].eq(df["code"].shift())) &
(df["status"].ne(df["status"].shift())),
"change to " df["status"].str.lower() " in " pd.to_datetime(df["Month"],format="%m").dt.strftime("%B"),
"no changes")
)
output = df.drop_duplicates("code", keep="last").drop("Month", axis=1).reset_index(drop=True)
>>> output
code status
0 a change to inactive in July
1 b change to active in April
2 c no changes
CodePudding user response:
You can use a groupby and join on 'status' and 'month' and return a new dataframe. Then you can split and combine parts of those two columns (status and month), and lastly you can change the codes that had 'Active' in both occasions with loc and the help of your grouped object:
# Convert Month and sort
df['Month'] = pd.to_datetime(df['Month'], format='%m').dt.month_name()
df = df.sort_values(["code", "Month"])
>>> print(df)
code status Month
0 a Active January
1 b Inactive February
2 c Active March
3 b Active April
4 a Inactive July
5 c Active August
# Groupby and join
g = df.groupby(['code']).agg(lambda x: ' '.join(x))
>>> print(g)
status Month
code
a Active Inactive January July
b Inactive Active February April
c Active Active March August
# Combine columns and replace to 'no change'
res = pd.DataFrame('change to ' g.status.str.split(' ',1).str[1] ' on ' g.Month.str.split(' ',1).str[1],columns=['status'])
res.loc[res.index.isin(g[g.status.eq('Active Active')].index),'status'] = 'no change'
>>> print(res)
status
code
a change to Inactive on July
b change to Active on April
c no change
Sample DF:
{'code': {0: 'a', 1: 'b', 2: 'c', 3: 'b', 4: 'a', 5: 'c'},
'status': {0: 'Active',
1: 'Inactive',
2: 'Active',
3: 'Active',
4: 'Inactive',
5: 'Active'},
'Month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 7, 5: 8}}
