Given a dummy dataset df as follow:
year v1 v2
0 2017 0.3 0.1
1 2018 0.1 0.1
2 2019 -0.2 0.5
3 2020 NaN -0.3
4 2021 0.8 0.0
or:
[{'year': 2017, 'v1': 0.3, 'v2': 0.1},
{'year': 2018, 'v1': 0.1, 'v2': 0.1},
{'year': 2019, 'v1': -0.2, 'v2': 0.5},
{'year': 2020, 'v1': nan, 'v2': -0.3},
{'year': 2021, 'v1': 0.8, 'v2': 0.0}]
I need to create two more columns trend_v1 and trend_v2 based on v1 and v2 respectively.
The logic to create new columns is this: for each column, if its current value is greater than the previous, the trend value is increase, if its current value is less than the previous, the trend value is decrease, if its current value is equal to the previous, the trend value is equal, if the current or previous value is NaN, the trend also is NaN.
year v1 v2 trend_v1 trend_v2
0 2017 0.3 0.1 NaN NaN
1 2018 0.1 0.1 decrease equal
2 2019 -0.2 0.5 decrease increase
3 2020 NaN -0.3 NaN decrease
4 2021 0.8 0.0 NaN increase
How could I achieve that in Pandas? Thanks for your help at advance.
CodePudding user response:
You can specify columns for test trend by compare shifted values with filtered missing values:
cols = ['v1','v2']
arr = np.where(df[cols] < df[cols].shift(),'decrease',
np.where(df[cols] > df[cols].shift(),'increase',
np.where(df[cols].isna() | df[cols].shift().isna(), None, 'equal')))
df = df.join(pd.DataFrame(arr, columns=cols, index=df.index).add_prefix('trend_'))
print (df)
year v1 v2 trend_v1 trend_v2
0 2017 0.3 0.1 None None
1 2018 0.1 0.1 decrease equal
2 2019 -0.2 0.5 decrease increase
3 2020 NaN -0.3 None decrease
4 2021 0.8 0.0 None increase
Or:
cols = ['v1','v2']
m1 = df[cols] < df[cols].shift()
m2 = df[cols] > df[cols].shift()
m3 = df[cols].isna() | df[cols].shift().isna()
arr = np.select([m1, m2, m3],['decrease','increase', None], default='equal')
df = df.join(pd.DataFrame(arr, columns=cols, index=df.index).add_prefix('trend_'))
