For each row x of a dataframe, I need to get the value stored in the previous row (x-1) in a specific target column. The header of that target column is stored in the Target_col column of row x.
               0     1     2  Target_col
Date
2022-01-01  37.0  26.0   NaN           0
2022-01-02   NaN  41.0   0.0           1
2022-01-03   NaN  40.0  43.0           1
2022-01-04   NaN   NaN  23.0           2
For example, in the last row my Target_value is 43.0, which is stored in the column "2" of the previous row. This is the expected output:
               0     1     2  Target_col  Target_value
Date
2022-01-01  37.0  26.0   NaN           0           NaN
2022-01-02   NaN  41.0   0.0           1          26.0
2022-01-03   NaN  40.0  43.0           1          41.0
2022-01-04   NaN   NaN  23.0           2          43.0
I was able to get what I want by duplicating the df:
df2 = df.shift(periods=1)
df['Target_value'] = df2.lookup(df.index, df['Target_col'])
but I guess there is a smarter way to do that. Furthermore, lookup is deprecated. Any ideas?
Please note that I reshaped my question and the example df to make everything clearer, so itprorh66's answer and my comments on it are no longer relevant.
CodePudding user response:
I would approach the problem a little differently, as illustrated below. Given a base dataframe df of the form:
         date     a     b     c
0  2022-01-01  12.0  11.0   NaN
1  2022-01-02  10.0  11.0   NaN
2  2022-01-03   NaN  10.0  10.0
3  2022-01-04   NaN  11.0   9.0
4  2022-01-05   NaN   NaN  12.0
Instead of defining the column that contains the first valid data, I would create a column which just contains the first valid piece of data, as follows:
import numpy as np

# helper function to find the first valid value
def findfirst(row, cols_list):
    # return the first non-NaN value found within row
    for c in cols_list:
        if not np.isnan(row[c]):
            return row[c]
    return np.nan
Then using the above helper, I add the column 'First' which contains the desired data as follows:
df['First'] = df.apply(lambda row: findfirst(row, ['a', 'b', 'c']), axis= 1)
This creates the following dataframe:
         date     a     b     c  First
0  2022-01-01  12.0  11.0   NaN   12.0
1  2022-01-02  10.0  11.0   NaN   10.0
2  2022-01-03   NaN  10.0  10.0   10.0
3  2022-01-04   NaN  11.0   9.0   11.0
4  2022-01-05   NaN   NaN  12.0   12.0
From the above you can then compute the change value as follows:
df['Change'] = (df['First']/df['First'].shift())-1
Which yields:
         date     a     b     c  First     Change
0  2022-01-01  12.0  11.0   NaN   12.0        NaN
1  2022-01-02  10.0  11.0   NaN   10.0  -0.166667
2  2022-01-03   NaN  10.0  10.0   10.0   0.000000
3  2022-01-04   NaN  11.0   9.0   11.0   0.100000
4  2022-01-05   NaN   NaN  12.0   12.0   0.090909
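If the row-wise apply becomes slow on a large frame, the same two columns can probably be built without a helper function. The following is only a sketch under the assumption that the value columns are exactly 'a', 'b' and 'c' and are all numeric; it back-fills along each row so the first column ends up holding the first non-NaN value, then applies the same change formula as above.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the answer above
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"],
    "a": [12.0, 10.0, np.nan, np.nan, np.nan],
    "b": [11.0, 11.0, 10.0, 11.0, np.nan],
    "c": [np.nan, np.nan, 10.0, 9.0, 12.0],
})

value_cols = ["a", "b", "c"]  # assumption: these are the only value columns

# Back-fill across columns; the leftmost column then holds the first valid value
df["First"] = df[value_cols].bfill(axis=1).iloc[:, 0]

# Same formula as in the answer: relative change versus the previous row
df["Change"] = df["First"] / df["First"].shift() - 1

print(df)
```

A row that is NaN in every value column simply gets NaN in First, which matches what findfirst returns in that case.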
CodePudding user response:
It's a bit convoluted but this works:
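import numpy as np
# value columns: everything except the trailing Target_col
cols = df.columns[:-1]
# target column wanted by each following row (Target_col shifted up by one)
temp = df['Target_col'].shift(-1).values[:-1]
temp = np.append(temp, 0)
# in each row, pick the value the next row points to, then drop the last pick
target_values = df[cols].to_numpy()[np.arange(len(df)), temp.astype(int)][:-1]
# the first row has no previous row, so it gets NaN as in the expected output
target_values = np.insert(target_values, 0, np.nan, axis=0)
df['target_values'] = target_values.tolist()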
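The same idea can likely be written a bit more directly by shifting the value columns down instead of shifting Target_col up. This is a sketch rather than a drop-in replacement: it assumes the value columns are labelled 0, 1 and 2 as in the question, and that every entry of Target_col is one of those labels. It avoids the deprecated lookup by using NumPy integer indexing.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question
df = pd.DataFrame(
    {
        0: [37.0, np.nan, np.nan, np.nan],
        1: [26.0, 41.0, 40.0, np.nan],
        2: [np.nan, 0.0, 43.0, 23.0],
        "Target_col": [0, 1, 1, 2],
    },
    index=pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"]),
)
df.index.name = "Date"

value_cols = [0, 1, 2]  # assumption: labels of the value columns

# Row i of `shifted` holds the values of row i-1 (the first row is all NaN)
shifted = df[value_cols].shift(1).to_numpy()

# Translate each Target_col label into a positional column index
col_pos = pd.Index(value_cols).get_indexer(df["Target_col"].to_numpy())

# Pick one element per row: (row i, column position requested by row i)
df["Target_value"] = shifted[np.arange(len(df)), col_pos]

print(df)
```

Because the first row of the shifted array is all NaN, the first Target_value comes out as NaN without any special handling.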
