Get value of previous row / specific column from a pandas df

Time:02-02

For each row x of a dataframe, I need the value stored in the previous row (x-1) in a specific target column. The header of that target column is stored in the Target_col column of row x.

               0     1     2  Target_col
Date                                    
2022-01-01  37.0  26.0   NaN           0
2022-01-02   NaN  41.0   0.0           1
2022-01-03   NaN  40.0  43.0           1
2022-01-04   NaN   NaN  23.0           2

For example, in the last row my Target_value is 43.0, which is stored in the column "2" of the previous row. This is the expected output:

               0     1     2  Target_col  Target_value
Date                                                  
2022-01-01  37.0  26.0   NaN           0           NaN
2022-01-02   NaN  41.0   0.0           1          26.0
2022-01-03   NaN  40.0  43.0           1          41.0
2022-01-04   NaN   NaN  23.0           2          43.0
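
To experiment with the example, the input frame (without Target_value) can be rebuilt roughly as follows. This is only a sketch: it assumes the value columns carry the integer labels 0, 1 and 2 and that Target_col holds those same integers, which the printout suggests but does not prove.

```python
import numpy as np
import pandas as pd

# Assumption: the value columns are labelled with the integers 0, 1, 2
# and Target_col stores those same integer labels.
df = pd.DataFrame(
    {
        0: [37.0, np.nan, np.nan, np.nan],
        1: [26.0, 41.0, 40.0, np.nan],
        2: [np.nan, 0.0, 43.0, 23.0],
        "Target_col": [0, 1, 1, 2],
    },
    index=pd.date_range("2022-01-01", periods=4, name="Date"),
)
print(df)
```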

I was able to get what I want by duplicating the df:

df2 = df.shift(periods=1)
df['Target_value'] = df2.lookup(df.index, df['Target_col'])

but I guess there is a smarter way to do that. Furthermore, lookup is deprecated. Any ideas?
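
One possible way to get the same result without lookup is to shift the value columns once and then pick one element per row with NumPy integer indexing. The sketch below is not taken from any of the answers; the function name previous_row_target is made up for illustration, and it assumes every column except Target_col is a value column and that the example frame uses integer column labels.

```python
import numpy as np
import pandas as pd


def previous_row_target(df, target_col="Target_col"):
    """Return, for each row, the previous row's value in the column named by target_col."""
    value_cols = df.columns.drop(target_col)
    # Shift once so that row x holds the values of row x-1 (the first row becomes NaN).
    prev = df[value_cols].shift(1).to_numpy()
    # Translate column labels into positions; get_indexer returns -1 for unknown labels.
    col_pos = value_cols.get_indexer(df[target_col])
    if (col_pos < 0).any():
        raise KeyError("Target_col contains a label that is not a value column")
    # Pick one element per row: (row 0, col_pos[0]), (row 1, col_pos[1]), ...
    return prev[np.arange(len(df)), col_pos]


# Example frame from the question (integer column labels are an assumption).
df = pd.DataFrame(
    {
        0: [37.0, np.nan, np.nan, np.nan],
        1: [26.0, 41.0, 40.0, np.nan],
        2: [np.nan, 0.0, 43.0, 23.0],
        "Target_col": [0, 1, 1, 2],
    },
    index=pd.date_range("2022-01-01", periods=4, name="Date"),
)
df["Target_value"] = previous_row_target(df)
print(df)
```

Mapping labels to positions first means the same code should also work when the column headers are strings rather than 0, 1, 2.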

Please note that I reshaped my question and the example df to make everything clearer, so itprorh66's answer and my comments on it are no longer relevant.

CodePudding user response:

I would approach the problem a little differently, as illustrated below. Given a base dataframe df of the form:

    date         a       b      c
0   2022-01-01  12.0    11.0    NaN
1   2022-01-02  10.0    11.0    NaN
2   2022-01-03  NaN     10.0    10.0
3   2022-01-04  NaN     11.0    9.0
4   2022-01-05  NaN      NaN    12.0 

Instead of defining the column that contains the first valid data, I would create a column that holds the first valid value itself, as follows:

import numpy as np

# helper function to find the first valid data
def findfirst(row, cols_list):
    # return the first non-NaN value found within row
    for c in cols_list:
        if not np.isnan(row[c]):
            return row[c]
    return np.nan

Then using the above helper, I add the column 'First' which contains the desired data as follows:

df['First'] = df.apply(lambda row: findfirst(row, ['a', 'b', 'c']), axis=1)
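
As a possible alternative to the row-wise apply, the first non-NaN value per row can also be obtained by back-filling across the columns and taking the leftmost one. This is a sketch of a different technique, not the helper above; the frame is rebuilt here from the printed example so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the printout above.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]
        ),
        "a": [12.0, 10.0, np.nan, np.nan, np.nan],
        "b": [11.0, 11.0, 10.0, 11.0, np.nan],
        "c": [np.nan, np.nan, 10.0, 9.0, 12.0],
    }
)

# Back-fill along each row so the leftmost column ends up holding the
# first non-NaN value of that row, then keep only that leftmost column.
df["First"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df)
```

A row in which every column is NaN would simply yield NaN, the same fallback as the helper's final return.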

This creates the following dataframe:

    date          a       b      c      First
0   2022-01-01  12.0    11.0    NaN     12.0
1   2022-01-02  10.0    11.0    NaN     10.0
2   2022-01-03  NaN     10.0    10.0    10.0
3   2022-01-04  NaN     11.0    9.0     11.0
4   2022-01-05  NaN      NaN    12.0    12.0   

From the above you can then compute the change value as follows:

df['Change'] = (df['First']/df['First'].shift())-1

Which yields:

    date          a       b      c      First   Change
0   2022-01-01  12.0    11.0    NaN     12.0    NaN
1   2022-01-02  10.0    11.0    NaN     10.0    -0.166667
2   2022-01-03  NaN     10.0    10.0    10.0    0.000000
3   2022-01-04  NaN     11.0    9.0     11.0    0.100000
4   2022-01-05  NaN      NaN    12.0    12.0    0.090909
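
The Change values can be checked in isolation: each one is the current First value divided by the previous one, minus 1 (for example 10/12 - 1 for the second row). A minimal sketch, using only the First column copied from the table above:

```python
import pandas as pd

# First column copied from the table above.
first = pd.Series([12.0, 10.0, 10.0, 11.0, 12.0], name="First")

# Relative change versus the previous row; the first row has no predecessor, so it is NaN.
change = first / first.shift() - 1
print(change.round(6))
```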

CodePudding user response:

It's a bit convoluted but this works:

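import numpy as np

cols = df.columns[:-1]  # value columns (everything except Target_col)
# column position wanted by the *next* row; the last row has no successor, so pad with 0
temp = df['Target_col'].shift(-1).values[:-1]
temp = np.append(temp, 0)
# pick, from each row, the value its successor points at, then drop the padded last pick
target_values = df[cols].to_numpy()[np.arange(len(df)), temp.astype(int)][:-1]
# the first row has no previous row, so it gets NaN as in the expected output
target_values = np.insert(target_values, 0, np.nan, axis=0)
df['Target_value'] = target_values.tolist()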