Home > Enterprise >  Change rows in pandas dataframe according to formula
Change rows in pandas dataframe according to formula

Time:01-06

I have the following dataframe:

import pandas as pd

data=[[ 6, 10,  2, 67, 51, 76, 80, 41, 95],
       [ 2, 20, 15, 21, 23, 53, 13, 95, 31],
       [ 3, 30, 25, 37, 25, 55, 30, 21, 63],
       [ 4, 40, 35, 93, 79, 95, 24, 61, 97],
       [ 1, 50, 45, 81, 96, 86, 70, 33, 35]]
   
df=pd.DataFrame(data,columns=['category', 'ceil', 'floor','c1','c2','c3','c4','c5','c6'])

   category  ceil  floor  c1 c2  c3  c4  c5   c6
0         6    10      2  67  51  76  80  41  95
1         2    20     15  21  23  53  13  95  31
2         3    30     25  37  25  55  30  21  63
3         4    40     35  93  79  95  24  61  97
4         1    50     45  81  96  86  70  33  35

I would like to change the values in c1,c2,...,c6 in each row according to the value in the category column the following way:

for i in c_i: if i < category: c_i=max(c_i, ceil)

for i in c_i: if i >= category: c_i=min(c_i, floor)

So the end result should look like this:

   category  ceil  floor  c1 c2  c3  c4  c5   c6
0         6    10      2  67  51  76  80  41   2
1         2    20     15  21  15  15  13  15  15
2         3    30     25  37  30  25  25  21  25
3         4    40     35  93  79  95  24  35  35
4         1    50     45  45  45  45  45  33  35

So far my solution is using a function and apply it to the rows of df, but that is very slow for large datasets. Is there a way to do vectorize this calculation?

CodePudding user response:

This is a bit tricky, but you can all achieve using masks.

For this you need to:

  • extract the Ci indices
  • compute a mask for cells where Ci < category (m1)
  • compute a mask for cells where Ci >= category (m2)
  • compute a mask for cells where the value is < floor (m3)
  • compute a mask for cells where the value is > ceil (m4)
  • use where and the above masks to apply the new values
idx = df.filter(regex='c\d').columns

cols = idx.str[1:].astype(int).values
rows = df['category'].values

m1 = cols<rows[:,None]   # using numpy broadcasting
m2 = cols>=rows[:,None]  #
m3 = df[idx].lt(df['floor'], axis=0)
m4 = df[idx].gt(df['ceil'], axis=0)

df[idx] = (df[idx].where(m1|m3, df['floor'], axis=0)
                  .where(m2|m4, df['ceil'], axis=0)
           )

output:

   category  ceil  floor  c1  c2  c3  c4  c5  c6
0         6    10      2  67  51  76  80  41   2
1         2    20     15  21  15  15  13  15  15
2         3    30     25  37  30  25  25  21  25
3         4    40     35  93  79  95  24  35  35
4         1    50     45  45  45  45  45  33  35
  •  Tags:  
  • Related