I have the following dataframe:
import pandas as pd
data=[[ 6, 10, 2, 67, 51, 76, 80, 41, 95],
[ 2, 20, 15, 21, 23, 53, 13, 95, 31],
[ 3, 30, 25, 37, 25, 55, 30, 21, 63],
[ 4, 40, 35, 93, 79, 95, 24, 61, 97],
[ 1, 50, 45, 81, 96, 86, 70, 33, 35]]
df=pd.DataFrame(data,columns=['category', 'ceil', 'floor','c1','c2','c3','c4','c5','c6'])
category ceil floor c1 c2 c3 c4 c5 c6
0 6 10 2 67 51 76 80 41 95
1 2 20 15 21 23 53 13 95 31
2 3 30 25 37 25 55 30 21 63
3 4 40 35 93 79 95 24 61 97
4 1 50 45 81 96 86 70 33 35
I would like to change the values in c1,c2,...,c6 in each row according to the value in the category column the following way:
for i in c_i: if i < category: c_i=max(c_i, ceil)
for i in c_i: if i >= category: c_i=min(c_i, floor)
So the end result should look like this:
category ceil floor c1 c2 c3 c4 c5 c6
0 6 10 2 67 51 76 80 41 2
1 2 20 15 21 15 15 13 15 15
2 3 30 25 37 30 25 25 21 25
3 4 40 35 93 79 95 24 35 35
4 1 50 45 45 45 45 45 33 35
So far my solution is using a function and apply it to the rows of df, but that is very slow for large datasets.
Is there a way to do vectorize this calculation?
CodePudding user response:
This is a bit tricky, but you can all achieve using masks.
For this you need to:
- extract the Ci indices
- compute a mask for cells where Ci < category (m1)
- compute a mask for cells where Ci >= category (m2)
- compute a mask for cells where the value is < floor (m3)
- compute a mask for cells where the value is > ceil (m4)
- use
whereand the above masks to apply the new values
idx = df.filter(regex='c\d').columns
cols = idx.str[1:].astype(int).values
rows = df['category'].values
m1 = cols<rows[:,None] # using numpy broadcasting
m2 = cols>=rows[:,None] #
m3 = df[idx].lt(df['floor'], axis=0)
m4 = df[idx].gt(df['ceil'], axis=0)
df[idx] = (df[idx].where(m1|m3, df['floor'], axis=0)
.where(m2|m4, df['ceil'], axis=0)
)
output:
category ceil floor c1 c2 c3 c4 c5 c6
0 6 10 2 67 51 76 80 41 2
1 2 20 15 21 15 15 13 15 15
2 3 30 25 37 30 25 25 21 25
3 4 40 35 93 79 95 24 35 35
4 1 50 45 45 45 45 45 33 35
