Home > database >  How do i count values in multiple columns based on multiple criteria and create a new column row wis
How do i count values in multiple columns based on multiple criteria and create a new column row wis

Time:01-17

index key M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 Average Count_G10 Count_L10
0 a 12 0 159 0 20 49 0 131 157 153 68.1 4 3
1 b 0 68 195 189 0 79 12 179 21 62 80.5 3 4
2 c 0 139 0 188 12 0 31 87 152 73 68.2 4 2
3 d 126 156 0 112 178 146 0 19 192 25 95.4 6 2
4 e 109 0 172 0 0 0 44 145 186 100 75.6 5 1
5 f 63 183 194 183 0 163 136 13 163 162 126 6 2
6 g 101 143 0 184 0 107 103 0 60 133 83.1 6 1
7 h 13 101 139 86 101 72 93 151 0 0 75.6 6 1
8 i 182 71 73 73 129 32 56 135 0 114 86.5 4 5
9 j 82 0 198 0 117 21 0 32 64 146 66 4 2
10 k 145 0 194 0 156 71 0 89 57 31 74.3 4 2

I would like to get the columns count_G10 and count_L10 where the logic for count_G10 is as follows: count of months(M1 to M10 columns) where value is >0 and ((value-average)/average) > 0.1

Similarly, count_L10 logic is: count of months(M1 to M10 columns) where value is >0 and ((value-average)/average) < -0.1

I have tried the following in Pandas:

Oct20_Nov21 = [202010,202011,202012,202101,202102,202103,202104,202105,202106,202107,202108,202109,202110,202111] 

new_df3['G10%'] = new_df3[Oct20_Nov21].applymap(lambda x : 1 if (((x-new_df3.avg_w_s)/new_df3.avg_w_s) > 0.1).any() else 0).values.sum(axis=1)

I get the following error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Let me know what am I missing here. Thank you

CodePudding user response:

Don't use applymap as it is both very slow and will try to perform operations on scalar values and Series leading to the error shown.

Instead it is best to perform the operations at the DataFrame level.

In the most verbose way we could do:

m_df = df.filter(like='M')

df['Count_G10'] = (
        m_df.gt(0) &
        m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
).sum(axis=1)

df['Count_L10'] = (
        m_df.gt(0) &
        m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).lt(-0.1)
).sum(axis=1)
key M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 Average Count_G10 Count_L10
a 12 0 159 0 20 49 0 131 157 153 68.1 4 3
b 0 68 195 189 0 79 12 179 21 62 80.5 3 4
c 0 139 0 188 12 0 31 87 152 73 68.2 4 2
d 126 156 0 112 178 146 0 19 192 25 95.4 6 2
e 109 0 172 0 0 0 44 145 186 100 75.6 5 1
f 63 183 194 183 0 163 136 13 163 162 126 6 2
g 101 143 0 184 0 107 103 0 60 133 83.1 6 1
h 13 101 139 86 101 72 93 151 0 0 75.6 6 1
i 182 71 73 73 129 32 56 135 0 114 86.5 4 5
j 82 0 198 0 117 21 0 32 64 146 66 4 2
k 145 0 194 0 156 71 0 89 57 31 74.3 4 2

First filter to select only the M columns (There are many other ways to select only the desired columns which would also work):

m_df = df.filter(like='M')

     M1   M2   M3   M4   M5   M6   M7   M8   M9  M10
0    12    0  159    0   20   49    0  131  157  153
1     0   68  195  189    0   79   12  179   21   62
2     0  139    0  188   12    0   31   87  152   73
3   126  156    0  112  178  146    0   19  192   25
4   109    0  172    0    0    0   44  145  186  100
5    63  183  194  183    0  163  136   13  163  162
6   101  143    0  184    0  107  103    0   60  133
7    13  101  139   86  101   72   93  151    0    0
8   182   71   73   73  129   32   56  135    0  114
9    82    0  198    0  117   21    0   32   64  146
10  145    0  194    0  156   71    0   89   57   31

Then use the comparison operations gt and lt to do the value comparison operations.

Step 1: check is for values gt 0:

m_df.gt(0)

       M1     M2     M3     M4     M5     M6     M7     M8     M9    M10
0    True  False   True  False   True   True  False   True   True   True
1   False   True   True   True  False   True   True   True   True   True
2   False   True  False   True   True  False   True   True   True   True
3    True   True  False   True   True   True  False   True   True   True
4    True  False   True  False  False  False   True   True   True   True
5    True   True   True   True  False   True   True   True   True   True
6    True   True  False   True  False   True   True  False   True   True
7    True   True   True   True   True   True   True   True  False  False
8    True   True   True   True   True   True   True   True  False   True
9    True  False   True  False   True   True  False   True   True   True
10   True  False   True  False   True   True  False   True   True   True

Evaluate: ((value - average) / average). Both operations need to align on axis=0.

Step 2: Subtract

m_df.sub(df['Average'], axis=0)

      M1    M2     M3     M4     M5    M6    M7     M8     M9   M10
0  -56.1 -68.1   90.9  -68.1  -48.1 -19.1 -68.1   62.9   88.9  84.9
1  -80.5 -12.5  114.5  108.5  -80.5  -1.5 -68.5   98.5  -59.5 -18.5
2  -68.2  70.8  -68.2  119.8  -56.2 -68.2 -37.2   18.8   83.8   4.8
3   30.6  60.6  -95.4   16.6   82.6  50.6 -95.4  -76.4   96.6 -70.4
4   33.4 -75.6   96.4  -75.6  -75.6 -75.6 -31.6   69.4  110.4  24.4
5  -63.0  57.0   68.0   57.0 -126.0  37.0  10.0 -113.0   37.0  36.0
6   17.9  59.9  -83.1  100.9  -83.1  23.9  19.9  -83.1  -23.1  49.9
7  -62.6  25.4   63.4   10.4   25.4  -3.6  17.4   75.4  -75.6 -75.6
8   95.5 -15.5  -13.5  -13.5   42.5 -54.5 -30.5   48.5  -86.5  27.5
9   16.0 -66.0  132.0  -66.0   51.0 -45.0 -66.0  -34.0   -2.0  80.0
10  70.7 -74.3  119.7  -74.3   81.7  -3.3 -74.3   14.7  -17.3 -43.3

Step 3: Divide

m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0)

          M1        M2        M3        M4        M5        M6        M7        M8        M9       M10
0  -0.823789 -1.000000  1.334802 -1.000000 -0.706314 -0.280470 -1.000000  0.923642  1.305433  1.246696
1  -1.000000 -0.155280  1.422360  1.347826 -1.000000 -0.018634 -0.850932  1.223602 -0.739130 -0.229814
2  -1.000000  1.038123 -1.000000  1.756598 -0.824047 -1.000000 -0.545455  0.275660  1.228739  0.070381
3   0.320755  0.635220 -1.000000  0.174004  0.865828  0.530398 -1.000000 -0.800839  1.012579 -0.737945
4   0.441799 -1.000000  1.275132 -1.000000 -1.000000 -1.000000 -0.417989  0.917989  1.460317  0.322751
5  -0.500000  0.452381  0.539683  0.452381 -1.000000  0.293651  0.079365 -0.896825  0.293651  0.285714
6   0.215403  0.720818 -1.000000  1.214200 -1.000000  0.287605  0.239471 -1.000000 -0.277978  0.600481
7  -0.828042  0.335979  0.838624  0.137566  0.335979 -0.047619  0.230159  0.997354 -1.000000 -1.000000
8   1.104046 -0.179191 -0.156069 -0.156069  0.491329 -0.630058 -0.352601  0.560694 -1.000000  0.317919
9   0.242424 -1.000000  2.000000 -1.000000  0.772727 -0.681818 -1.000000 -0.515152 -0.030303  1.212121
10  0.951548 -1.000000  1.611036 -1.000000  1.099596 -0.044415 -1.000000  0.197847 -0.232840 -0.582773

Step 4: Compare (gt or lt depending)

m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)

       M1     M2     M3     M4     M5     M6     M7     M8     M9    M10
0   False  False   True  False  False  False  False   True   True   True
1   False  False   True   True  False  False  False   True  False  False
2   False   True  False   True  False  False  False   True   True  False
3    True   True  False   True   True   True  False  False   True  False
4    True  False   True  False  False  False  False   True   True   True
5   False   True   True   True  False   True  False  False   True   True
6    True   True  False   True  False   True   True  False  False   True
7   False   True   True   True   True  False   True   True  False  False
8    True  False  False  False   True  False  False   True  False   True
9    True  False   True  False   True  False  False  False  False   True
10   True  False   True  False   True  False  False   True  False  False

Step 5: Find where both conditions are True with logical AND (&)

(
        m_df.gt(0) &
        m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
)

       M1     M2     M3     M4     M5     M6     M7     M8     M9    M10
0   False  False   True  False  False  False  False   True   True   True
1   False  False   True   True  False  False  False   True  False  False
2   False   True  False   True  False  False  False   True   True  False
3    True   True  False   True   True   True  False  False   True  False
4    True  False   True  False  False  False  False   True   True   True
5   False   True   True   True  False   True  False  False   True   True
6    True   True  False   True  False   True   True  False  False   True
7   False   True   True   True   True  False   True   True  False  False
8    True  False  False  False   True  False  False   True  False   True
9    True  False   True  False   True  False  False  False  False   True
10   True  False   True  False   True  False  False   True  False  False

Step 6: Count the number of True values in each row with sum (True is 1 and False is 0 (the additive identity) which is why sum works to count number of True values)

(
        m_df.gt(0) &
        m_df.sub(df['Average'], axis=0).div(df['Average'], axis=0).gt(0.1)
).sum(axis=1)

0     4
1     3
2     4
3     6
4     5
5     6
6     6
7     6
8     4
9     4
10    4
dtype: int64

An almost identical process occurs for Count_L10 the only difference is checking .lt(-0.1) instead of .gt(0.1).


This operation can be greatly simplified by extracting and reusing common operations and refactoring the expression:

m_df = df.filter(like='M')
# Shared Condition
m = m_df.gt(0)
# Values
v = m_df.div(df['Average'], axis=0) - 1

df['Count_G10'] = (m & v.gt(0.1)).sum(axis=1)
df['Count_L10'] = (m & v.lt(-0.1)).sum(axis=1)

Both conditions use the check for values greater than 0 so we can keep that in a variable (m) to use multiple times. Both expressions compare against the same expression ((value - average) / average) we can also store this in a variable v.

The expression ((value - average) / average) can also be simplified to just ((value / average) - 1).

Since ((v - a) / a) = ((v/a) - (a / a)) = ((v/a) - 1)

This will reduce overall computation time, at the expense of some readability, but produces the same results:

key M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 Average Count_G10 Count_L10
a 12 0 159 0 20 49 0 131 157 153 68.1 4 3
b 0 68 195 189 0 79 12 179 21 62 80.5 3 4
c 0 139 0 188 12 0 31 87 152 73 68.2 4 2
d 126 156 0 112 178 146 0 19 192 25 95.4 6 2
e 109 0 172 0 0 0 44 145 186 100 75.6 5 1
f 63 183 194 183 0 163 136 13 163 162 126 6 2
g 101 143 0 184 0 107 103 0 60 133 83.1 6 1
h 13 101 139 86 101 72 93 151 0 0 75.6 6 1
i 182 71 73 73 129 32 56 135 0 114 86.5 4 5
j 82 0 198 0 117 21 0 32 64 146 66 4 2
k 145 0 194 0 156 71 0 89 57 31 74.3 4 2

Setup used:

import pandas as pd

df = pd.DataFrame({
    'key': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'],
    'M1': [12, 0, 0, 126, 109, 63, 101, 13, 182, 82, 145],
    'M2': [0, 68, 139, 156, 0, 183, 143, 101, 71, 0, 0],
    'M3': [159, 195, 0, 0, 172, 194, 0, 139, 73, 198, 194],
    'M4': [0, 189, 188, 112, 0, 183, 184, 86, 73, 0, 0],
    'M5': [20, 0, 12, 178, 0, 0, 0, 101, 129, 117, 156],
    'M6': [49, 79, 0, 146, 0, 163, 107, 72, 32, 21, 71],
    'M7': [0, 12, 31, 0, 44, 136, 103, 93, 56, 0, 0],
    'M8': [131, 179, 87, 19, 145, 13, 0, 151, 135, 32, 89],
    'M9': [157, 21, 152, 192, 186, 163, 60, 0, 0, 64, 57],
    'M10': [153, 62, 73, 25, 100, 162, 133, 0, 114, 146, 31],
    'Average': [68.1, 80.5, 68.2, 95.4, 75.6, 126.0, 83.1, 75.6, 86.5, 66.0,
                74.3]
})
  •  Tags:  
  • Related