I have a large dataframe df:
Col1 Col2 Col3 Val1 Val2
A1 B1 c1 0.2 -0.3
A1 B1 c2 -0.3 0.3
A1 B1 c3 0.5 0.2
A2 B2 c1 -0.3 0.1
A2 B2 c2 0.7 -0.3
A3 B3 c1 -0.3 0.3
A3 B3 c2 -0.2 0.3
A3 B3 c3 0.5 0.2
A3 B3 c4 0.8 0.7
Where there is alternating pattern in each group of Col1-Col2 for sign of Val1 and Val2 i.e. a pair where Val1 is positive and Val2 is negative and vice versa. I want to achieve the following:
Col1 Col2 Col3 Val1 Val2 Pattern
A1 B1 c1 0.2 -0.3 Y
A1 B1 c2 -0.3 0.3 Y
A1 B1 c3 0.5 0.2 Y
A2 B2 c1 -0.3 0.1 Y
A2 B2 c2 0.7 -0.3 Y
A3 B3 c1 -0.3 0.3 N
A3 B3 c2 -0.2 0.3 N
A3 B3 c3 -0.5 -0.2 N
A3 B3 c4 0.8 0.7 N
A1-B1 and A2-B2 has a pair of Val1 and Val2 with opposite sign but A3-B3 has none.
I am not sure how to proceed with the above considering dataframe is large.
Edit:
Reason A1-B1 is 'Y' is cause there is (0.2, -0.3) AND (-0.3, 0.3)
A2-B2 has (-0.3, 0.1) AND (0.7, -0.3)
A3-B3 doesn't have 2 such sets. It has only likes of (-0.3, 0.3) and nothing with Val1, Val2 of kind (positive, negative).
i.e. to classify as pattern it must have (positive, negative) and (negative, positive)
CodePudding user response:
Use np.sign checking with DataFrame.eq and Groupby.transform
signs = np.sign(df[['Val1', 'Val2']])
m1 = signs.eq([1,-1]).all(axis=1)
m2 = signs.eq([-1,1]).all(axis=1)
df['Pattern'] = pd.concat([m1, m2], axis=1)\
.groupby([df['Col1'], df['Col2']])\
.transform('any').all(axis=1)\
.map({True:'Y', False: 'N'})
print(df)
Col1 Col2 Col3 Val1 Val2 Pattern
0 A1 B1 c1 0.2 -0.3 Y
1 A1 B1 c2 -0.3 0.3 Y
2 A1 B1 c3 0.5 0.2 Y
3 A2 B2 c1 -0.3 0.1 Y
4 A2 B2 c2 0.7 -0.3 Y
5 A3 B3 c1 -0.3 0.3 N
6 A3 B3 c2 -0.2 0.3 N
7 A3 B3 c3 0.5 0.2 N
8 A3 B3 c4 0.8 0.7 N
CodePudding user response:
You can groupby Col1 and Col2 column then use np.sign to check the sign of value in Series. Then subtract the sign of Val1 and Val2. If the sign of two numbers are opposite, the result would be 2 or -2
out = (df.groupby(['Col1', 'Col2'])
.apply(lambda g: 'Y'
if {2, -2}.issubset(set(np.sign(g['Val1']).sub(np.sign(g['Val2'])).unique()))
else 'N')
.to_frame('Pattern').reset_index())
print(out)
Col1 Col2 Pattern
0 A1 B1 Y
1 A2 B2 Y
2 A3 B3 N
At last, merge the result to original dataframe
df['Pattern'] = df.merge(out, on=['Col1', 'Col2'], how='left')['Pattern']
print(df)
Col1 Col2 Col3 Val1 Val2 Pattern
0 A1 B1 c1 0.2 -0.3 Y
1 A1 B1 c2 -0.3 0.3 Y
2 A1 B1 c3 0.5 0.2 Y
3 A2 B2 c1 -0.3 0.1 Y
4 A2 B2 c2 0.7 -0.3 Y
5 A3 B3 c1 -0.3 0.3 N
6 A3 B3 c2 -0.2 0.3 N
7 A3 B3 c3 0.5 0.2 N
8 A3 B3 c4 0.8 0.7 N
CodePudding user response:
You can compute boolean masks and combine them per group, then per row:
m1 = df['Val1'].lt(0) # Val1 negative
m2 = df['Val2'].lt(0) # Val2 negative
mask = (pd.concat([m1&~m2, # Val1 negative and Val2 positive
~m1&m2], # Val1 positive and Val2 negative
axis=1)
.groupby([df['Col1'], df['Col2']])
.transform('any') # is there at least one match per group?
.all(1) # were there both True for above?
)
df['Pattern'] = np.where(mask, 'Y', 'N')
output:
Col1 Col2 Col3 Val1 Val2 Pattern
0 A1 B1 c1 0.2 -0.3 Y
1 A1 B1 c2 -0.3 0.3 Y
2 A1 B1 c3 0.5 0.2 Y
3 A2 B2 c1 -0.3 0.1 Y
4 A2 B2 c2 0.7 -0.3 Y
5 A3 B3 c1 -0.3 0.3 N
6 A3 B3 c2 -0.2 0.3 N
7 A3 B3 c3 0.5 0.2 N
8 A3 B3 c4 0.8 0.7 N
intermediates:
Col1 Col2 Col3 Val1 Val2 Pattern m1&~m2 ~m1&m2 any(m1&~m2) any(~m1&m2) mask
0 A1 B1 c1 0.2 -0.3 Y False True True True True
1 A1 B1 c2 -0.3 0.3 Y True False True True True
2 A1 B1 c3 0.5 0.2 Y False False True True True
3 A2 B2 c1 -0.3 0.1 Y True False True True True
4 A2 B2 c2 0.7 -0.3 Y False True True True True
5 A3 B3 c1 -0.3 0.3 N True False True False False
6 A3 B3 c2 -0.2 0.3 N True False True False False
7 A3 B3 c3 0.5 0.2 N False False True False False
8 A3 B3 c4 0.8 0.7 N False False True False False
