I have a dataframe like the one below, just as an example of my dataframe (the list of tickers I have is very long). I would like to get a new dataframe where all the tickers which are accomplishing the 3 SOS signals 'Absorption Vol', 'Stopping Vol', 'Test in Rising' appear as a result.
data = {
'Ticker': ['SEDG', 'SEDG', 'SEDG', 'PEP', 'PEP', 'PEP', 'CTLT'],
'SOS': ['Absorption Vol', 'Stopping Vol', 'Test in Rising', 'Test in Rising', 'Absorption Vol', 'Shakeout', 'Test'],
'Date': ['2021-12-14', '2021-12-16', '2021-12-18', '2021-12-14', '2021-12-16', '2021-12-18', '2021-12-20']
}
df = pd.DataFrame(data)
print(df)
Output:
Ticker SOS Date
0 SEDG Absorption Vol 2021-12-14
1 SEDG Stopping Vol 2021-12-16
2 SEDG Test in Rising 2021-12-18
3 PEP Test in Rising 2021-12-14
4 PEP Absorption Vol 2021-12-16
5 PEP Shakeout 2021-12-18
6 CTLT Test 2021-12-20
The problem I have now is that I would like to filter the dataframe showing only the tickers with match in SOS column for 'Absorption Vol', 'Stopping Vol', 'Test in Rising'. I need the condition that these 3 SOS signals need to be present for each ticker in order to be shown as a result. If only one of the SOS signals is matched and not the rest, the ticker should not appear.
The result I want is something like this, but extended for the huge amount of tickers.
Ticker SOS Date
0 SEDG Absorption Vol 2021-12-14
1 SEDG Stopping Vol 2021-12-16
2 SEDG Test in Rising 2021-12-18
I am trying this code, but it's not showing the ticker when all these 3 SOS signals appear, but when one does:
SOS_values_filter = ['Test in Rising', 'Absorption Vol', 'Stopping Vol']
rslt = final_df_sos[final_df_sos['SOS'].isin(SOS_values_filter)].groupby('Ticker').filter(lambda x: len(x) >= 3)
print(rslt)
Also tried a for loop but my dataframe is not being fulfilled with all the ticker data.
SOS_values_filter = ['Test in Rising', 'Absorption Vol', 'Stopping Vol']
tickers = final_df_sos['Ticker']
df2 = pd.DataFrame()
for ticker in tickers:
df2 = final_df_sos.loc[final_df_sos[['Ticker', 'SOS']].isin([ticker, SOS_values_filter]).any(axis=1)]
df2 = df2.append({'Ticker': ticker, 'SOS': 'Long'}, ignore_index=True)
print(df2)
Any idea which is the best way to get this?
CodePudding user response:
.groupby('Ticker') first, then you can check if each group's SOS column contains the necessary values.
s = {'Test in Rising', 'Absorption Vol', 'Stopping Vol'}
df.groupby('Ticker').filter(lambda sf: s.issubset(sf['SOS']))
Output:
Ticker SOS Date
0 SEDG Absorption Vol 2021-12-14
1 SEDG Stopping Vol 2021-12-16
2 SEDG Test in Rising 2021-12-18
Here I'm using a set because the order doesn't seem to matter. And I'm using .issubset() because I assume it doesn't matter if other SOS values are present as long as the necessary three are present. You can customize this to your needs.
