Home > database >  Which is the best way to filter data in two columns in a dataframe
Which is the best way to filter data in two columns in a dataframe

Time:01-12

I have a dataframe like the one below, just as an example of my dataframe (the list of tickers I have is very long). I would like to get a new dataframe where all the tickers which are accomplishing the 3 SOS signals 'Absorption Vol', 'Stopping Vol', 'Test in Rising' appear as a result.

data = {
    'Ticker': ['SEDG', 'SEDG', 'SEDG', 'PEP', 'PEP', 'PEP', 'CTLT'],
    'SOS': ['Absorption Vol', 'Stopping Vol', 'Test in Rising', 'Test in Rising', 'Absorption Vol', 'Shakeout', 'Test'],
    'Date': ['2021-12-14', '2021-12-16', '2021-12-18', '2021-12-14', '2021-12-16', '2021-12-18', '2021-12-20']
}
df = pd.DataFrame(data)
print(df)

Output:

  Ticker             SOS        Date
0   SEDG  Absorption Vol  2021-12-14
1   SEDG    Stopping Vol  2021-12-16
2   SEDG  Test in Rising  2021-12-18
3    PEP  Test in Rising  2021-12-14
4    PEP  Absorption Vol  2021-12-16
5    PEP        Shakeout  2021-12-18
6   CTLT            Test  2021-12-20

The problem I have now is that I would like to filter the dataframe showing only the tickers with match in SOS column for 'Absorption Vol', 'Stopping Vol', 'Test in Rising'. I need the condition that these 3 SOS signals need to be present for each ticker in order to be shown as a result. If only one of the SOS signals is matched and not the rest, the ticker should not appear.

The result I want is something like this, but extended for the huge amount of tickers.

  Ticker             SOS        Date
0   SEDG  Absorption Vol  2021-12-14
1   SEDG    Stopping Vol  2021-12-16
2   SEDG  Test in Rising  2021-12-18

I am trying this code, but it's not showing the ticker when all these 3 SOS signals appear, but when one does:

SOS_values_filter = ['Test in Rising', 'Absorption Vol', 'Stopping Vol']
rslt = final_df_sos[final_df_sos['SOS'].isin(SOS_values_filter)].groupby('Ticker').filter(lambda x: len(x) >= 3)
print(rslt)

Also tried a for loop but my dataframe is not being fulfilled with all the ticker data.

SOS_values_filter = ['Test in Rising', 'Absorption Vol', 'Stopping Vol']
tickers = final_df_sos['Ticker']
df2 = pd.DataFrame()
for ticker in tickers:
    
    df2 = final_df_sos.loc[final_df_sos[['Ticker', 'SOS']].isin([ticker, SOS_values_filter]).any(axis=1)]
    df2 = df2.append({'Ticker': ticker, 'SOS': 'Long'}, ignore_index=True)

print(df2)

Any idea which is the best way to get this?

CodePudding user response:

.groupby('Ticker') first, then you can check if each group's SOS column contains the necessary values.

s = {'Test in Rising', 'Absorption Vol', 'Stopping Vol'}
df.groupby('Ticker').filter(lambda sf: s.issubset(sf['SOS']))

Output:

  Ticker             SOS        Date
0   SEDG  Absorption Vol  2021-12-14
1   SEDG    Stopping Vol  2021-12-16
2   SEDG  Test in Rising  2021-12-18

Here I'm using a set because the order doesn't seem to matter. And I'm using .issubset() because I assume it doesn't matter if other SOS values are present as long as the necessary three are present. You can customize this to your needs.

  •  Tags:  
  • Related