I have a dataframe like the one below:
import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi"],
                   'tokens': [['Hi', 'how'], ['I', 'am', 'fine'], ['Ila', 'say', 'Hi']],
                   'labels': [['A', 'B'], ['C', 'B', 'A'], ['D', 'B', 'A']]})
I would like to do the following:
a) Filter the df using both the tokens AND the labels column
b) For the tokens column, keep rows containing the values Hi and Ila
c) For the labels column, keep rows containing the values A and D
So I tried this:
df[((df['tokens']==['Hi'])&(df['tokens']==['Ila']))&((df['labels']==['A'])&(df['labels']==['D']))]
However, this doesn't work. Since my columns hold lists, how do I filter them regardless of whether a list has only one item or several?
I expect my output to look like this:
text        tokens          labels
Ila say Hi  [Ila, say, Hi]  [D, B, A]
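A minimal sketch, using sample lists taken from the question's data, of why the == attempt cannot work and what test is needed instead: == compares whole lists, while Python's in operator checks whether a single value is an element of a list, whether that list holds one item or several.

```python
one_item = ['Hi']
many_items = ['Ila', 'say', 'Hi']

# `==` compares entire lists, so it cannot ask "does this list contain 'Hi'?"
print(one_item == many_items)  # False

# `in` tests element membership, whatever the list length
print('Hi' in one_item)        # True
print('Hi' in many_items)      # True
print('Ila' in one_item)       # False
```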
Answer:
You could try the following:
df.loc[
    # keep rows whose tokens contain both 'Hi' and 'Ila'
    # and whose labels contain both 'A' and 'D'
    df['tokens'].apply(lambda x: 'Hi' in x) &
    df['tokens'].apply(lambda x: 'Ila' in x) &
    df['labels'].apply(lambda x: 'A' in x) &
    df['labels'].apply(lambda x: 'D' in x)
]
Output:
         text          tokens     labels
2  Ila say Hi  [Ila, say, Hi]  [D, B, A]
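If the number of required values grows, the four chained apply calls can be collapsed into one check per column. A minimal sketch, assuming the df from the question; the helper name contains_all is hypothetical. It keeps a row only when every required value appears in that row's list, which matches the AND logic of the expected output.

```python
import pandas as pd

df = pd.DataFrame({'text': ["Hi how", "I am fine", "Ila say Hi"],
                   'tokens': [['Hi', 'how'], ['I', 'am', 'fine'], ['Ila', 'say', 'Hi']],
                   'labels': [['A', 'B'], ['C', 'B', 'A'], ['D', 'B', 'A']]})


def contains_all(series, required):
    """Hypothetical helper: True for rows whose list contains every value in `required`."""
    required = set(required)
    # set.issubset accepts any iterable, so each row's list can be passed directly
    return series.apply(lambda values: required.issubset(values))


mask = contains_all(df['tokens'], ['Hi', 'Ila']) & contains_all(df['labels'], ['A', 'D'])
result = df.loc[mask]
print(result)
```

For OR logic (keep a row if any of the values is present), required.issubset(values) could be swapped for not required.isdisjoint(values).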
You could also cast the lists to strings and use str.contains, although this matches substrings rather than whole list elements:
df.loc[
df['tokens'].astype(str).str.contains('Hi') &
df['tokens'].astype(str).str.contains('Ila') &
df['labels'].astype(str).str.contains('A') &
df['labels'].astype(str).str.contains('D')
]
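A minimal sketch of where the string-cast check and the element-membership check diverge. The token 'High' is hypothetical and not part of the question's data; it contains the substring 'Hi' without being the token 'Hi'.

```python
import pandas as pd

# Hypothetical data: 'High' contains the substring 'Hi' but is not the token 'Hi'
tokens = pd.Series([['High', 'five'], ['Ila', 'say', 'Hi']])

# substring search on the string form of each list, e.g. "['High', 'five']"
by_substring = tokens.astype(str).str.contains('Hi')
# exact element membership on the list itself
by_element = tokens.apply(lambda x: 'Hi' in x)

print(by_substring.tolist())  # [True, True]
print(by_element.tolist())    # [False, True]
```

With the question's data both approaches happen to give the same result, but the membership test is the safer choice when tokens can share substrings.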
Answer:
Could you simply use filter(lambda x: ..., df.columns) to get the columns you want and then reindex the df?
