I want to get a sub-dataframe that contains all elements in a list. Let's take the DataFrame as an example.
my_dict = {
'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
'Job_Detail': ['all sort of painting',
'kitchen utensils, all types of roofing etc.',\
'skill and practical oriented teaching',\
'all agricultural practices']
}
df = pd.DataFrame(my_dict)
Output looks thus:
Job Job_Detail
0 Painting all sort of painting
1 Capentry kitchen utensils, all types of roofing etc.
2 Teacher skill and practical oriented teaching
3 Farming all agricultural practices
my_lst = ['of', 'all']
I want to filter df with mylst to get a sub_DataFrame that looks like this:
Job Job_Detail
0 Painting all sort of painting
1 Capentry kitchen utensils, all types of roofing etc.
I've tried df[df.Job_Detail.isin(['of', 'all']) but it returns an empty DataFrame.
CodePudding user response:
I'm no pandas expert, but the best function to use here seems to be str.contains
From the docs:
Series.str.contains(pat, case=True, flags=0, na=None, regex=True)
Test if pattern or regex is contained within a string of a Series or Index.
Return boolean Series or Index based on whether a given pattern or regex is contained within a string of a Series or Index.
Edit: This masks using or, not and
import pandas as pd
my_dict = {
'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
'Job_Detail': ['all sort of painting',
'kitchen utensils, all types of roofing etc.',
'skill and practical oriented teaching',
'all agricultural practices']
}
my_lst = ['of', 'all']
df = pd.DataFrame(my_dict)
print(df)
mask = df.Job_Detail.str.contains('|'.join(my_lst), regex=True)
print(df[mask])
Here's a solution that masks uing and:
import pandas as pd
my_dict = {
'Job': ['Painting', 'Capentry', 'Teacher', 'Farming'],
'Job_Detail': ['all sort of painting',
'kitchen utensils, all types of roofing etc.',
'skill and practical oriented teaching',
'all agricultural practices']
}
my_lst = ['of', 'all']
df = pd.DataFrame(my_dict)
print(df)
print("------")
masks = [df.Job_Detail.str.contains(word) for word in my_lst]
mask = pd.concat(masks, axis=1).all(axis=1)
print(df[mask])
CodePudding user response:
@Lone Your code answered a different question, but it helped me arrive at the answer. Thank you, appreciated.
Here's the closest to what I needed:
df[(df.Job_Detail.str.contains('of')) & (df.Job_Detail.str.contains('all'))]
