filter pandas columns by list of substrings-CodePudding

I have a pandas dataframe with 100 columns.

df.columns = ['casette', 'a true', 'b true', 'blah1', 'blah2', ..... 'a card', 'b card']

is there a better way to filter columns based on a list substrings

eg. something along the lines of:

df = df[[x for x in df.columns if any(['true', 'cassette', 'card'] not in x)]]

instead of:

df = df[[x for x in df.columns if 'true' not in x and 'cassette' not in x and 'card' not in x]]

to get:

df.columns = ['blah1', 'blah2', .....]

any tips would be appreciated cheers

CodePudding user response：

You should use filter. Something like:

target_list = ['true', 'cassete', 'card']
df.filter(regex="|".join(target_list), axis=1)

CodePudding user response：

You can write it using any()

>>> [ x for x in df.columns if not any(s in x for s in ['true', 'cassette', 'card']) ]
['blah1', 'blah2']

or all()

>>> [ x for x in df.columns if all(s not in x for s in ['true', 'cassette', 'card']) ]
['blah1', 'blah2']

However - you don't save much over your original effort.

You can .filter() the columns by regex

>>> df.filter(regex='^(?:(?!cassette|true|card).) $').columns
Index(['blah1', 'blah2'], dtype='object')

CodePudding user response：

You can also use str.contains to filter out column names:

unwanted_words = ['true', 'cassette', 'card']
out = df.columns[~df.columns.str.contains('|'.join(unwanted_words))]

Output:

Index(['blah1', 'blah2'], dtype='object')