Remove columns which match some pattern in Python

I have a large CSV file with several columns, as follows:

M_15_19_yr_	M_19_25_yr_	M_25_35_yr_
20	34	12
09	21	19

I want to remove all columns whose names start with M_{age1}_{age2}_yr. I tried using:

df = df.loc[:, ~df.columns.str.startswith(('M_15_19_yr_', 'M_19_25_yr_', 'M_25_35_yr_'))]

However, I have many such columns. How do I remove all of them without explicitly writing down each column's name?

CodePudding user response：

You can use filter with a negative lookahead, which keeps only the columns whose names do not match the pattern:

df = df.filter(regex=r'^(?!M_\d+_\d+_yr)')
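
As a quick check of the filter approach, here is a minimal sketch on a small frame. The "id" and "F_15_19_yr_" columns are made up for illustration and are not from the question; the M_ columns follow the sample header shown there.

```python
import pandas as pd

# Hypothetical sample data: "id" and "F_15_19_yr_" are invented for this sketch.
df = pd.DataFrame({
    "id": [1, 2],
    "M_15_19_yr_": [20, 9],
    "M_19_25_yr_": [34, 21],
    "F_15_19_yr_": [5, 7],
})

# filter(regex=...) keeps columns where the regex finds a match.
# The negative lookahead (?!...) succeeds only when the name does NOT
# start with M_<digits>_<digits>_yr, so those columns are dropped.
kept = df.filter(regex=r'^(?!M_\d+_\d+_yr)')

kept_columns = list(kept.columns)
print(kept_columns)
```

Because the lookahead is anchored at the start of the name, a column that merely contains the pattern later in its name (for example after another prefix) would be kept.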

CodePudding user response：

You may instead use str.contains along with a regex pattern:

df = df.loc[:, ~df.columns.str.contains(r'^M_\d+_\d+_yr_?$', regex=True)]

A more general pattern which includes the new case given in your comment below would be:

df = df.loc[:, ~df.columns.str.contains(r'^\w+_(?:\w+_)*\d+_\d+_yr_?$', regex=True)]
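
To see how the strict and the more general pattern differ, here is a small sketch. The column names other than "M_15_19_yr_" are assumptions chosen for illustration, and both patterns allow an optional trailing underscore because the sample header in the question ends in one.

```python
import pandas as pd

# Hypothetical columns: only "M_15_19_yr_" comes from the question.
df = pd.DataFrame(
    [[1, 20, 3, 4]],
    columns=["name", "M_15_19_yr_", "Rural_M_15_19_yr_", "F_19_25_yr"],
)

# Strict pattern: only names of the form M_<digits>_<digits>_yr (optional trailing "_").
strict = r'^M_\d+_\d+_yr_?$'
# General pattern: any word-like prefix (one or more parts) before the age range.
general = r'^\w+_(?:\w+_)*\d+_\d+_yr_?$'

strict_mask = df.columns.str.contains(strict, regex=True)
general_mask = df.columns.str.contains(general, regex=True)

# Invert each boolean mask to keep the columns that did NOT match.
after_strict = list(df.loc[:, ~strict_mask].columns)
after_general = list(df.loc[:, ~general_mask].columns)

print(after_strict)
print(after_general)
```

The strict pattern removes only the M_ column, while the general one also removes the prefixed and differently lettered age-range columns, so it is worth checking which columns a pattern matches before dropping them from a large file.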