I have a large CSV file with several columns like the following:
| M_15_19_yr_ | M_19_25_yr_ | M_25_35_yr_ |
|---|---|---|
| 20 | 34 | 12 |
| 09 | 21 | 19 |
I want to remove all columns whose names start with M_{age1}_{age2}_yr. I tried:
df = df.loc[:, ~df.columns.str.startswith(('M_15_19_yr_','M_19_25_yr_','M_25_35_yr_'))]
However, I have many such columns. How do I remove all of them without explicitly writing out each column's name?
CodePudding user response:
You can use filter with a negative lookahead, which keeps only the columns whose names do not start with the pattern:
df = df.filter(regex=r'^(?!M_\d+_\d+_yr)')
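To see what the negative lookahead keeps and drops, here is a minimal sketch on a made-up frame. The columns `name` and `F_15_19_yr_` are hypothetical and only there to show that non-matching columns survive; the pattern assumes the ages are plain digits.

```python
import pandas as pd

# Hypothetical sample frame; "name" and "F_15_19_yr_" are invented for illustration
df = pd.DataFrame({
    "name": ["a", "b"],
    "M_15_19_yr_": [20, 9],
    "M_19_25_yr_": [34, 21],
    "F_15_19_yr_": [1, 2],
})

# Keep only columns whose names do NOT start with M_<digits>_<digits>_yr
kept = df.filter(regex=r"^(?!M_\d+_\d+_yr)")

print(list(kept.columns))  # ['name', 'F_15_19_yr_']
```

Since `filter` applies the regex to each column label with a search, the `^(?!...)` form succeeds only for labels that do not begin with the unwanted prefix.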
CodePudding user response:
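You can instead use str.contains with a regex pattern. The pattern is anchored only at the start, because the column names in your sample end with a trailing underscore:
df = df.loc[:, ~df.columns.str.contains(r'^M_\d+_\d+_yr', regex=True)]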
A more general pattern, which also covers the new case given in your comment below, would be:
df = df.loc[:, ~df.columns.str.contains(r'^\w+_(?:\w+_)*\d+_\d+_yr', regex=True)]
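As a rough comparison of the strict and the more general pattern, the sketch below builds a boolean mask for each and inverts it. The column `Tot_M_25_35_yr_` is a hypothetical stand-in for a prefixed name, since the actual case from the comment is not shown here, and both patterns are anchored only at the start on the assumption that names may carry a trailing underscore.

```python
import pandas as pd

# Hypothetical frame; "name" and "Tot_M_25_35_yr_" are invented for illustration
df = pd.DataFrame(
    [["a", 20, 34, 12], ["b", 9, 21, 19]],
    columns=["name", "M_15_19_yr_", "M_19_25_yr_", "Tot_M_25_35_yr_"],
)

strict = r"^M_\d+_\d+_yr"               # names starting with M_<age1>_<age2>_yr
general = r"^\w+_(?:\w+_)*\d+_\d+_yr"   # any word prefix(es) before <age1>_<age2>_yr

# Boolean masks over the column labels: True where the label matches
mask_strict = df.columns.str.contains(strict, regex=True)
mask_general = df.columns.str.contains(general, regex=True)

# Invert the mask to keep only the non-matching columns
kept_strict = list(df.loc[:, ~mask_strict].columns)
kept_general = list(df.loc[:, ~mask_general].columns)

print(kept_strict)   # ['name', 'Tot_M_25_35_yr_']
print(kept_general)  # ['name']
```

Note that the general pattern is broad: it drops any column that begins with a word prefix followed by two underscore-separated numbers and `_yr`, so it is worth checking the mask against the real column list before applying it.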
