Suppose my data frame df has the following column names: ['date', 'value', '20211010', '20211017', '20211024', ...]
I want to rename the column names of '20211010', '20211017', '20211024', ... (that is, all the columns starting from 20211010) to t1, t2, t3, ... continue to increase.
The expected new column names will be ['date', 'value', 't1', 't2', 't3', ...].
How to achieve this in Pandas? Thanks.
Reference:
how do i rename the columns from unnamed:0 to columns with increment number in python
CodePudding user response:
IIUC, a robust method could be to use pandas.to_datetime and pandas.factorize:
idx, _ = pd.factorize(pd.to_datetime(df.columns, format='%Y%m%d',
errors='coerce'),
sort=True)
df.columns = ('t' pd.Series(idx 1, dtype='str')).mask(idx<0, df.columns)
Example output:
Index(['date', 'value', 't1', 't2', 't4', 'other', 't3'], dtype='object')
Input columns:
Index(['date', 'value', '20211010', '20211017', '20211024', 'other',
'20211018'],
dtype='object')
robustness
to_datetime ensures that valid dates are used, and sort=True in factorize enables to keep the dates sorted.
Example on this input:
['X', '20211010', '20229999', '20211018', '20211024', 'Y', '20211001']
The output would be:
['X', 't2', '20229999', 't3', 't4', 'Y', 't1']
The invalid date is ignored and the tn are in order.
CodePudding user response:
If always the first two columns will be skipped and just rename the rest a simple way would be
for i, col in enumerate(df.columns[2:]):
df.rename(columns={col:f't{i 1}'}, inplace = True)
But this will not consider the name of any column, so if somehow you have a column in the middle that doesn't need to be renamed it will do it anyway
If you need to do it bulletproof I would go with the answer from mozway
