I have a dataset in Excel and some columns are unnamed there: A_ColumnWithName, Unnamed1, Unnamed2, B_ColumnWithName, Unnamed3, Unnamed4
I need to set the names of the currently unnamed columns to the same as the 1st column name from the left, so my columns should look like this:
A_ColumnWithName, A_ColumnWithName, A_ColumnWithName, B_ColumnWithName, B_ColumnWithName, B_ColumnWithName
Any hints how can I do it using Python? An important thing is that there is tons of such columns that's why it's required to do in the most automatic way possible.
CodePudding user response:
You can convert the index to Series and use it to mask the names that contain "Unnamed" and ffill the previous valid name:
cols = df.columns.to_series()
df.columns = cols.mask(cols.str.contains('Unnamed')).ffill()
Note however that having duplicated columns names is not encouraged
example input:
A Unnamed1 Unnamed2 B Unnamed3 C
0 x x x x x x
output:
A A A B B C
0 x x x x x x
CodePudding user response:
The following code would work.
import pandas as pd
df = pd.DataFrame( columns= ["A_ColumnWithName", "unnamed","unnamed", "B_ColumnWithName", "unnamed", "unnamed"])
replaceWith = df.columns.values[0]
for i in range(1, len(df.columns)):
if df.columns[i] == 'unnamed':
df.columns.values[i] = replaceWith
else:
replaceWith = df.columns.values[i]
