The values of my first column are ending up in the index, and the first column name sits over the second column of data, so every header is shifted one position. Because of this, df.reset_index alone does not fix it. For instance, my dataframe looks like this:
| | CHA_NUMB | CHA_NAME | UN_CHA_ID |
|---|---|---|---|
| 1 | m_3_1 | 12345 | lcha |
| 2 | t_1_2 | 12456 | lcha |
| 3 | blah | 90244 | lcha |
| 4 | blah | 23435 | lcha |
It should look like this:
| | CHA_NUMB | CHA_NAME | UN_CHA_ID |
|---|---|---|---|
| 0 | 1 | m_3_1 | 12345 |
| 1 | 2 | t_1_2 | 12456 |
| 2 | 3 | blah | 90244 |
| 3 | 4 | blah | 23435 |
I tried resetting the index but it didn't work. Resetting the index makes the dataframe look like this:
| | index | CHA_NUMB | CHA_NAME | UN_CHA_ID |
|---|---|---|---|---|
| 0 | 0 | m_3_1 | 12345 | lcha |
| 1 | 1 | t_1_2 | 12456 | lcha |
| 2 | 2 | blah | 90244 | lcha |
| 3 | 3 | blah | 23435 | lcha |
CodePudding user response:
First use DataFrame.reset_index to move the index back into the data, then drop the last column by positional indexing with DataFrame.iloc, and finally restore the original column names with DataFrame.set_axis:
df = df.reset_index().iloc[:, :-1].set_axis(df.columns, axis=1)
print(df)
CHA_NUMB CHA_NAME UN_CHA_ID
0 1 m_3_1 12345
1 2 t_1_2 12456
2 3 blah 90244
3 4 blah 23435
Alternative:
cols = df.columns
df = df.reset_index().iloc[:, :-1]
df.columns = cols
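To try both variants without the original file, here is a minimal sketch that rebuilds the shifted frame by hand. The construction of `df` is an assumption based on the tables in the question (the real first column in the index, and the surplus `lcha` field under the last header); only the last few lines come from the answer itself.

```python
import pandas as pd

# Hypothetical reconstruction of the shifted frame from the question:
# the real first column (1..4) sits in the index, and the surplus
# trailing field ("lcha") sits under the last header.
df = pd.DataFrame(
    {
        "CHA_NUMB": ["m_3_1", "t_1_2", "blah", "blah"],
        "CHA_NAME": [12345, 12456, 90244, 23435],
        "UN_CHA_ID": ["lcha"] * 4,
    },
    index=[1, 2, 3, 4],
)

# Variant 1: move the index back into the data, drop the surplus last
# column, then put the original headers over the remaining columns.
fixed = df.reset_index().iloc[:, :-1].set_axis(df.columns, axis=1)

# Variant 2: the same steps, assigning the column names afterwards.
cols = df.columns
fixed_alt = df.reset_index().iloc[:, :-1]
fixed_alt.columns = cols

print(fixed)
```

Both variants rely on the surplus field being the last column; if it sat elsewhere, the `iloc` slice would need to change.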
EDIT: If the header row does not match the data, you can ignore it with header=None and skiprows=1 (without further arguments the columns would get default integer labels, like a RangeIndex), then select the first and third columns with usecols and set their names with the names parameter:
df = pd.read_csv(file,
                 header=None,
                 skiprows=1,
                 usecols=[0, 2],
                 names=['CHA_NUMB', 'UN_CHA_ID'])
print(df)
CHA_NUMB UN_CHA_ID
0 1 12345
1 2 12456
2 3 90244
3 4 23435
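The read_csv call above can be tried without a file on disk. This is a sketch under an assumption: the CSV text in `raw` is a guess at what the file might look like (a three-name header over four-field rows), since the question does not show the file itself.

```python
import io

import pandas as pd

# Assumed file contents: the header has three names but every data row
# has four fields, which is what shifts the headers on a plain read.
raw = """CHA_NUMB,CHA_NAME,UN_CHA_ID
1,m_3_1,12345,lcha
2,t_1_2,12456,lcha
3,blah,90244,lcha
4,blah,23435,lcha
"""

# A plain read reproduces the problem: pandas uses the first field
# of each row as the index because the header is one name short.
shifted = pd.read_csv(io.StringIO(raw))

# Skip the mismatched header line, keep only the first and third
# fields by position, and name them explicitly.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    skiprows=1,
    usecols=[0, 2],
    names=["CHA_NUMB", "UN_CHA_ID"],
)
print(df)
```

To keep CHA_NAME as well, the same pattern should work with usecols=[0, 1, 2] and three names.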
