I have a 6,000-column table loaded into a pandas DataFrame. The first column is an ID; the rest are numeric variables. All the columns are currently strings, and I need to convert all but the first column to integers.
Many of the functions I've found either don't accept a list of column names or drop the first column entirely.
CodePudding user response:
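You can pass astype a dict mapping each column to its target dtype. It returns a new DataFrame, so assign the result back:
df = df.astype({col: int for col in df.columns[1:]})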
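A minimal sketch of this on a small stand-in frame. The column names ID, v1, and v2 are made up for illustration, and it assumes every value parses cleanly as an integer:

```python
import pandas as pd

# Hypothetical three-column frame standing in for the 6,000-column table.
df = pd.DataFrame({"ID": ["A", "B", "C"],
                   "v1": ["1", "2", "3"],
                   "v2": ["4", "5", "6"]})

# Build a {column: dtype} mapping for everything except the first column.
# astype returns a new DataFrame, so the result is assigned back.
df = df.astype({col: int for col in df.columns[1:]})

print(df.dtypes)
```

The ID column is not in the mapping, so it keeps its original dtype and position.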
CodePudding user response:
An easy trick when you want to perform an operation on all columns but a few is to move the columns you want to ignore into the index:
ignore = ['col1']
df = (df.set_index(ignore, append=True)
        .astype(float)
        .reset_index(ignore)
     )
This should work with any operation that leaves the index untouched, even if it doesn't let you specify which columns to act on.
Example input:
df = pd.DataFrame({'col1': list('ABC'),
                   'col2': list('123'),
                   'col3': list('456'),
                   })
Output:
>>> df.dtypes
col1     object
col2    float64
col3    float64
dtype: object
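Since the question asks for integers rather than floats, the same index trick can be sketched with int as the target dtype. This reuses the example input above; the name out is only for illustration, and it assumes every string parses as an integer:

```python
import pandas as pd

df = pd.DataFrame({"col1": list("ABC"),
                   "col2": list("123"),
                   "col3": list("456")})

ignore = ["col1"]

# Park the ignored column in the index, convert the remaining columns,
# then move the ignored column back out of the index.
out = (df.set_index(ignore, append=True)
         .astype(int)
         .reset_index(ignore))

print(out.dtypes)
```

Because append=True keeps the original row index as a separate level, the rows come back with the same index they started with.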
CodePudding user response:
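Try something like this, selecting every column except ID and assigning the converted values back:
cols = df.columns[df.columns != 'ID']
df[cols] = df[cols].astype(int)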
