How to build a loop for converting entries of categorical columns to numerical values in Pandas?

Time:01-20

I have a Pandas data frame with several columns, with some columns comprising categorical entries. I am 'manually' converting these entries to numerical values. For example,

df['gender'] = pd.Series(df['gender'].factorize()[0])
df['race'] = pd.Series(df['race'].factorize()[0])
df['city'] = pd.Series(df['city'].factorize()[0])
df['state'] = pd.Series(df['state'].factorize()[0])

If the number of columns is huge, repeating this line for each column by hand does not scale. Is there a way to do this with a loop over the columns, restricted to those with categorical entries?

CodePudding user response:

Select the categorical columns into a variable cols, then use DataFrame.apply on those columns:

cols = df.select_dtypes(['category']).columns
df[cols] = df[cols].apply(lambda x: x.factorize()[0])
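
For reference, here is a minimal runnable sketch of the two lines above. The sample data is hypothetical (only the column names come from the question), and it assumes the columns already have the category dtype:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "gender": pd.Categorical(["f", "m", "f"]),
    "city": pd.Categorical(["Oslo", "Rome", "Oslo"]),
    "age": [31, 45, 27],
})

# Names of the columns whose dtype is 'category'.
cols = df.select_dtypes(["category"]).columns

# factorize() returns (codes, uniques); keep only the integer codes.
df[cols] = df[cols].apply(lambda x: x.factorize()[0])

print(df)
```

Columns that are not selected (here age) are left untouched. Note that factorize encodes missing values as -1 by default.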

EDIT:

Your solution can be simplified:

for column in df.select_dtypes(['category']):
    df[column] = df[column].factorize()[0]

CodePudding user response:

I tried the following, which seems to work fine:

for column in df.select_dtypes(['category']):
    df[column] = pd.Series(df[column].factorize()[0])

where 'category' can be replaced with another dtype such as 'bool' or 'object', depending on how the columns are stored.
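
One caveat with the pd.Series wrapper, sketched below on hypothetical data: a new Series gets a default 0..n-1 index, and assigning a Series to a column aligns on index labels. If df has any other index, the result is NaN. Assigning the array returned by factorize directly avoids this, and select_dtypes accepts a list of dtypes, so several kinds of columns can be handled in one loop:

```python
import pandas as pd

# Hypothetical data with a non-default index (10, 11, 12).
df = pd.DataFrame(
    {"state": ["NY", "CA", "NY"], "member": [True, False, True], "age": [31, 45, 27]},
    index=[10, 11, 12],
)
# Make the string column's dtype explicit so the example does not
# depend on the pandas version's default string dtype.
df = df.astype({"state": object})

# Pitfall: the wrapped Series has index 0..2, which shares no labels
# with 10..12, so label alignment leaves only NaN in the column.
codes = df["state"].factorize()[0]
misaligned = df.assign(state=pd.Series(codes))
print(misaligned["state"].isna().all())

# Assigning the NumPy array directly places values by position.
for column in df.select_dtypes(["object", "bool", "category"]):
    df[column] = df[column].factorize()[0]

print(df)
```

With the default RangeIndex both variants give the same result, which is probably why the wrapped version appears to work.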
