How to keep leading 0's and Zip5 format when converting df to csv?-CodePudding

I have a dataframe

dfio
    Name Age Zip 
    Good 22  03764
    Bad  22  98765

The df is read from an xlsx file using

dfio = pd.read_excel(r'//i-data-share/jobs/Member Lists/x/Output.xlsx', dtype=str,engine='openpyxl')

The zips within this df are in zip5 format. but when i convert my df to csv using

folder_to_export_path = "//i-data-share/jobs/Member Lists"
dfio.to_csv(folder_to_export_path 'Outputest.csv')

This runs successfully, but when i review the data both in the csv and as a notepad file the zip code for row 1 shows as '3764' instead of '03764'. How do I keep the format as zip 5 and mantain the leading 0's on conversion to csv?

CodePudding user response：

When you read the Excel file, pd.read_excel detects that the Zip column contains integers, and therefore it sets the dtype of that column in the resulting dataframe to int64. Since integers in Python don't have leading zeros...those get dropped.

To avoid this, specify dtype='object', instead of dtype=str (because strings are object in Python, not str), in your pd.read_excel call to have Pandas not interpret any dtypes:

dfio = pd.read_excel(r'//cfpet-data-share/jobs/Vet Member Lists/Results/Output.xlsx', dtype='object', engine='openpyxl')

CodePudding user response：

I overcame a similar problem by declaring a dict containing data types of certain columns, then including that dict in the command to read the file.

converter = dict()
converter['Zip'] = str
dfio = pd.read_excel(r'//i-data-share/jobs/Member Lists/x/Output.xlsx', converters=converter,engine='openpyxl')