I have a dataframe
dfio
Name Age Zip
Good 22 03764
Bad 22 98765
The df is read from an xlsx file using
dfio = pd.read_excel(r'//i-data-share/jobs/Member Lists/x/Output.xlsx', dtype=str,engine='openpyxl')
The zips within this df are in zip5 format. but when i convert my df to csv using
folder_to_export_path = "//i-data-share/jobs/Member Lists"
dfio.to_csv(folder_to_export_path 'Outputest.csv')
This runs successfully, but when i review the data both in the csv and as a notepad file the zip code for row 1 shows as '3764' instead of '03764'. How do I keep the format as zip 5 and mantain the leading 0's on conversion to csv?
CodePudding user response:
When you read the Excel file, pd.read_excel detects that the Zip column contains integers, and therefore it sets the dtype of that column in the resulting dataframe to int64. Since integers in Python don't have leading zeros...those get dropped.
To avoid this, specify dtype='object', instead of dtype=str (because strings are object in Python, not str), in your pd.read_excel call to have Pandas not interpret any dtypes:
dfio = pd.read_excel(r'//cfpet-data-share/jobs/Vet Member Lists/Results/Output.xlsx', dtype='object', engine='openpyxl')
CodePudding user response:
I overcame a similar problem by declaring a dict containing data types of certain columns, then including that dict in the command to read the file.
converter = dict()
converter['Zip'] = str
dfio = pd.read_excel(r'//i-data-share/jobs/Member Lists/x/Output.xlsx', converters=converter,engine='openpyxl')
