Home > Back-end >  reading Excel in Python with win32com - it's incorrectly reading header as first row of data
reading Excel in Python with win32com - it's incorrectly reading header as first row of data

Time:01-07

I need to read a password-protected Excel from a Python script. I found a solution on Reddit that works to open the Excel (I have the password) and save to a dataframe.

def open_to_read(filename, password):
    excel = win32.Dispatch('Excel.Application')
    wb = excel.Workbooks.Open(filename, False, False, None, Password=password)
    excel.DisplayAlerts = False
    sheet = wb.Sheets(all_txns)
    df = pd.DataFrame(sheet.UsedRange())
    wb.SaveAs(filename, None, '', '')
    excel.DisplayAlerts = True
    excel.Quit()

    return df

df = open_to_read(filename, password)

The problem is it uses the whole sheet and the dataframe puts the column headers in the first row. The dataframe ends up with column headers as 0, 1, 2, etc. instead of the real headers.

I've tried looking for win32com documentation but can't find an answer. From the pandas documentation, you can specify "columns=" but it's not clear how I can use this dynamically.

Column labels to use for resulting frame when data does not have them, defaulting to RangeIndex(0, 1, 2, …, n). If data contains column labels, will perform column selection instead.

I would have to hardcode all the headers? There must be an automated way to read the column headers from the first row, no?

CodePudding user response:

In testing, sheet.UsedRange() returned a tuple of tuples, so as mentioned in the comments, pass the first row as columns and the rest as data:

temp = sheet.UsedRange()
df = pd.DataFrame(data=temp[1:], columns=temp[0])
  •  Tags:  
  • Related