Opening .dat file UnicodeDecodeError: invalid start byte

Time:01-12

I am using Python to convert a .dat file (which you can find here) to CSV so that I can later load it with NumPy or the csv reader.

import csv

# read i2019.dat into a list of lists (one list of whitespace-separated fields per line)
datContent = [i.strip().split() for i in open("./i2019.dat").readlines()]

# write it as a new CSV file (Python 3's csv.writer needs text mode with newline="", not "wb")
with open("./i2019.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

But this fails with the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8d in position 68: invalid start byte

Any help would be appreciated!

Answer:

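Your .dat file appears to be encoded in Shift JIS (a Japanese encoding), not UTF-8. Byte 0x8d can never start a character in UTF-8 (0x80 to 0xBF are continuation bytes only), but in Shift JIS it is a valid lead byte of a two-byte character. Pass shift_jis as the encoding argument to open():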

with open("./i2019.dat", encoding='shift_jis') as f:
    datContent = [i.strip().split() for i in f]
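For completeness, here is a minimal sketch of the whole .dat-to-CSV conversion with the encoding fix applied. It assumes, as the answer does, that the input is Shift JIS and whitespace-delimited; the function name dat_to_csv is hypothetical, and skipping blank lines and writing the CSV as UTF-8 are choices made here, not part of the original answer.

```python
import csv


def dat_to_csv(src, dst, encoding="shift_jis"):
    """Convert a whitespace-delimited text file to CSV and return the row count.

    Hypothetical helper: assumes the input uses `encoding` (Shift JIS by
    default) and that fields are separated by runs of whitespace.
    """
    # Decode with the given encoding instead of the platform default (UTF-8 here),
    # which is what raised the UnicodeDecodeError on byte 0x8d.
    with open(src, encoding=encoding) as fin:
        # str.split() with no argument also strips surrounding whitespace;
        # blank lines are skipped instead of becoming empty CSV rows.
        rows = [line.split() for line in fin if line.strip()]

    # In Python 3 the csv module needs a text-mode file opened with newline="".
    # Writing UTF-8 keeps the Japanese text intact for later readers.
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        csv.writer(fout).writerows(rows)

    return len(rows)
```

Usage: dat_to_csv("./i2019.dat", "./i2019.csv"). When reading the result back with the csv module or NumPy, open it with encoding="utf-8" so the platform default codec is not used again.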