I have python code that loads a group of exam results. Each exam is saved in it's own csv file.
files = glob.glob('Exam *.csv')
frame = []
files1 = glob.glob('Exam 1*.csv')
for file in files:
frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
for file in files1:
frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
There is one person in the whole dataframe in their name column it shows up as
\ufeffStudents Name
It happens for every single exam. I tried using the encoding argument but that's not fixing the issue. I am out of ideas. Anyone else have anything?
CodePudding user response:
That character is the BOM or "Byte Order Mark."
There are serveral ways to resovle it.
First, I want to suggest to add engine parameter (for example, engine='python' in pd.read_csv() when reading csv files.
pd.read_csv(file, index_col=[0], engine='python', encoding='utf-8-sig')
Secondly, you can simply remove it by replacing with empty string ('').
df['student_name'] = df['student_name'].apply(lambda x: x.replace("\ufeff", ""))
