One person in my dataframe keeps showing up with \ufeff in their name when I print

I have Python code that loads a group of exam results. Each exam is saved in its own CSV file.

import glob
import pandas as pd

files = glob.glob('Exam *.csv')
frame = []
files1 = glob.glob('Exam 1*.csv')
for file in files:
    frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))
for file in files1:
    frame.append(pd.read_csv(file, index_col=[0], encoding='utf-8-sig'))

For one person in the whole dataframe, the name column shows up as

\ufeffStudents Name

It happens for every single exam. I tried using the encoding argument but that's not fixing the issue. I am out of ideas. Anyone else have anything?

CodePudding user response：

That character is the BOM or "Byte Order Mark."
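One possible reason the encoding argument does not help: the utf-8-sig codec only strips a BOM at the very start of the data, so a BOM that sits in front of a later row is left in place. The sketch below demonstrates this with made-up bytes (the sample content is an assumption, not the real exam file):

```python
# Hypothetical file contents: one BOM at the start, and a second BOM
# embedded in front of a name on a later row.
data = b"\xef\xbb\xbfName,Score\n\xef\xbb\xbfStudents Name,85\n"

# utf-8-sig removes only the leading BOM when decoding.
text = data.decode("utf-8-sig")

starts_with_bom = text.startswith("\ufeff")   # leading BOM is gone
remaining_boms = text.count("\ufeff")         # the embedded one is still there
```

If the file really does contain an embedded BOM like this, it has to be removed after loading rather than through the encoding argument.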

There are several ways to resolve it.

First, try adding the engine parameter (for example, engine='python') to pd.read_csv() when reading the CSV files.

pd.read_csv(file, index_col=[0], engine='python', encoding='utf-8-sig')

Secondly, you can simply remove it by replacing it with an empty string (''). Using the .str accessor also leaves missing values untouched instead of raising an error.

df['student_name'] = df['student_name'].str.replace("\ufeff", "", regex=False)
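Since the question reads the files with index_col=[0], the names may end up in the index rather than in a regular column, in which case replacing inside a single column would miss them. A minimal sketch that strips the BOM from the index, the column labels, and any string cells is shown below. The helper name strip_bom and the sample data are hypothetical, and it assumes a single-level index:

```python
import io

import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype

BOM = "\ufeff"


def _clean(value):
    # Strip the BOM from strings; leave numbers, NaN, etc. unchanged.
    return value.replace(BOM, "") if isinstance(value, str) else value


def strip_bom(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the BOM removed from labels and string cells."""
    out = df.copy()
    out.columns = [_clean(c) for c in out.columns]
    # Rebuild the index from cleaned labels, keeping its name.
    out.index = pd.Index([_clean(i) for i in out.index], name=out.index.name)
    for col in out.columns:
        if is_object_dtype(out[col]) or is_string_dtype(out[col]):
            out[col] = out[col].map(_clean)
    return out


# Hypothetical sample with a BOM embedded in front of one name and one grade.
sample = "Name,Score,Grade\nAlice,90,A\n\ufeffStudents Name,85,\ufeffB\n"
raw = pd.read_csv(io.StringIO(sample), index_col=[0])
clean = strip_bom(raw)
```

In the loop from the question, this could be applied to each frame right after pd.read_csv, before appending it to the list.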