Pandas read_csv fails to separate tab-delimited data

I have some input files that look something like this:

GENE       CHR      START       STOP  NSNPS  NPARAM      N        ZSTAT            P
2541473       1    1109286    1133315      2       1  15000       3.8023   7.1694e-05
512150        1    1152288    1167447      1       1  15000       3.2101   0.00066347
3588581       1    1177826    1182102      1       1  15000       3.2727   0.00053256

I am importing the file like this:

df = pd.read_csv('myfile.out', sep='\t')

But all the data gets read into a single column. I have tried specifying the encoding (encoding='utf-8', encoding='utf-16-le', encoding='utf-16-be'), but this does not help. Using sep=' ' does split the data, but into too many columns. Is there a way to read this data in correctly?
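The single-column result suggests the columns are padded with spaces rather than tabs. Below is a quick way to check that. It is a sketch that assumes the sample rows above are space-padded and pastes them into an io.StringIO buffer as a stand-in for myfile.out. The sketch looks for tab characters and shows what sep='\t' produces:

```python
import io

import pandas as pd

# Sample rows from the question, assumed to be space-padded (not tab-separated).
# This in-memory copy is a hypothetical stand-in for myfile.out.
sample = (
    "GENE       CHR      START       STOP  NSNPS  NPARAM      N        ZSTAT            P\n"
    "2541473       1    1109286    1133315      2       1  15000       3.8023   7.1694e-05\n"
    "512150        1    1152288    1167447      1       1  15000       3.2101   0.00066347\n"
    "3588581       1    1177826    1182102      1       1  15000       3.2727   0.00053256\n"
)

# repr() shows tabs as \t, so real tab separators would be visible here.
print(repr(sample.splitlines()[0]))
print("tab present:", "\t" in sample)

# With no tab characters to split on, each line ends up as a single field.
df_tab = pd.read_csv(io.StringIO(sample), sep="\t")
print(df_tab.shape)  # (3, 1)
```

On the real file, printing repr() of the first line read with open('myfile.out') gives the same information.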

Answer:

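Try using \s+ (a regular expression meaning "one or more whitespace characters") as your delimiter. Any run of spaces or tabs then counts as a single separator, so it works whether the columns are padded with spaces or tabs:

df = pd.read_csv('myfile.out', sep=r'\s+')  # r'' keeps the backslash literal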
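Here is a self-contained check of the whitespace-regex separator. It reads the four sample rows from the question out of an io.StringIO buffer, which stands in for myfile.out for illustration only. Because sep=r'\s+' treats each run of whitespace as one delimiter, the padded columns collapse to the nine expected fields:

```python
import io

import pandas as pd

# Same sample rows as in the question; hypothetical in-memory stand-in for myfile.out.
sample = (
    "GENE       CHR      START       STOP  NSNPS  NPARAM      N        ZSTAT            P\n"
    "2541473       1    1109286    1133315      2       1  15000       3.8023   7.1694e-05\n"
    "512150        1    1152288    1167447      1       1  15000       3.2101   0.00066347\n"
    "3588581       1    1177826    1182102      1       1  15000       3.2727   0.00053256\n"
)

# r"\s+" = one or more whitespace characters; the raw string keeps "\s" intact.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(df.shape)  # (3, 9)
print(list(df.columns))
print(df.dtypes)
```

pandas special-cases the '\s+' separator in its default C parser, so engine='python' is not needed. The older delim_whitespace=True keyword did the same job, but it is deprecated as of pandas 2.2.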