I have some input files that look something like this:
GENE CHR START STOP NSNPS NPARAM N ZSTAT P
2541473 1 1109286 1133315 2 1 15000 3.8023 7.1694e-05
512150 1 1152288 1167447 1 1 15000 3.2101 0.00066347
3588581 1 1177826 1182102 1 1 15000 3.2727 0.00053256
I am importing the file like this:
df = pd.read_csv('myfile.out', sep='\t')
But all the data gets read into a single column. I have tried setting the encoding to encoding='utf-8', encoding='utf-16-le', and encoding='utf-16-be', but this does not work. Using sep=' ' does split the data, but into too many columns. Is there a way to read this data in correctly?
CodePudding user response:
Try using \s+ (a regular expression that reads as "one or more whitespace characters") as your delimiter:
df = pd.read_csv('myfile.out', sep=r'\s+')
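To check this without touching the real file, here is a small sketch that parses an in-memory sample. The sample text is made up for illustration, and it assumes the columns in the real file are padded with runs of spaces, which would explain why sep=' ' produced too many columns.

```python
import io

import pandas as pd

# Hypothetical sample: same header as the question, with columns padded
# by runs of spaces (an assumption about how the real file is laid out).
sample = (
    "GENE     CHR  START    STOP     NSNPS  NPARAM  N      ZSTAT   P\n"
    "2541473  1    1109286  1133315  2      1       15000  3.8023  7.1694e-05\n"
    "512150   1    1152288  1167447  1      1       15000  3.2101  0.00066347\n"
)

# r'\s+' treats any run of spaces or tabs as a single delimiter.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(df.shape)
print(df.columns.tolist())
print(df.dtypes)
```

If the shape comes out as nine columns here but not on the real file, the file probably contains something other than plain spaces or tabs between fields.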
