Home > Enterprise >  Outputting DataFrame to tsv, how to ignore or override 'need to escape' error
Outputting DataFrame to tsv, how to ignore or override 'need to escape' error

Time:01-18

Related to, but distinct from, this question.

I want to output my pandas dataframe to a tsv file. The first column of my data is a pattern that actually contains 3 bits of information which I'd like to separate into their own columns:

Range                 c1
chr1:2953-2965       -0.001069
chr1:35397-35409     -0.001050
chr1:37454-37466     -0.001330
chr2:37997-38009     -0.001235
chrX:44465-44477     -0.001292

So I do this:

Df = Df.reset_index()
Df["Range"] = Df["Range"].str.replace( ":", "\t" ).str.replace( "-", "\t" )
Df
    Range   c1
0   chr1\t2953\t2965    -0.001069
1   chr1\t35397\t35409  -0.001050
2   chr1\t37454\t37466  -0.001330
3   chr2\t37997\t38009  -0.001235
4   chrX\t44465\t44477  -0.001292

All I need to do now is output with no header or index, and add one more '\t' to separate the last column and I'll have my 4-column output file as desired. Unfortunately...

Df.to_csv( "~/testout.bed", 
                  header=None,
                  index=False,
                  sep="\t",
                  quoting=csv.QUOTE_NONE,
                  quotechar=""
               )
Error: need to escape, but no escapechar set

Here is where I want to ignore this error and say "No, python, actually you Don't need to escape anything. I put those tab characters in there specifically to create column separators."

I get why this error occurs. Python thinks I forgot about those tabs, and this is a safety catch, but actually I didn't forget about anything and I know what I'm doing. I know that the tab characters in my data will be indistinguishable from column-separators, and that's exactly what I want. I put them there specifically for this reason. Surely there must be some way to override this, no? Is there any way to ignore the error and force the output?

CodePudding user response:

You can simply use str.split to split the Range column directly -

df['Range'].str.split(r":|-", expand=True)
#     0      1      2
#0  chr1   2953   2965
#1  chr1  35397  35409
#2  chr1  37454  37466
#3  chr2  37997  38009
#4  chrX  44465  44477

To retain all the columns, you can simply join this split with the original

df = df.join(df['Range'].str.split(r":|-", expand=True))
  •  Tags:  
  • Related