Python3: How to remove \r only and concatenate divided lines?

Time:01-24

I'd like to prepare data from a converted PDF, but I'm struggling with escape characters. Normal lines end with \r\n, but some lines contain a lone \r left over from a former column break. How can I remove only these lone \r characters and concatenate the divided lines? I've tried all kinds of string replacements, but without success.

Current String:

21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernet\r
interface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n

Desired String:

21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernetinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n

Answer:

This removes every '\r' except the last one, which works when the string holds a single record ending in '\r\n':

cstring = '21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernet\rinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'

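# The third argument caps the number of replacements; sparing the last '\r' keeps the trailing '\r\n'
cstring = cstring.replace('\r', '', cstring.count('\r') - 1)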

print(cstring.encode())

Output:

b'21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernetinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'
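If the string holds several records (for example, a whole file read at once), the count trick above keeps only the very last '\r' and would also strip the '\r' from every earlier '\r\n'. A minimal sketch of the same idea applied per record, assuming every record ends in '\r\n'; the function name join_broken_lines and the sample data are hypothetical:

```python
def join_broken_lines(text: str) -> str:
    # Split on the real record terminator; stray '\r' characters stay inside the pieces.
    records = text.split('\r\n')
    # Drop every remaining '\r' from each record, then restore the '\r\n' terminators.
    return '\r\n'.join(record.replace('\r', '') for record in records)


# Hypothetical two-record sample, each record split by a column-break '\r'.
sample = 'a,"first\rpart",1,\r\nb,"second\rpart",2,\r\n'
print(join_broken_lines(sample).encode())
# b'a,"firstpart",1,\r\nb,"secondpart",2,\r\n'
```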

Answer:

Another solution is to use the re module. The pattern below uses a negative lookahead: it finds every \r that is not immediately followed by \n and replaces it with ''.

import re

raw_string = '21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernet\rinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'
desired_string = '21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernetinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'

final_string = re.sub(r'\r(?!\n)', '', raw_string)

print(final_string == desired_string) # True
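When the data is read from a file, note that Python's text mode normally translates '\r' and '\r\n' into '\n' on read (universal newlines), which hides the difference the pattern relies on. Opening the file with newline='' returns line endings untranslated. A minimal sketch, assuming UTF-8 input; the function name clean_file and the file names converted.csv and cleaned.csv are hypothetical:

```python
import re
import tempfile
from pathlib import Path


def clean_file(src: Path, dst: Path) -> None:
    # newline='' disables newline translation, so '\r' and '\r\n' arrive unchanged.
    with open(src, encoding='utf-8', newline='') as f:
        text = f.read()
    # Remove every '\r' that is not immediately followed by '\n'.
    cleaned = re.sub(r'\r(?!\n)', '', text)
    # newline='' on write as well, so '\r\n' is written back as-is.
    with open(dst, 'w', encoding='utf-8', newline='') as f:
        f.write(cleaned)


# Usage with a throwaway directory and hypothetical sample content.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'converted.csv'
    dst = Path(tmp) / 'cleaned.csv'
    src.write_bytes(b'21,"10MBitEthernet\rinterface.",xxxxxx,\r\n')
    clean_file(src, dst)
    result = dst.read_bytes()

print(result)  # b'21,"10MBitEthernetinterface.",xxxxxx,\r\n'
```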