I'd like to prepare data from a converted PDF, but I'm struggling with escape characters. Normal lines end with \r\n, but some lines end with a lone \r left over from a former column break. How can I remove only those lone \r characters and concatenate the affected lines? I've tried all kinds of string replacements, but without success.
Current String:
21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernet\r
interface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n
Desired String:
21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernetinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n
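One common reason replacements on '\r' seem to do nothing: when a file is opened in text mode, Python's open() uses universal newlines by default and translates '\r', '\n' and '\r\n' all to '\n' before your code sees the text. Passing newline='' turns that translation off. A minimal sketch, assuming UTF-8 text; the sample record and the temporary file are stand-ins for the converted PDF output:

```python
import os
import tempfile

# Stand-in record: a stray \r from a column break, terminated by \r\n
sample = '21,x,"10MBitEthernet\rinterface.",xxxxxx,"1.105,00 ",\r\n'

# Write the sample untranslated (newline='') to a temporary file
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                 suffix=".csv", delete=False) as f:
    f.write(sample)
    path = f.name

# Default text mode: universal newlines turn both \r and \r\n into \n
with open(path, encoding="utf-8") as f:
    translated = f.read()

# newline='' disables the translation, so \r and \r\n survive intact
with open(path, encoding="utf-8", newline="") as f:
    raw = f.read()

os.remove(path)

print(repr(translated))  # no \r left, so .replace('\r', '') has nothing to find
print(repr(raw))         # identical to sample
```

With the file read this way, either of the replacements in the answers below can see the lone \r characters.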
Answer:
This removes every '\r' except the last one, which works as long as the string holds a single record whose only legitimate '\r' is the one in the trailing \r\n:
cstring = '21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernet\rinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'
cstring = cstring.replace('\r', '', cstring.count('\r') - 1)
print(cstring.encode())
Output:
b'21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernetinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'
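If one string contains several \r\n-terminated records, the count-based replace also strips the \r from every terminator except the last. A sketch that avoids this without regular expressions, assuming \r\n is always the real record terminator: split on '\r\n', drop lone \r inside each record, then rejoin.

```python
# Two records; only the first contains a stray \r from a column break
text = ('21,a,"10MBitEthernet\rinterface.",x,\r\n'
        '22,b,"plain field",y,\r\n')

# Count-based approach: also removes the \r of the first \r\n terminator
naive = text.replace('\r', '', text.count('\r') - 1)

# Split on the real terminator, clean each record, put \r\n back
records = text.split('\r\n')
cleaned = '\r\n'.join(record.replace('\r', '') for record in records)

print(naive.encode())
print(cleaned.encode())
```

Because the string ends with \r\n, split() yields a trailing empty string, so the final terminator is restored by join().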
Answer:
Another solution is to use the re module. The pattern below uses a negative lookahead: it matches every \r that is not immediately followed by \n and replaces it with an empty string, so the \r\n line endings are left alone.
import re
raw_string = '21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernet\rinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'
desired_string = '21,xxxxxxxxxxxxxxxxxxxxxxx,"xxxxxxxxxxxxx(16xxxxxxxxx),10MBitEthernetinterface.",xxxxxx,"1.105,00 ",0%,"1.000,4 ",\r\n'
final_string = re.sub(r'\r(?!\n)', '', raw_string)
print(final_string == desired_string) # True
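The same lookahead pattern also works on bytes, which sidesteps text-mode newline translation if the file is read in binary mode. A minimal sketch, assuming UTF-8 content; io.BytesIO stands in for open(..., "rb") on the converted file:

```python
import io
import re

# Stand-in for a file opened with open(path, "rb")
raw_file = io.BytesIO(b'21,a,"10MBitEthernet\rinterface.",x,\r\n'
                      b'22,b,"plain",y,\r\n')

data = raw_file.read()
# Same negative lookahead as above, written as a bytes pattern (rb'...')
fixed = re.sub(rb'\r(?!\n)', b'', data)
text = fixed.decode('utf-8')

# Split on \r\n explicitly; str.splitlines() would also split on a lone \r
lines = [line for line in text.split('\r\n') if line]
for line in lines:
    print(line)
```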
