Is it possible to write data to a file in an unknown encoding? I cannot decode email headers such as Message-ID, because if I use the ignore or replace error handler (https://docs.python.org/3/library/codecs.html#error-handlers), a non-RFC-compliant header becomes RFC-compliant and the antispam filter no longer increases the spam score.
I receive the string from Postfix over the milter protocol. I cannot save this data unchanged for the antispam filter, because writing it raises UnicodeError. Example:
cat savefile
#!/usr/bin/python3
import sys
fh = open('test', 'w')
fh.write(sys.argv[1])
echo žlutý | xargs ./savefile && cat test
žlutý
echo žlutý | iconv -f UTF-8 -t ISO8859-2 - | xargs ./savefile
Traceback (most recent call last):
File "/root/./savefile", line 5, in <module>
fh.write(sys.argv[1])
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcbe' in position 0: surrogates not allowed
The input may be in any of many unknown encodings. The same milter application works well under Python 2.
CodePudding user response:
You want to handle raw bytes, then, not strings. Open the output file in binary mode. Note this from the documentation for sys.argv:
Note: On Unix, command line arguments are passed by bytes from OS. Python decodes them with filesystem encoding and “surrogateescape” error handler. When you need original bytes, you can get it by
[os.fsencode(arg) for arg in sys.argv].
So:
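import sys
import os
# Binary mode: the bytes are written as-is, with no encoding step
with open('test', 'wb') as fh:
    # os.fsencode() reverses the surrogateescape decoding applied to sys.argv
    fh.write(os.fsencode(sys.argv[1]))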
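If the data reaches the script as bytes rather than through sys.argv, the same idea can be sketched without os.fsencode. This is a minimal sketch, not the milter library's API: the helper names roundtrip, save_binary and save_text are hypothetical, and UTF-8 is assumed as the nominal encoding. It shows that the surrogateescape handler maps undecodable bytes to lone surrogates and back without loss, so the original header bytes reach the file unchanged.

```python
def roundtrip(raw: bytes) -> bytes:
    # Hypothetical helper. Decode the way Python decodes sys.argv on Unix
    # (assuming a UTF-8 locale): bytes that are not valid UTF-8 become
    # lone surrogates such as '\udcbe' instead of raising an error.
    text = raw.decode("utf-8", errors="surrogateescape")
    # Encoding with the same handler turns the surrogates back into
    # the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


def save_binary(path: str, raw: bytes) -> None:
    # Binary mode writes the bytes untouched; no codec is involved.
    with open(path, "wb") as fh:
        fh.write(raw)


def save_text(path: str, text: str) -> None:
    # Text mode also works if the file is opened with the same error
    # handler; newline="" disables newline translation.
    with open(path, "w", encoding="utf-8",
              errors="surrogateescape", newline="") as fh:
        fh.write(text)


if __name__ == "__main__":
    raw = b"\xbelut\xfd"  # the example word encoded as ISO8859-2
    assert roundtrip(raw) == raw
```

Binary mode is the simpler choice when the bytes are only stored or passed on. The text-mode variant is useful when the value has to stay a str elsewhere in the program, for example to search it, while still being written out byte for byte.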
