How to generate XML, UTF-8 with BOM using Python Element Tree?

To generate a resource XML file for ASP.NET, a third-party tool requires a BOM (this started after migrating to a new version of the tool). At the same time, it requires an XML prolog like <?xml version='1.0' encoding='utf-8'?>.

The problem is that when using the ElementTree command...

tree.write(lang_resx_fpath, encoding='utf-8')

the resulting file does not contain BOM. When using the command...

tree.write(lang_resx_fpath, encoding='utf-8-sig')

the result does contain BOM; however, the XML prolog contains encoding='utf-8-sig'.
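The behavior can be reproduced with a small self-contained check. This is only a sketch: the one-element tree and the helper name write_and_read_head are placeholders, and xml_declaration=True is passed explicitly so the prolog is emitted for both encodings.

```python
import codecs
import os
import tempfile
import xml.etree.ElementTree as ET


def write_and_read_head(encoding, n=64):
    """Write a tiny placeholder tree and return the first n raw bytes of the file."""
    tree = ET.ElementTree(ET.Element("root"))
    fd, path = tempfile.mkstemp(suffix=".resx")
    os.close(fd)
    try:
        tree.write(path, encoding=encoding, xml_declaration=True)
        with open(path, "rb") as f:
            return f.read(n)
    finally:
        os.remove(path)


for enc in ("utf-8", "utf-8-sig"):
    head = write_and_read_head(enc)
    # Show whether the file starts with the UTF-8 BOM, then the raw bytes.
    print(enc, head.startswith(codecs.BOM_UTF8), head)
```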

How should I generate the file to contain both BOM and encoding='utf-8'?

UPDATE:

I have worked around it by reading, replacing, and writing the file again, like this...

with open(lang_resx_fpath, 'r', encoding='utf-8-sig') as f:
    content = f.read()

content = content.replace("encoding='utf-8-sig'", "encoding='utf-8'")

with open(lang_resx_fpath, 'w', encoding='utf-8-sig') as f:
    f.write(content)

Anyway, is there any cleaner solution?

Answer:

A peek into the source of ElementTree.write shows that the prolog is hardcoded there (https://github.com/python/cpython/blob/main/Lib/xml/etree/ElementTree.py or permalink https://github.com/python/cpython/blob/ee0ac328d38a86f7907598c94cb88a97635b32f8/Lib/xml/etree/ElementTree.py). Therefore, short of monkey-patching the module, using the internals of ET is probably the only option for writing the required prolog while keeping the BOM in the file:

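import xml.etree.ElementTree as ET

# `tree` and `lang_resx_fpath` are the objects from the question.
# ET._namespaces and ET._serialize_xml are private helpers and may change between Python versions.
root = tree.getroot()
qnames, namespaces = ET._namespaces(root, None)
with open(lang_resx_fpath, 'w', encoding='utf-8-sig') as f:
    # The 'utf-8-sig' codec emits the BOM; the prolog is written by hand.
    f.write("<?xml version='1.0' encoding='utf-8'?>\n")
    ET._serialize_xml(f.write, root, qnames, namespaces,
                      short_empty_elements=False)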

This is probably no more elegant than your solution (and may be even less so). Its only advantage is that it does not require writing the file twice, which is a minor benefit except for huge XML files.
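One further possibility that seems to avoid private helpers: tree.write also accepts an already open binary file object, so the BOM can be written by hand first and the tree serialized right after it with the plain 'utf-8' encoding. The sketch below assumes that approach; the function name write_xml_with_bom is made up for illustration, and it has not been checked against the third-party tool from the question.

```python
import codecs
import xml.etree.ElementTree as ET


def write_xml_with_bom(tree, path):
    """Write `tree` to `path` as UTF-8 with a leading BOM and a 'utf-8' prolog."""
    with open(path, "wb") as f:
        # Emit the three BOM bytes manually ...
        f.write(codecs.BOM_UTF8)
        # ... then let ElementTree write the prolog and the document.
        tree.write(f, encoding="utf-8", xml_declaration=True)
```

With the names from the question, the call would be write_xml_with_bom(tree, lang_resx_fpath). The file is written only once, and the prolog declares encoding='utf-8' because that is the encoding actually passed to tree.write.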