XML and character encoding

For internal representation, XML documents use the Unicode character encoding standard. Unicode assigns each character a numeric code point; the 16-bit items commonly associated with Unicode are the code units of the UTF-16 encoding form. For external representation, most XML documents are encoded using one of the standard Unicode transformation formats, UTF-8 or UTF-16. XML documents created by XML Extensions are always encoded for external representation using UTF-8. UTF-8 is a variable-length encoding of Unicode in which each character occupies one to four 8-bit bytes. Characters in the range 0x20 to 0x7E (the normal displayable character set) are encoded as single bytes that are indistinguishable from standard ASCII.
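The relationship between ASCII and UTF-8 described above can be checked with a short sketch. It is written in Python only for illustration and is not part of XML Extensions; the helper name utf8_hex and the sample characters are assumptions chosen for this example.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))


# Hypothetical sample characters: two from the 0x20-0x7E range,
# two from outside it (e with acute accent, euro sign).
samples = ["A", "~", "\u00e9", "\u20ac"]

for ch in samples:
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")

# Every character in the range 0x20 to 0x7E encodes to the same
# single byte in UTF-8 as in ASCII.
printable = "".join(chr(c) for c in range(0x20, 0x7F))
same_as_ascii = printable.encode("utf-8") == printable.encode("ascii")
print("0x20-0x7E identical to ASCII:", same_as_ascii)
```

Characters in the displayable ASCII range come out as one byte each, while the accented letter and the euro sign take two and three bytes respectively. This is why data that contains only displayable ASCII characters looks the same in the local encoding and in UTF-8, and why the distinction matters once other characters appear.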

The XML SET ENCODING statement allows the developer to specify the character encoding of data within a COBOL data structure. The developer may use this statement to switch between the local character encoding and UTF-8. The XML SET ENCODING statement does not affect the character encoding of the XML document, which is always UTF-8; it affects only the character encoding of the data in the COBOL program. For more information, see Data Representation.