Data representation

COBOL numeric data items are represented in XML as numeric strings. A leading minus sign is added for negative values. Leading zeros (those appearing to the left of the decimal point) are removed. Trailing zeros (those appearing to the right of the decimal point) are likewise removed. If the value is an integer, no decimal point is present.
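The numeric rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the product's implementation; the function name `numeric_to_xml_text` is hypothetical. Two corner cases are not covered by the text and are assumptions here: zero is emitted as `0`, and a value below one keeps a single `0` before the decimal point.

```python
from decimal import Decimal


def numeric_to_xml_text(value: Decimal) -> str:
    """Format a number following the rules described above (sketch only)."""
    sign = "-" if value < 0 else ""
    # Fixed-point notation, so no exponent ever appears in the string.
    digits = format(value.copy_abs(), "f")
    int_part, _, frac_part = digits.partition(".")
    int_part = int_part.lstrip("0")    # remove leading zeros
    frac_part = frac_part.rstrip("0")  # remove trailing zeros
    if not int_part:
        # Assumption: keep one zero rather than emit an empty integer part.
        int_part = "0"
    if not frac_part:
        # Integer value: no decimal point is present.
        return sign + int_part
    return f"{sign}{int_part}.{frac_part}"


print(numeric_to_xml_text(Decimal("00123.4500")))  # 123.45
print(numeric_to_xml_text(Decimal("-42.00")))      # -42
```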

COBOL nonnumeric data items are represented as text strings and have trailing spaces removed (or leading spaces, if the item is described with the JUSTIFIED phrase). Note, however, that in edited data items, leading and trailing spaces are preserved. In addition, any embedded XML special characters, such as the ampersand (&), less than (<), greater than (>), quote ("), and apostrophe ('), are represented by escape sequences.

Note: For more information, see Handling Spaces and Whitespace in XML.
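The nonnumeric rules can be sketched the same way. The function name `nonnumeric_to_xml_text` and its `justified` and `edited` flags are hypothetical stand-ins for the COBOL data description. The sketch assumes the escape sequences are the five predefined XML entities; the text does not say which form the product actually emits.

```python
from xml.sax.saxutils import escape


def nonnumeric_to_xml_text(value: str, justified: bool = False,
                           edited: bool = False) -> str:
    """Trim and escape a nonnumeric item as described above (sketch only)."""
    if not edited:
        # JUSTIFIED items lose leading spaces; all others lose trailing spaces.
        value = value.lstrip(" ") if justified else value.rstrip(" ")
    # escape() handles &, < and >; the extra map covers quote and apostrophe.
    return escape(value, {'"': "&quot;", "'": "&apos;"})


print(nonnumeric_to_xml_text("AT&T <HQ>   "))              # AT&amp;T &lt;HQ&gt;
print(nonnumeric_to_xml_text("   RIGHT", justified=True))  # RIGHT
```

Edited items pass through with their spaces intact, so `nonnumeric_to_xml_text("  1,234 ", edited=True)` returns the string unchanged.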

On Windows platforms, nonnumeric displayable data are normally encoded using Microsoft's OEM or ANSI data format. On output, these data are converted to the standard Unicode 8-bit transformation format, UTF-8. On input, the data are converted from UTF-8 to the OEM or ANSI data format. If the XML SET ENCODING statement is used to specify "UTF-8", then the internal data format is UTF-8. For more information, see the discussion of Windows Character Encoding.
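To make the Windows conversion concrete, the following Python sketch converts the same word from an OEM and an ANSI code page to UTF-8 and back. The specific code pages (437 for OEM, 1252 for ANSI) are assumptions chosen for illustration; the actual code pages depend on the system configuration, and the helper names are hypothetical.

```python
OEM = "cp437"    # assumption: a common OEM code page
ANSI = "cp1252"  # assumption: a common ANSI code page


def to_utf8(internal: bytes, codec: str) -> bytes:
    """Output direction: internal OEM/ANSI bytes to UTF-8."""
    return internal.decode(codec).encode("utf-8")


def from_utf8(xml_bytes: bytes, codec: str) -> bytes:
    """Input direction: UTF-8 bytes to the internal OEM/ANSI format."""
    return xml_bytes.decode("utf-8").encode(codec)


# "café" is stored with different bytes in each code page,
# but both convert to the same UTF-8 sequence.
print(to_utf8(b"caf\x82", OEM))       # b'caf\xc3\xa9'
print(to_utf8(b"caf\xe9", ANSI))      # b'caf\xc3\xa9'
print(from_utf8(b"caf\xc3\xa9", OEM)) # b'caf\x82'
```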

On UNIX platforms, nonnumeric displayable data are normally encoded using a "local" character encoding that the UNIX system uses. Typically, this is Latin-1 or Latin-9. On output, these data are converted to the standard Unicode 8-bit transformation format, UTF-8. On input, the data are converted from UTF-8 to the system's internal format. If the XML SET ENCODING statement is used to specify "UTF-8", then the internal data format is UTF-8. For more information on selecting an appropriate "local" character encoding, refer to the discussion of UNIX Character Encoding.
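The choice of "local" encoding matters because the same byte can denote different characters. The Python sketch below, with a hypothetical helper `local_to_utf8`, converts one byte string under Latin-1 and under Latin-9: byte 0xA4 is the currency sign in Latin-1 and the euro sign in Latin-9, so the UTF-8 output differs.

```python
def local_to_utf8(data: bytes, local_codec: str) -> bytes:
    """Convert bytes in the given local encoding to UTF-8 (sketch only)."""
    return data.decode(local_codec).encode("utf-8")


price = b"100 \xa4"

as_latin1 = local_to_utf8(price, "latin-1")     # 0xA4 read as U+00A4
as_latin9 = local_to_utf8(price, "iso8859_15")  # 0xA4 read as U+20AC

print(as_latin1)  # b'100 \xc2\xa4'
print(as_latin9)  # b'100 \xe2\x82\xac'
```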