ECN-4363 Unicode support

Type of Change: Enhancement

Product: ACUCOBOL-GT

Module: Runtime

New Version: 10.1.0

Machines Affected: Windows

Known Versions Affected: All

DESCRIPTION:

The extend 10.1.0 release introduces Unicode support through the UTF-8 and UTF-16 character encodings. For some aspects of extend you can configure the encoding type; for others, a specific encoding is used by default.

With this support, your programs can understand, process, and display any Unicode character handled by those encodings. In addition to this ECN, refer to the following ECNs for details on how this support affects the different areas of the extend Interoperability Suite:

  • ECN-4411
  • ECN-4416
  • ECN-GL552

The Windows runtime terminal manager now includes built-in Unicode support: for all screen I/O, the data is dynamically converted from the system code page, using UTF-16 encoding. From the point of view of your COBOL programs, little has changed. You still use the same verbs as before, and the data in your program remains unchanged, with one exception, described below.

The TRANSLATE_TO_ANSI configuration variable no longer works. If you rely on it, use the new configuration variable COBOL_CHARACTER_SET instead. It accepts numeric code page values as well as three predefined values:

OEM
  Specifies that all data in the COBOL program is encoded in the OEM (DOS) character set.
ANSI
  Specifies that all data in the COBOL program is encoded in the current ANSI code page character set.
UTF-8
  Specifies that all data in the COBOL program is encoded in Unicode (UTF-8 format).
  Note: As this is a multi-byte character set, you may need to expand some of your data variables to account for the larger size.
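
To gauge how much a data item might need to grow under UTF-8, the following sketch compares the byte length of the same text in three encodings. It is an illustration only and is not part of the runtime: it uses Python, assumes code page 437 for OEM and code page 1252 for ANSI (your system code pages may differ), and the function name encoded_sizes is hypothetical.

```python
# Illustration only: Python codecs stand in for the runtime's character sets.
# Assumption: OEM is code page 437 and ANSI is code page 1252 on this system.

def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in each encoding."""
    return {
        "OEM (cp437)": len(text.encode("cp437")),
        "ANSI (cp1252)": len(text.encode("cp1252")),
        "UTF-8": len(text.encode("utf-8")),
    }

# "Müller café": 11 characters, two of them non-ASCII.
sizes = encoded_sizes("M\u00fcller caf\u00e9")
for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

In this sample, each accented character takes one byte in the single-byte code pages and two bytes in UTF-8, so a field sized for 11 bytes would need 13 bytes to hold the same text.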

Additionally, setting this variable to a numeric value means that all data in the COBOL program is encoded in the code page with that number; for example, 437 is the OEM code page used in the United States. Setting the variable to ANSI is the same as setting it to the number of the current ANSI code page.
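
The choice of code page number matters because the same byte maps to different characters in different code pages. The following Python sketch (an illustration, not runtime code) decodes one byte under code page 437 and under code page 1252; the variable names are hypothetical.

```python
# Illustration only: one stored byte, interpreted under two code pages.
raw = b"\x82"

as_437 = raw.decode("cp437")    # U+00E9, LATIN SMALL LETTER E WITH ACUTE
as_1252 = raw.decode("cp1252")  # U+201A, SINGLE LOW-9 QUOTATION MARK

print(f"cp437:  U+{ord(as_437):04X}")
print(f"cp1252: U+{ord(as_1252):04X}")
```

If the declared code page does not match the one the data was written in, characters outside the ASCII range are displayed incorrectly.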

You can use this variable to dynamically change the specified character set if your data uses a mixture of formats; for example, data read from an XML file might be in UTF-8 format, while the rest of the data in your program might be in ANSI format.
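
The following Python sketch illustrates why the declared character set has to follow the data when formats are mixed. It treats a UTF-8 byte sequence, such as one read from an XML file, first as ANSI (assumed here to be code page 1252) and then as UTF-8. It is an illustration only; the variable names are hypothetical and no runtime API is involved.

```python
# Illustration only: UTF-8 bytes interpreted with the wrong and the right encoding.
# Assumption: the ANSI code page is 1252.

xml_bytes = "caf\u00e9".encode("utf-8")  # stands in for data read from a UTF-8 XML file

wrong = xml_bytes.decode("cp1252")  # interpreted as ANSI: the accented letter is garbled
right = xml_bytes.decode("utf-8")   # interpreted as UTF-8: the text is intact

print(len(xml_bytes), "bytes")
print("as ANSI: ", [f"U+{ord(c):04X}" for c in wrong])
print("as UTF-8:", [f"U+{ord(c):04X}" for c in right])
```

Read as ANSI, the two-byte UTF-8 sequence for the accented letter becomes two separate characters, which is the kind of corruption that switching the character set for that data avoids.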

If you have any existing UTF-16 encoded data, you must translate it to another supported encoding before it can be handled; see ECN-4367 for a solution.
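
As a rough picture of what such a translation involves, the following Python sketch converts UTF-16 data to UTF-8. It is not the solution described in ECN-4367, only a generic illustration; the function name utf16_to_utf8 is hypothetical.

```python
# Illustration only: a generic UTF-16 to UTF-8 conversion, unrelated to ECN-4367.

def utf16_to_utf8(data: bytes) -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8.

    The "utf-16" codec honours a leading byte order mark; if none is
    present, it assumes the platform's native byte order.
    """
    return data.decode("utf-16").encode("utf-8")

sample = "\u00e9A".encode("utf-16")  # byte order mark followed by two 16-bit units
converted = utf16_to_utf8(sample)
print(len(sample), "bytes in UTF-16,", len(converted), "bytes in UTF-8")
```

The byte order mark is consumed during decoding, so the UTF-8 result contains only the text itself.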