ECN-4416 Unicode character translation library routines

Type of Change: Enhancement

Product: ACUCOBOL-GT

Module: runtime

New Version: 10.1.0

Machines Affected: All

DESCRIPTION:

Six new library routines translate data items from one encoding to another. The supported encodings are UTF-8, UTF-16, ANSI, and ISO-8859-1.

C$UTF16-UTF8 translates from UTF-16 to UTF-8
C$UTF8-UTF16 translates from UTF-8 to UTF-16
C$COBOL-UTF8 translates from the current ANSI code page defined by the COBOL_CHARACTER_SET variable to UTF-8
C$UTF8-COBOL translates from UTF-8 to the current ANSI code page
C$88591-UTF8 translates from ISO-8859-1 to UTF-8
C$UTF8-88591 translates from UTF-8 to ISO-8859-1

All of these routines are called in exactly the same way:

CALL translation-routine USING source, sourcelen
                               [,destination [,destinationlen]]

where:

translation-routine: One of the C$ routines listed above.
source: The string that you are translating. It must be either a POINTER (that you must set to a valid value) or an alphanumeric item.
sourcelen: The number of characters you want to translate. If this value is 0, the size of the source data item is used; 0 is not valid when source is a POINTER. If this value is -1, the source is assumed to be terminated by a low-value character, and the entire string up to that terminator is translated.
destination: If given, the item that receives the translated characters. It can be either a POINTER or an alphanumeric data item; if it is a POINTER, you must set it to a valid value. If not given, the return-code is the number of characters needed in the destination item to hold the entire source string.
destinationlen: The number of characters that can be held in the destination data item. If this parameter is -1, or is not specified, then the length of the destination is used.

The return value is the number of characters moved to the destination data item, or the number that would be needed (when the destination item is missing or NULL).

If the translated text is shorter than the destination, the routine pads the remainder of the destination with spaces.
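
The sourcelen value of -1 described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the program name and the data names (src-text, dest-text, src-len, chars-moved) are hypothetical, and passing -1 in a signed-int item is an assumption about how the length may be supplied.

```cobol
       identification division.
       program-id. lowvalue-demo.
       data division.
       working-storage section.
      *> All names below are illustrative, not part of the ECN.
       01  src-text     pic x(20).
       01  dest-text    pic x(40).
       01  src-len      signed-int value -1.
       01  chars-moved  signed-int.
       procedure division.
       main-logic.
      *> Build "Hello" followed by a low-value terminator.
           move "Hello" to src-text
           move low-values to src-text(6:1)
      *> sourcelen = -1: translate up to the low-value character.
      *> destinationlen is omitted, so the size of dest-text is used.
           call "C$COBOL-UTF8" using src-text, src-len, dest-text
           move return-code to chars-moved
           display "Characters moved: " chars-moved
           goback.
```

Because the destination is larger than the translated text, the rest of dest-text should be space-padded as described above.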

Examples

Using the following data definitions:

01 my-string-1 pic x(100).
01 my-string-2 pic x(100).
01 my-pointer pointer.
01 my-len signed-int.
01 alloc-len signed-int.

In the following example, since sourcelen is 0, the CALL translates all 5 characters (ABcde) into UTF-8, placing the result into my-string-1.

CALL "C$88591-UTF8" using "ABcde", 0, my-string-1.

In the following example, the CALL translates the 5 characters in my-string-1 (ABcde) into UTF-8, and places the result in my-string-2, which is then padded with UTF-8 spaces. The return-code is 5 (characters).

MOVE "ABcde" to my-string-1
CALL "C$COBOL-UTF8" using my-string-1, 5, my-string-2, 50
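
As a hedged sketch of how the paired routines might be combined, the program below translates the same text to UTF-8 and back with C$UTF8-COBOL, then compares the result with the original. The program name and the round-trip item are hypothetical additions; for plain ASCII text such as this, the two items would be expected to match, but that has not been verified here.

```cobol
       identification division.
       program-id. roundtrip-demo.
       data division.
       working-storage section.
       01  my-string-1  pic x(100).
       01  my-string-2  pic x(100).
      *> round-trip is an illustrative item, not part of the ECN.
       01  round-trip   pic x(100).
       01  my-len       signed-int.
       procedure division.
       main-logic.
           move "ABcde" to my-string-1
      *> Current ANSI code page to UTF-8.
           call "C$COBOL-UTF8" using my-string-1, 5, my-string-2, 50
           move return-code to my-len
      *> UTF-8 back to the code page, using the count just returned.
           call "C$UTF8-COBOL" using my-string-2, my-len, round-trip
           if round-trip = my-string-1
               display "Round trip preserved the text"
           else
               display "Round trip changed the text"
           end-if
           goback.
```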

In the following example, although the source (my-string-1) holds 26 characters, the CALL translates only the first 10 into UTF-16. The first CALL omits the destination, so its return-code is the number of characters the translation needs. The program then allocates a buffer of that size, translates into it, and frees the buffer when the translation is complete.

MOVE "abcdefghijklmnopqrstuvwxyz" to my-string-1.
CALL "C$UTF8-UTF16" using my-string-1, 10.

MOVE return-code to my-len.
MULTIPLY my-len by 2 GIVING alloc-len.  *> UTF-16 uses 2 bytes per character

CALL "M$ALLOC" using alloc-len, my-pointer.
CALL "C$UTF8-UTF16" using my-string-1, 10, my-pointer, my-len.
CALL "M$FREE" using my-pointer.
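
The same size-query pattern can be written as a complete program that also guards against a failed allocation. This is a sketch under assumptions: the program name is hypothetical, and the NULL test assumes that M$ALLOC leaves the pointer NULL when it cannot allocate memory.

```cobol
       identification division.
       program-id. utf16-alloc-demo.
       data division.
       working-storage section.
       01  my-string-1  pic x(100).
       01  my-pointer   pointer.
       01  my-len       signed-int.
       01  alloc-len    signed-int.
       procedure division.
       main-logic.
           move "abcdefghijklmnopqrstuvwxyz" to my-string-1
      *> Size query: no destination, so return-code is the count.
           call "C$UTF8-UTF16" using my-string-1, 10
           move return-code to my-len
      *> Same 2-bytes-per-character sizing as the example above.
           multiply my-len by 2 giving alloc-len
           call "M$ALLOC" using alloc-len, my-pointer
      *> Assumption: M$ALLOC leaves the pointer NULL on failure.
           if my-pointer = null
               display "Allocation failed"
           else
               call "C$UTF8-UTF16" using my-string-1, 10,
                                         my-pointer, my-len
               call "M$FREE" using my-pointer
           end-if
           goback.
```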