ECN-4416 Unicode character translation library routines

Type of Change: Enhancement

Product: ACUCOBOL-GT

Module: runtime

New Version: 10.1.0

Machines Affected: All

DESCRIPTION:

Six new library routines translate data items from one encoding to another. The supported encodings are UTF-8, UTF-16, ANSI, and ISO-8859-1.

All of these routines are called in exactly the same way:

CALL translation-routine USING source, sourcelen
                               [,destination [,destinationlen]]

where:

translation-routine
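One of the six C$ translation routines, such as "C$88591-UTF8", "C$COBOL-UTF8", or "C$UTF8-UTF16" (the three used in the examples below).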
source
The string that you are translating. It must be either a POINTER (that you must set to a valid value) or an alphanumeric item.
sourcelen
The number of characters you want to translate. If this value is 0, then the size of the source data item is used (not valid when source is a POINTER). If this value is -1, the source is assumed to be terminated by a low-value character, and the entire string is translated.
destination
The data item that receives the translated characters. It can be either a POINTER or an alphanumeric data item; if it is a POINTER, you must set it to a valid value. If destination is omitted, the return-code is the number of characters needed in a destination item to hold the entire translated source string.
destinationlen
The number of characters that can be held in the destination data item. If this parameter is -1, or is not specified, then the length of the destination is used.

The return value is the number of characters moved to the destination data item, or the number that would be needed (when the destination item is missing or NULL).

If fewer characters are placed in the destination than there is room for, the routine will pad the destination with spaces.
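
As a minimal sketch of the parameter rules above, the following program passes a source that ends with a low-value character, sets sourcelen to -1, and omits destinationlen so that the size of the destination item is used. The program name and the data names (xlate-sketch, src-text, dst-text, src-len, chars-moved) are illustrative only and are not part of the library. The routine name C$88591-UTF8 is taken from the examples in this notice.

```cobol
identification division.
program-id. xlate-sketch.
data division.
working-storage section.
01 src-text    pic x(20).
01 dst-text    pic x(40).
01 src-len     signed-int value -1.
01 chars-moved signed-int.
procedure division.
main-logic.
    *> Build a source string that ends with a low-value character.
    MOVE "ABcde" to src-text.
    MOVE low-values to src-text(6:1).

    *> sourcelen is -1, so the source is read up to the low-value character.
    *> destinationlen is omitted, so the size of dst-text is used.
    CALL "C$88591-UTF8" using src-text, src-len, dst-text.
    MOVE return-code to chars-moved.

    *> Positions in dst-text beyond the translated text are space padded.
    DISPLAY "Characters moved: " chars-moved.
    DISPLAY "Result: [" dst-text "]".
    STOP RUN.
```

Holding -1 in a data item rather than passing a literal is a choice made for this sketch; the examples in this notice pass sourcelen as a literal.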

Examples

Using the following data definitions:

01 my-string-1 pic x(100).
01 my-string-2 pic x(100).
01 my-pointer pointer.
01 my-len signed-int.
01 alloc-len signed-int. 

In the following example, since sourcelen is 0, the CALL translates all 5 characters (ABcde) into UTF-8, placing the result into my-string-1.

CALL "C$88591-UTF8" using "ABcde", 0, my-string-1.

In the following example, the CALL translates the 5 characters in my-string-1 (ABcde) into UTF-8, and places the result in my-string-2, which is then padded with UTF-8 spaces. The return-code is 5 (characters).

MOVE "ABcde" to my-string-1
CALL "C$COBOL-UTF8" using my-string-1, 5, my-string-2, 50.

In the following example, although the source (my-string-1) contains 26 characters, the CALL translates only the first 10 into UTF-16. The first CALL omits the destination, so its return-code is the number of characters needed. The program then allocates a buffer of the required size, translates into it with a second CALL, and frees the buffer when the translation is complete.

MOVE "abcdefghijklmnopqrstuvwxyz" to my-string-1.
CALL "C$UTF8-UTF16" using my-string-1, 10.

MOVE return-code to my-len.
MULTIPLY my-len by 2 GIVING alloc-len.  *> UTF-16 uses 2 bytes per character

CALL "M$ALLOC" using alloc-len, my-pointer.
CALL "C$UTF8-UTF16" using my-string-1, 10, my-pointer, my-len.
CALL "M$FREE" using my-pointer.
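
The same sequence can be sketched with defensive checks around the sizing call and the allocation. This is an illustration under stated assumptions, not a required pattern. It assumes that M$ALLOC leaves the pointer set to NULL when the allocation fails. This notice does not define error return values for the translation routines, so the test for a positive return-code is only a precaution. The program name utf16-sketch is hypothetical.

```cobol
identification division.
program-id. utf16-sketch.
data division.
working-storage section.
01 my-string-1 pic x(100).
01 my-pointer  pointer.
01 my-len      signed-int.
01 alloc-len   signed-int.
procedure division.
main-logic.
    MOVE "abcdefghijklmnopqrstuvwxyz" to my-string-1.

    *> Sizing call: no destination, so return-code is the number needed.
    CALL "C$UTF8-UTF16" using my-string-1, 10.
    MOVE return-code to my-len.

    IF my-len > 0
        *> UTF-16 uses 2 bytes per character, as in the example above.
        MULTIPLY my-len by 2 GIVING alloc-len
        CALL "M$ALLOC" using alloc-len, my-pointer
        *> Assumption: the pointer is NULL if the allocation failed.
        IF my-pointer = NULL
            DISPLAY "Could not allocate " alloc-len " bytes"
        ELSE
            CALL "C$UTF8-UTF16" using my-string-1, 10,
                                      my-pointer, my-len
            *> Use the translated data here, before the buffer is freed.
            CALL "M$FREE" using my-pointer
        END-IF
    ELSE
        DISPLAY "Nothing to translate"
    END-IF.
    STOP RUN.
```

Testing the pointer before the second CALL keeps the translation routine from receiving an invalid destination, which the parameter description says the caller must avoid.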