Chapter 7: Internationalization Support

This chapter describes the internationalization support that Object COBOL provides.

7.1 Overview

Object COBOL offers several levels of internationalization support:

16-bit and UNIX:
National Language Support (NLS) for single-byte character sets. This is available only on the 16-bit COBOL system and most UNIX COBOL systems.
Double-Byte Character Set (DBCS) support
Support for translation between ASCII and EBCDIC

7.1.1 National Language Support (NLS)

NLS provides a means of adapting your program to the character code set, collating sequence, and editing symbols associated with a particular country, without your application knowing these in advance. This facility is also useful in English-speaking countries for handling codepages, collating upper-case and lower-case letters correctly and selecting the appropriate currency symbol.

Examples of language and character sets that NLS accommodates are:

ISO646 National Variants:	French, German, Swedish, Danish, Italian, Spanish, Portuguese
ISO8859/1:	West European languages
IBM Personal Computer 8-bit Character Set:	West European languages

When you run a program compiled with the NLS directive, it behaves according to the language, country and character set specified in the language environment variable.

7.1.2 Double-Byte Character Set (DBCS) Support

DBCS support enables you to create native language applications that use double-byte character strings, for countries such as Japan, China and Korea.

Object COBOL includes some support for handling double-byte characters. A fully DBCS-enabled and localized version of Object COBOL is available in Japan.

You can create DBCS applications using existing tools and syntax, such as ACCEPT/DISPLAY with PIC X. However, Object COBOL also supports additional COBOL syntax used in DBCS environments. This syntax is described in your Language Reference - Additional Topics. To enable this support, use the DBCS, NCHAR or JAPANESE compiler directive.

The library routine CBL_GET_OS_INFO enables your application to detect the character encoding for Far Eastern countries. See the chapter Library Routines.

7.1.2.1 DBCS Transparency Support

DBCS transparency support is available in Object COBOL on many platforms.

DBCS transparency support enables users of your applications to enter, store, manipulate, display and print single-byte and double-byte character strings in their native language. In addition, you can use tools such as Animator, Editor and Screens to create and maintain application programs that support your native language.

Editing operations such as backspace, delete and insert, as well as cursor movement, are by byte and not by character. For example, a delete operation needs two keystrokes to delete both bytes of a DBCS character. After the deletion of the first byte (half a DBCS character), the remaining characters may appear corrupted, but if you press the delete key a second time, the other byte of the character is deleted with no corruption at all.
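The byte-oriented delete can be sketched in Python, used here only for illustration. The sketch assumes Shift-JIS (Python's `shift_jis` codec) as the double-byte encoding; your codepage may differ. Deleting only the lead byte of the second character leaves its trail byte behind, which Shift-JIS happens to read as the single-byte character `{`, so the display looks corrupted until the second delete removes it. `delete_byte` is a hypothetical name.

```python
# Model a Shift-JIS edit buffer in which delete works per byte, not per character.
buffer = bytearray("日本".encode("shift_jis"))  # two DBCS characters, four bytes

def delete_byte(buf: bytearray, pos: int) -> None:
    """Delete a single byte at pos, as a byte-oriented editor would."""
    del buf[pos]

# The cursor sits on the second DBCS character (byte offset 2).
delete_byte(buffer, 2)                     # first keystroke: lead byte only
after_first = buffer.decode("shift_jis")   # trail byte now shows as a stray character
delete_byte(buffer, 2)                     # second keystroke: trail byte
after_second = buffer.decode("shift_jis")  # display is clean again
```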

However, with some tools and on some terminal environments, editing operations and cursor movement are by DBCS character and not by single bytes.

DBCS transparency support is restricted in the following tools, which are described in your Object COBOL Character Tools.

Data File Editor - This component is not DBCS transparent. When editing a file you must enter DBCS characters in hexadecimal mode.
Editor - You cannot split DBCS characters across margin boundaries. Also, the wrapping of paragraphs that contain DBCS characters is undefined. See your Object COBOL Character Tools for details.
Hexedit - With the EBCDIC option, only single-byte characters are displayed. To display DBCS characters, select the All option using the F6 display toggle.

7.1.2.2 Double-Byte Line Draw and Graphics Characters

The routines CBL_GET_SCR_LINE_DRAW and CBL_GET_SCR_GRAPHICS support the line draw and graphics characters used in all the environments supported by Object COBOL. See the chapter Generic Line Drawing in your Programmer's Guide to User Interfaces.

Some demonstration programs supplied with this system contain IBM PC encoded line draw characters. Under some environments, you need to amend these characters in the COBOL source before using these programs.

7.1.2.3 Additional Syntax to Support DBCS

Object COBOL supports the following items that you can use in DBCS applications in addition to the standard PIC X support:

PIC G and PIC N double-byte fields - For full details see the chapter Double-Byte Character Set Support in your Language Reference - Additional Topics.
DBCS directives - The Compiler directives DBCS, DBCSSOSI, DBCHECK and DBSPACE control the behavior of the PIC G and PIC N functionality. For further details refer to the chapter Directives for Compiler in your Object COBOL User Guide.
Japanese support - Additional syntax is available for Japanese environments and is described in the chapter Micro Focus Extensions for Double-Byte Character Support in your Language Reference - Additional Topics.

Note: When performing a Format 4 or Format 5 ACCEPT statement into an identifier of class DBCS, you must enter double-byte data only. See the chapter Double-Byte Character Set Support in your Language Reference - Additional Topics.

7.1.3 Support for Translation between ASCII and EBCDIC

Object COBOL provides the CHARSET compiler directive and the _CODESET program for converting between ASCII and EBCDIC. It also provides some preconfigured _CODESET programs supporting ASCII and EBCDIC codepages for several Far Eastern languages and German.

7.2 Operation

To use NLS or DBCS transparency support:

Your operating system must support the required character sets.
Your terminals must support the required character sets.
You need to set up the environment variable for the required language, country and character set.
If your program uses full screen ACCEPT operations, you might need to set the range of data keys to be accepted in the adisctrl database. See the chapter Adis Configuration Utility (Adiscf) in your Programmer's Guide to Creating User Interfaces for details.

UNIX:
Object COBOL relies on UNIX to provide the appropriate national language tables for the language, country and character set that you specified. These are known as locales.

7.3 Setting Up the Environment for NLS and DBCS

Before you run a program that uses NLS or that uses DBCS support on UNIX, you must define your locale in the language environment variable. The locale is the combination of language, territory and codepage.

The locale may be already defined for you. If it is not, you define it as follows:

16-bit:
To define the locale on 16-bit systems, set the COBLANG environment variable. You can use the LANG environment variable instead of COBLANG, but this is not recommended because LANG can clash with other third-party software. Set COBLANG as follows:

set coblang=language[_territory[.codepage]]

UNIX:
To define the locale on UNIX, enter:

LANG=language[_territory[.codepage]] 
export LANG

where the parameters are:

language

The language in which to run your program. This parameter specifies the message catalog to use, so that any error messages are output in the required language.

DOS, Windows and OS/2:
On DOS, Windows and OS/2, language is a numeric value, such as:

1 = American English
2 = Canadian French
3 = Danish
4 = Dutch
5 = English
6 = Finnish
7 = French
8 = German
9 = Italian
10 = Norwegian
11 = Portuguese
12 = Spanish
13 = Swedish
20 = Japanese

UNIX:
On UNIX, language is a language name, such as those defined in ISO 639. Refer to your UNIX documentation for valid values. Some example values are:

en	= English
fr	= French
ko	= Korean
zh	= Chinese

_ (underscore) is the delimiter for language and territory, if territory is specified.

territory

This relates to the country in which you run your program. This parameter defines the currency symbol, the thousands separator and the decimal point, as well as the collating sequence.

DOS, Windows and OS/2:
On DOS, Windows and OS/2, if you omit this parameter, the country specified in your config.sys file is used. territory is a numeric value, such as:

44	= Great Britain
41	= Switzerland

UNIX:
On UNIX, if you omit this parameter, a default country is derived from language. territory is a name, such as the following from ISO 3166:

GB	= Great Britain
FR	= France
TW	= Taiwan (Traditional Chinese)
CN	= China (Simplified Chinese)

. (period) is the delimiter for territory and codepage, if codepage is specified.

codepage

The character set to use for your program.

DOS, Windows and OS/2:
On DOS, Windows and OS/2, if you omit this parameter, the codepage specified in your config.sys file is used. If you specify a codepage, it must be the same as the codepage in your config.sys file. codepage is a numeric value and can take values such as the following, provided they are supported by your operating system:

437 = Default US codepage
850 = International
852 = East European
861 = Icelandic
865 = Nordic
932 = Japan

UNIX:
On UNIX, if you omit this parameter, a default character set is derived from language and territory.

Refer to your operating system documentation for valid territory and codepage values.

Examples

DOS, Windows and OS/2:
The following example on DOS, Windows and OS/2 specifies the language as French and uses the operating system values for territory and codepage:

set coblang=7

DOS, Windows and OS/2:
The following example on DOS, Windows and OS/2 specifies the language as English, the territory as Great Britain and the codepage as 437:

set coblang=5_44.437

DOS, Windows and OS/2:
The following example on DOS, Windows and OS/2 specifies the language as German and the territory as Switzerland, and uses the operating system value for codepage:

set coblang=8_41

SCO UNIX and AIX:
The following example on SCO UNIX or AIX specifies the language as French and the territory as France, and uses the ISO 8859-1 codeset, which is derived by default from the language and territory:

LANG=fr_FR
export LANG
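The locale syntax `language[_territory[.codepage]]` can be split as in the following Python sketch. It checks only the syntax, not whether a value is valid on your system; a `None` field stands for an omitted parameter, for which the operating-system default applies. The names `Locale` and `parse_locale` are hypothetical.

```python
from typing import NamedTuple, Optional

class Locale(NamedTuple):
    language: str
    territory: Optional[str]
    codepage: Optional[str]

def parse_locale(value: str) -> Locale:
    """Split language[_territory[.codepage]] into its three parts."""
    language, _, rest = value.partition("_")
    if not rest:
        # No territory given; by the syntax, a codepage can only follow a territory.
        return Locale(language, None, None)
    territory, _, codepage = rest.partition(".")
    return Locale(language, territory, codepage or None)
```

For example, `parse_locale("5_44.437")` yields language `5`, territory `44` and codepage `437`.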

7.4 Compiling Programs with NLS

To use the NLS facility in your program, you must compile it with the NLS directive set. By default, the NLS directive is not set.

You must not use the following syntax in a program compiled with the NLS directive set:

In the Object-Computer paragraph, the phrase:
    PROGRAM COLLATING SEQUENCE IS alphabet-name
In the Special-Names paragraph, the phrases:
    alphabet-name IS
        STANDARD-1
        NATIONAL
        literal-1 THRU literal-2
    CURRENCY SIGN IS literal
    DECIMAL-POINT IS COMMA
In the Procedure Division, in MERGE or SORT statements, the phrase:
    COLLATING SEQUENCE IS alphabet-name

You can compile any program in your application with the NLS directive set. Thus some programs within a suite may use NLS facilities, others may not. See the section Mixing Programs with and without NLS later in this chapter for details.

DOS, Windows and OS/2:
To produce NLS programs as executable programs on DOS, Windows and OS/2, you must link them with the cobnlsmg.obj file as described in your Object COBOL User Guide.

7.5 Running Your NLS Program

You run programs with the NLS facility in the same way as you run programs without it. See your Object COBOL User Guide for full details.

The run-time system initializes the NLS facility only once during an application's run: when it encounters the first program that was compiled with the NLS directive. It uses the LANG (or COBLANG on 16-bit systems) environment variable to determine the language environment to set up for this program. The run-time system uses the same language environment for any subsequent programs that were compiled for NLS. See the section Mixing Programs with and without NLS later in this chapter for details.

If an error occurs during initialization, for example because the language specified in the LANG (or COBLANG on 16-bit systems) environment variable is not supported, the run-time system issues an error message and terminates the application. Full details on NLS error messages are in your Error Messages.

UNIX:
You must not use indexed files with variable length records when the NLS directive is set.

UNIX:
You cannot use indexed files created with one collating sequence in a language environment that uses a different collating sequence. If you do, and you specified file status bytes in your program, the COBOL system returns file status 9/45. If you did not specify file status bytes in your program, the run-time system issues an error message.

When you run a program compiled with the NLS directive set, the language specified in the LANG (or COBLANG on 16-bit systems) environment variable defines the behavior of:

String comparisons, including comparisons of alphanumeric or group items
Class condition tests (ALPHABETIC, ALPHABETIC-UPPER, ALPHABETIC-LOWER, NUMERIC and user-defined CLASS conditions), appropriate to national and multinational character sets
UNIX:
Key comparisons in indexed sequential files
Comparisons performed as part of a SORT or MERGE statement
Case conversion
Collating sequence operations, for example, for the correct handling of accented characters in West European languages
Editing and de-editing moves
The intrinsic functions Numval and Numval-c

Other language-dependent features, such as the symbols used to denote the decimal point and the currency character, also appear in the format of the specified language. However, in languages that define the currency sign as trailing, this has no effect and the usual COBOL rules are observed.

Certain NLS definitions have characters other than the ASCII characters 0-9 defined as numerics. Such characters cannot be ACCEPTed into numeric picture strings, nor can they be used in numeric operations. Note that in all NLS operations in the COBOL environment, a numeric item must be formed only from the ASCII digits 0-9, with or without the ASCII operational signs "+" or "-". There is no means of automatically converting the NLS representation to the ASCII equivalent.

It is possible to enter European modifying characters into numeric ACCEPT fields. These are accepted as zero.

Note that the values assigned to figurative constants, for example LOW-VALUES, are not changed by using NLS features.

Unpredictable results occur in the following circumstances when you use the NLS facility:

An item in a conditional expression contains the value LOW-VALUE.
An indexed file key contains the value LOW-VALUE.
An elementary item of a group that is part of a conditional expression is not USAGE DISPLAY.
An elementary item of an indexed file key is not USAGE DISPLAY.

You can also use the Adis Flip Case Control key when using NLS characters. However, if you attempt to convert a European character to upper case using this key, the character will be replaced by spaces if it has no upper case equivalent.

7.5.1 String Comparisons

For programs compiled with the NLS directive set, the run-time system invokes the CBL_NLS_COMPARE routine to compare strings. This routine is also called for alphanumeric comparisons if the program uses the intrinsic functions Max, Min, Ord-max or Ord-min on alphanumeric operands.

During a MOVE operation of one alphanumeric item to another which is longer, padding with spaces occurs. Similar padding is also implied before the comparison of two such items. In both cases, an ASCII space is assumed.
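The following Python sketch models the comparison rules above: the shorter operand is padded with ASCII spaces, and the result is `-1`, `0` or `+1`, like result-byte from CBL_NLS_COMPARE. The real routine uses the locale's tables; here an accent- and case-insensitive primary key built with `unicodedata` stands in for them, which is an assumption for illustration only. `nls_compare` is a hypothetical name.

```python
import unicodedata

def _primary_key(text: str) -> str:
    # Hypothetical locale rule: ignore accents and case at the primary level.
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

def nls_compare(string1: str, string2: str) -> int:
    """Return -1, 0 or +1, in the style of result-byte from CBL_NLS_COMPARE."""
    # Pad the shorter operand with ASCII spaces, as COBOL comparisons imply.
    width = max(len(string1), len(string2))
    a, b = string1.ljust(width, " "), string2.ljust(width, " ")
    ka, kb = _primary_key(a), _primary_key(b)
    if ka != kb:
        return -1 if ka < kb else 1
    # Tie-break on the raw text so that different strings never compare equal.
    if a != b:
        return -1 if a < b else 1
    return 0
```

With plain code-point comparison, `"a" > "B"` and `"été" > "etre"` are both true; the locale-style key orders both pairs the other way.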

7.5.2 Class Condition Tests

For programs compiled with the NLS directive set, the run-time system invokes locale-specific tests when it needs to carry out class condition tests. These tests determine if a string of information is in ALPHABETIC, ALPHABETIC-UPPER, ALPHABETIC-LOWER or NUMERIC format. The numeric test always tests that all characters are in the range of ASCII 0-9.
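A Python sketch of these class tests, with `str.isalpha`, `str.isupper` and `str.islower` standing in for the locale's character classes (an assumption). The NUMERIC test accepts only ASCII 0-9, so digits from other scripts fail it even though Python's `str.isdigit` accepts them. The function names are hypothetical.

```python
def is_numeric(text: str) -> bool:
    # NLS never widens NUMERIC: only the ASCII digits 0-9 qualify.
    return len(text) > 0 and all("0" <= ch <= "9" for ch in text)

def is_alphabetic(text: str) -> bool:
    # Letters of the locale (approximated by str.isalpha) or spaces.
    return all(ch == " " or ch.isalpha() for ch in text)

def is_alphabetic_upper(text: str) -> bool:
    return all(ch == " " or (ch.isalpha() and ch.isupper()) for ch in text)

def is_alphabetic_lower(text: str) -> bool:
    return all(ch == " " or (ch.isalpha() and ch.islower()) for ch in text)
```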

7.5.3 Key Comparisons in Indexed Sequential Files

UNIX:
On UNIX, the run-time system uses the CBL_NLS_COMPARE routine for key comparisons associated with NLS files.

If the logical filename of an indexed sequential file is preceded by the five characters "%NLS%", the file is treated as being keyed according to the collating sequence specified in the LANG environment variable. This applies whether "%NLS%" precedes the logical filename before or after the filename is resolved using environment variables. It applies only if the program that OPENs the file was compiled with the NLS directive set.

If the logical filename is preceded by "%NLS%" after it is resolved using environment variables, but the program that OPENs the file was compiled without the NLS directive set, the OPEN fails and returns a run-time system error.

If your file contains variable length records, the NLS collating sequence is not used.

For details on assigning logical filenames to be resolved using environment variables, see the File Naming chapter in the Programmer's Guide to File Handling.

7.5.4 SORT and MERGE Comparisons

If a program compiled with the NLS directive set performs a SORT or MERGE operation, it automatically uses NLS key comparisons.

UNIX:
On UNIX, the logical filename of the work file must be preceded by the characters "%NLS%" in the ASSIGN clause, and the program that OPENs it must be compiled with the NLS directive set. This is as described in the previous section for indexed sequential files.

The run-time system invokes the CBL_NLS_COMPARE routine for all key comparisons used in SORT or MERGE operations.

7.5.5 Case Conversion

If a program compiled with the NLS directive set performs a case conversion, either by using the intrinsic functions Upper-case and Lower-case or by calling the library routines CBL_TOUPPER and CBL_TOLOWER, the run-time system invokes a routine to fold national characters correctly.
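The difference between ASCII-only and national case folding can be shown with Python's `cp850` codec, an assumption for illustration; the run-time system uses the locale's own tables. The hypothetical `nls_upper` keeps a character unchanged when its upper-case form is not a single character in the codepage, so the field length never changes.

```python
def nls_upper(data: bytes, codepage: str) -> bytes:
    """Fold national characters to upper case within a single-byte codepage."""
    out = []
    for ch in data.decode(codepage):
        upper = ch.upper()
        try:
            encoded = upper.encode(codepage)
        except UnicodeEncodeError:
            encoded = b""  # no upper-case form in this codepage
        # Keep the original byte unless the upper-case form is exactly one byte.
        out.append(encoded if len(encoded) == 1 else ch.encode(codepage))
    return b"".join(out)
```

By contrast, `bytes.upper()` folds only ASCII letters, so `b"caf\x82".upper()` leaves the `é` (hex 82 in codepage 850) untouched.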

7.5.6 Collating Sequence Operations

When a program is compiled with the NLS directive, a collating sequence appropriate to the locale is used. If the program performs a collating sequence operation by using the intrinsic functions Char and Ord, this locale-specific collating sequence is used.
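A Python sketch of Ord and Char over a locale collating sequence. `COLLATING_SEQUENCE` is a hypothetical, abbreviated sequence in which each accented letter follows its base letter; a real locale defines the full character set. As with the COBOL intrinsic functions, ordinal positions start at 1.

```python
# Hypothetical, abbreviated locale collating sequence.
COLLATING_SEQUENCE = "aAàÀbBcCçÇdDeEéÉèÈ"
_POSITION = {ch: i + 1 for i, ch in enumerate(COLLATING_SEQUENCE)}

def nls_ord(ch: str) -> int:
    """Ordinal position of ch (1-based), in the style of FUNCTION ORD."""
    return _POSITION[ch]

def nls_char(n: int) -> str:
    """Character at ordinal position n, in the style of FUNCTION CHAR."""
    return COLLATING_SEQUENCE[n - 1]
```

Sorting with `nls_ord` as the key places `ç` between `c` and `d`, whereas code-point order puts it after `d`.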

7.5.7 Editing and De-editing Moves

If a program compiled with the NLS directive set performs an editing or de-editing move, then the decimal point and thousands separator appropriate to the national language locale are used. Also, the currency symbol used is the first character of the currency symbol of the locale territory.
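A simplified Python sketch of an editing move that uses locale separators. The separators and currency symbol are passed in as hypothetical arguments rather than read from a locale, and the output always has two decimal places with no picture padding or sign handling. Only the first character of the currency symbol is used, as described above.

```python
def edited_move(value: float, decimal_point: str, thousands_sep: str,
                currency_symbol: str) -> str:
    """Format value with locale separators and currency symbol (simplified)."""
    symbol = currency_symbol[:1]  # only the first character of the locale symbol
    text = f"{value:,.2f}"        # Python uses "," and "." here
    # Swap in the locale separators via a placeholder so the replacements do not collide.
    text = text.replace(",", "\0").replace(".", decimal_point).replace("\0", thousands_sep)
    return symbol + text
```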

7.5.8 Intrinsic Functions Numval and Numval-c

If a program compiled with the NLS directive set attempts to convert a number or a monetary value in a display item to a numeric value by using the intrinsic functions Numval or Numval-c, then the run-time system uses the decimal point and thousands separator appropriate to the locale. Also, if no second argument is supplied for Numval-c, then the currency symbol used is the currency symbol of the locale territory.
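The reverse direction, a simplified Numval-c, can be sketched as follows. It assumes the separators and currency symbol are known, removes thousands separators before normalizing the decimal point, and does not handle CR, DB or trailing signs. `numval_c` is a hypothetical name; it returns a Python `Decimal` to avoid binary rounding.

```python
from decimal import Decimal

def numval_c(text: str, decimal_point: str, thousands_sep: str,
             currency_symbol: str) -> Decimal:
    """De-edit a monetary display value using locale separators (simplified)."""
    s = text.strip()
    if currency_symbol:
        s = s.replace(currency_symbol, "", 1)  # drop the currency symbol once
    # Thousands separators first, then map the locale decimal point to ".".
    s = s.replace(thousands_sep, "").replace(decimal_point, ".").strip()
    return Decimal(s)
```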

7.5.9 User Interfaces

If you require information concerning the language environment you are using, you can access the NLS routines supported by your COBOL system. Full details on all of these routines can be found later in this chapter.

UNIX:
You can also access the UNIX operating system routines, such as nl_langinfo(), though these routines are not portable to other environments. Full details on these routines can be found in your operating system documentation.

7.6 Mixing Programs with and without NLS

If you have a suite of programs, you can compile individual COBOL programs with the NLS directive set. A program compiled without the NLS directive set can CALL both programs which were compiled without this directive set and programs which were compiled with it set. The reverse is equally true; that is, a program compiled with the NLS directive set can CALL both programs compiled with this directive, and those compiled without it.

A program compiled with the NLS directive has no particular language or locale associated with it. It uses the application locale, which is initialized along with the NLS facility when the first NLS program in the application is called.

Once the application locale is set, it is used for all subsequently called programs compiled with the NLS directive. You cannot change the language environment after this, even if you change the setting of the LANG (or COBLANG on 16-bit systems) environment variable. You must therefore ensure that all programs that are compiled with the NLS directive use the same language.
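The one-time initialization can be modelled in Python as below. `NlsRuntime` and `call_program` are hypothetical; the sketch reads `LANG` (substitute `COBLANG` on 16-bit systems) only when the first NLS program is called and ignores later changes to the variable. Treating an unset variable as an error is an assumption.

```python
import os
from typing import Optional

class NlsRuntime:
    """Hypothetical model: the first NLS program called fixes the application locale."""

    def __init__(self, env_var: str = "LANG") -> None:
        self.env_var = env_var  # "COBLANG" on 16-bit systems
        self.application_locale: Optional[str] = None

    def call_program(self, compiled_with_nls: bool) -> Optional[str]:
        """Simulate a CALL; return the application locale once it is initialized."""
        if compiled_with_nls and self.application_locale is None:
            value = os.environ.get(self.env_var)
            if not value:
                # Assumption: treat a missing setting as an initialization error.
                raise RuntimeError(f"{self.env_var} is not set")
            self.application_locale = value  # fixed for the rest of the run
        return self.application_locale
```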

The library routines for NLS can be called from any program compiled with the NLS directive. Programs in the application that were compiled without NLS can also call these routines, but only after the National Language Support module has been loaded and initialized by calling a program compiled with NLS.

You can pass parameters from a program with the NLS facility to a program without it. However, parameters that depend on the language environment in which they are created retain their format regardless of the language environment in which they are used. If you use a parameter created in a program with the NLS facility in a program without the NLS facility, the result might not be what you expect. We recommend that you pass such parameters only to programs that have the same language environment.

7.7 Writing NLS Message Files

16-bit:
On the 16-bit system, you can output messages in the user's language. You provide the messages in an ASCII text file and access the message file using the NLS library routines, such as CBL_NLS_OPEN_MSG_FILE.

The format of the message file specifies that each line in the file can contain one of the following:

`$ comment`	A line beginning with a dollar sign followed by a space (or tab) is treated as a comment
`$set n`	Specifies a set of messages, and assigns a set number n. Subsequent lines containing messages belong to this set, until the next line starting with $set. Set numbers must be in ascending order, but need not be contiguous. If you do not specify a $set, a default set is used. This default set is implementation defined and is not guaranteed to be 1 (or any other number).
`$quote c`	Specifies an optional quotation character, c, which you can use to surround the message text so that trailing spaces or null messages are visible. By default, or if the $quote directive is empty, any quotes are ignored.
`m message-text`	The message-text is stored in the message catalog with the set number specified in the last $set statement, and with message number m. Message numbers must be in ascending order within a single set, but need not be contiguous.
Blank	Blank lines are ignored.

Example message file:

$quote " 
 
$ 
$set 1 
1    "Too small a starting balance to open this type of account"  
2    "This debit exceeds overdraft limit"  
3    "This debit overdraws account"  
$ ***********************************************************
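A Python sketch of a reader for the message file format described above, applied to text like the example. It is not the Micro Focus implementation: `parse_message_file` is hypothetical, messages that appear before any `$set` are stored under the key `None` because the default set number is implementation defined, and unknown `$` directives are ignored.

```python
import re
from typing import Dict, Optional

Catalog = Dict[Optional[int], Dict[int, str]]

def parse_message_file(text: str) -> Catalog:
    """Parse an NLS message file into {set number: {message number: text}}."""
    catalog: Catalog = {}
    current_set: Optional[int] = None  # stands in for the default set
    quote: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are ignored
        if line.startswith("$"):
            directive = line[1:]
            if not directive or directive[0] in " \t":
                continue  # "$ comment"
            parts = directive.split(None, 1)
            keyword = parts[0]
            arg = parts[1].strip() if len(parts) > 1 else ""
            if keyword == "set":
                number = int(arg)
                if isinstance(current_set, int) and number <= current_set:
                    raise ValueError(f"set {number} is not in ascending order")
                current_set = number
            elif keyword == "quote":
                quote = arg[:1] or None  # empty $quote: no quotation character
            continue  # other directives are ignored in this sketch
        match = re.match(r"\s*(\d+)[ \t]*(.*)$", line)
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        number, body = int(match.group(1)), match.group(2)
        if quote and body.startswith(quote):
            end = body.rfind(quote)
            if end > 0:
                body = body[1:end]  # text between the quotes, spaces included
        messages = catalog.setdefault(current_set, {})
        if messages and number <= max(messages):
            raise ValueError(f"message {number} is not in ascending order")
        messages[number] = body
    return catalog
```

For the example file, `catalog[1][2]` is `"This debit exceeds overdraft limit"`.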

7.8 NLS Library Routines

16-bit:
In the 16-bit COBOL system, the following library routines are available to help you write programs supporting NLS:

CBL_NLS_CLOSE_MSG_FILE	Close NLS message file
CBL_NLS_COMPARE	Compare two strings
CBL_NLS_INFO	Get/set national information
CBL_NLS_OPEN_MSG_FILE	Open NLS message file
CBL_NLS_READ_MSG	Read message from message file

These routines can be called only from a program compiled with the NLS directive.

CBL_NLS_CLOSE_MSG_FILE

Closes a National Language Support (NLS) message file.

Syntax:

call "CBL_NLS_CLOSE_MSG_FILE" using   msg-file-handle 
                              returning status-code

Parameters:

`msg-file-handle`	pic x(4).
`status-code`	See Key in the Preface

On Entry:

msg-file-handle The identifying handle returned when the message file was opened.

On Exit:

status-code

Indicates whether the routine was successful:

0	Success
40	NLS module not initialized
404	Invalid msg-file-handle

If status-code contains a value other than these, it is the number of a run-time error message.

Comments:

This routine is available in the 16-bit COBOL system only.

This routine enables you to close a National Language Support (NLS) message file that was previously opened using the CBL_NLS_OPEN_MSG_FILE routine.

This routine can be used only from a program compiled with the NLS directive.

CBL_NLS_COMPARE

Compares two strings.

Syntax:

call "CBL_NLS_COMPARE" using        string1 
                                    string2 
                       by value     string1-length 
                       by value     string2-length 
                       by reference result-byte 
                       returning    status-code

Parameters:

`string1`	pic x(n).
`string2`	pic x(n).
`string1-length`	pic x(4) comp-5.
`string2-length`	pic x(4) comp-5.
`result-byte`	pic s9 comp-5.
`status-code`	See Key in the Preface

On Entry:

`string1`	The first string.
`string2`	The second string.
`string1-length`	Length of the first string.
`string2-length`	Length of the second string.

On Exit:

result-byte

Result of the comparison:

0	The two strings are the same
-1	string1 < string2
+1	string1 > string2

status-code

Indicates whether the routine was successful:

0	Success
40	NLS module not initialized

Comments:

This routine is available in the 16-bit COBOL system only.

This routine can be used only from a program compiled with the NLS directive.

CBL_NLS_INFO

Get/set national language information.

Syntax:

call "CBL_NLS_INFO" using     function-code 
                              info-category 
                              info-buffer 
                    returning status-code

Parameters:

`function-code`	pic x comp-x.
`info-category`	pic x comp-x.
`info-buffer`	pic x(n).
`status-code`	See Key in the Preface

On Entry:

function-code

1	Get national language information
2	Set national language information

info-category

Category of information to get or set:

1	Currency symbol
2	Thousands separator
3	Decimal separator

info-buffer Information to set (when function-code is two). This is null terminated. The thousands and decimal separators are each one character long. The currency symbol is up to 10 characters long.

On Exit:

info-buffer The information requested (when function-code is one)

status-code

Indicates whether the routine was successful:

0	Success
40	NLS module not initialized
405	Failure (when function-code is two)

Any other value is the number of a run-time error message.

Comments:

This routine is available in the 16-bit COBOL system only.

This routine enables you to both get and set information about the national language. When function-code is two (set NLS information), the change applies only to the program that made the call.

This routine can be used only from a program compiled with the NLS directive.

CBL_NLS_OPEN_MSG_FILE

Open a National Language Support (NLS) message file.

Syntax:

call "CBL_NLS_OPEN_MSG_FILE" using    msg-filename 
                                       msg-filename-ln 
                                       msg-file-handle 
                             returning status-code

Parameters:

`msg-filename`	pic x(n).
`msg-filename-ln`	pic x comp-x.
`msg-file-handle`	pic x(4).
`status-code`	See Key in the Preface

On Entry:

`msg-filename`	The name of the message file to be opened.
`msg-filename-ln`	The length of msg-filename. If this parameter is set to zero, the default message file is opened regardless of the contents of msg-filename.

On Exit:

msg-file-handle The identifying handle.

status-code

Indicates whether the routine was successful:

0	Success
40	NLS module not initialized

If status-code contains a value other than these, it is the number of a run-time error message.

Comments:

This routine is available in the 16-bit COBOL system only.

This routine opens an NLS message file, returning an identifying handle that you can use with the CBL_NLS_READ_MSG and CBL_NLS_CLOSE_MSG_FILE routines. You can create different message files for each language you want your program to work with, using the same call to access each message in the appropriate national language. You can use a default message file, or create your own.

This routine can be used only from a program compiled with the NLS directive.

CBL_NLS_READ_MSG

Reads a message from a National Language Support (NLS) message file.

Syntax:

call "CBL_NLS_READ_MSG" using     msg-file-handle 
                                  full-msg-number 
                                  msg-ins-structure 
                                  msg-buffer 
                        returning status-code

Parameters:

`msg-file-handle`	pic x(4).
`full-msg-number`	Group item defined as:
`msg-set-number`	pic x(2) comp-x.
`msg-number`	pic x(2) comp-x.
`msg-ins-structure`	Group item defined as:
`ins-count`	pic x(2) comp-x.
`ins-pointer`	usage pointer occurs n times.
`msg-buffer`	Group item defined as:
`msg-buff-len`	pic x(2) comp-x.
`msg-buff-text`	pic x(n).
`status-code`	See Key in the Preface

On Exit:

msg-buff-text The returned text (null-terminated).

status-code

Indicates whether the routine was successful:

0	Success
40	NLS module not initialized
401	Message set not found
402	Message not found in set
403	Message too long for message text buffer
404	Invalid message file handle

If status-code contains a value other than these, it is the number of a run-time error message.

Comments:

This routine is available in the 16-bit COBOL system only.

In each message file, messages are divided into sets; this enables you to define your own message set in the default message file if you want. This routine also enables you to insert portions of text into a message fetched from the message file in the order appropriate to the rules of the grammar for the national language.
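How the documented CBL_NLS_READ_MSG status codes relate to a message lookup can be sketched in Python. `read_msg` and the catalog layout are hypothetical; the order of the checks and the extra byte reserved for the null terminator are assumptions, and message insertion is not modelled.

```python
from typing import Dict, Tuple

# Status codes documented for CBL_NLS_READ_MSG.
SUCCESS, SET_NOT_FOUND, MSG_NOT_FOUND, MSG_TOO_LONG = 0, 401, 402, 403

def read_msg(catalog: Dict[int, Dict[int, str]], set_number: int,
             msg_number: int, buffer_len: int) -> Tuple[int, str]:
    """Return (status-code, message text) for one lookup."""
    messages = catalog.get(set_number)
    if messages is None:
        return SET_NOT_FOUND, ""
    text = messages.get(msg_number)
    if text is None:
        return MSG_NOT_FOUND, ""
    # Assumption: the returned text is null-terminated, so it needs one extra byte.
    if len(text) + 1 > buffer_len:
        return MSG_TOO_LONG, ""
    return SUCCESS, text
```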

This routine can be used only from a program compiled with the NLS directive.

7.9 Converting Between ASCII and EBCDIC

Normally you use ASCII format data on the PC, although with Object COBOL you can use EBCDIC format data. Storing data in EBCDIC format on your PC makes it easier to test programs targeted to run on the mainframe and to exchange data between the mainframe and the PC.

If you want to use EBCDIC data on the PC, make sure your programs are in ASCII format, and then compile them with the CHARSET(EBCDIC) directive. You can then use the resulting compiled programs with unconverted data files.

By default, Object COBOL supports ASCII/EBCDIC conversions of a standard US codepage based on PC codepage 437. Additional preconfigured modules are available for converting various Far Eastern languages and German. To convert EBCDIC correctly in these environments, you need to replace the default module with the preconfigured module for the relevant character set.
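Python's standard codecs can illustrate the translation. The sketch assumes that IBM EBCDIC codepage 037 (`cp037`) and PC codepage 437 (`cp437`) are a fair stand-in for the default _CODESET tables; that pairing is an assumption, not the shipped table. For German, Python's `cp273` codec would be the analogous choice. Characters with no equivalent in the target codepage raise `UnicodeEncodeError`.

```python
# Assumption: cp437 (PC ASCII) and cp037 (EBCDIC) stand in for the default tables.
ASCII_CODEPAGE = "cp437"
EBCDIC_CODEPAGE = "cp037"

def ascii_to_ebcdic(data: bytes) -> bytes:
    """Translate PC ASCII bytes to EBCDIC bytes via Unicode."""
    return data.decode(ASCII_CODEPAGE).encode(EBCDIC_CODEPAGE)

def ebcdic_to_ascii(data: bytes) -> bytes:
    """Translate EBCDIC bytes to PC ASCII bytes via Unicode."""
    return data.decode(EBCDIC_CODEPAGE).encode(ASCII_CODEPAGE)
```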

DOS, Windows and OS/2:
For details of converting between ASCII and EBCDIC when moving to and from the mainframe, on DOS, Windows and OS/2, see the Mainframe Programmer's Guide.

7.9.1 Converting ASCII/EBCDIC Data Using the CHARSET Directive

You can convert your data between the ASCII and EBCDIC character sets by using the CHARSET directive. Compile your program with the CHARSET directive as follows:

CHARSET "character-set"

Where character-set is ASCII or EBCDIC.

The CHARSET directive also sets the related directives NATIVE and SIGN. If you have CHARSET(EBCDIC) set, you also have the behavior of SIGN(EBCDIC) and NATIVE(EBCDIC).

Compiling with the EBCDIC option results in a Data Division that looks exactly the same as it would when compiling with a mainframe compiler running in an EBCDIC environment. This means that DISPLAY type values are in EBCDIC, and COMP values are binary fields, as normal. Similarly, any alphanumeric literal in the Procedure Division (for example, MOVE "HELLO" TO FRED; MOVE ALL "X" TO FRED) is in EBCDIC. The procedural code expects all DISPLAY type data, including numeric data, to be in EBCDIC.

When reading a file, if portions of the record represent characters while other portions are binary, it is necessary to have a suitable record definition in the COBOL program, just as it would be on the mainframe. The intention is that data accessible to the user (such as file records and Working-Storage) should always appear exactly as on the mainframe.

The only concessions to the ASCII environment of the PC when using the EBCDIC support are as follows:

Data to be displayed is moved to a work area and converted into ASCII before the DISPLAY happens.
Data to be accepted is converted to EBCDIC after the ACCEPT occurs.
Records written to text files (files with ORGANIZATION LINE SEQUENTIAL or with the LINE ADVANCING option in the ASSIGN clause) are moved to a work area and converted to ASCII before being written out.
Records read from text files are converted to EBCDIC.
Records written to files using the PRINTER option (of the External File Mapper) are converted to ASCII before being written out.
Other translations occur internally to ensure smooth running, such as the contents of the data-name in:
```
call data-name
```
or, using the Micro Focus extension:
```
assign to data-name
```

Both the called and the calling programs must be compiled with the same setting for the CHARSET directive.
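Each concession above follows the same pattern: the record or data item in your program stays in EBCDIC, and only a temporary copy in a work area is converted where data leaves or enters the program. The following Python sketch illustrates that pattern for writing and reading a LINE SEQUENTIAL record. It is an illustration only, not the Micro Focus run-time implementation. It assumes Python's built-in cp037 codec (IBM EBCDIC US/Canada) for the EBCDIC side and Latin-1 for the ASCII side, and the function names are hypothetical.

```python
import io
from typing import BinaryIO, Optional

EBCDIC = "cp037"   # assumption: stand-in for the mainframe code page
ASCII = "latin-1"  # assumption: stand-in for the PC code page


def write_line_sequential(stream: BinaryIO, record: bytes) -> None:
    """Convert a copy of an EBCDIC record to ASCII, then write it as a text line."""
    work_area = record.decode(EBCDIC).encode(ASCII)  # the caller's record is not modified
    stream.write(work_area + b"\n")


def read_line_sequential(stream: BinaryIO) -> Optional[bytes]:
    """Read one text line and return it converted to EBCDIC, or None at end of file."""
    line = stream.readline()
    if not line:
        return None
    if line.endswith(b"\n"):
        line = line[:-1]  # drop the record terminator
    return line.decode(ASCII).encode(EBCDIC)


if __name__ == "__main__":
    record = "HELLO".encode(EBCDIC)      # b'\xc8\xc5\xd3\xd3\xd6' in memory
    text_file = io.BytesIO()
    write_line_sequential(text_file, record)
    print(text_file.getvalue())          # b'HELLO\n' in the text file
    text_file.seek(0)
    print(read_line_sequential(text_file) == record)  # True
```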

7.9.2 Current Limitations on ASCII and EBCDIC

Only ANSI standard ACCEPT and DISPLAY statements are supported when the data is stored in EBCDIC format. The Micro Focus extensions to the ACCEPT and DISPLAY statements are not supported.
If a CALLed program expects parameters to be passed in ASCII format, you must convert the data to ASCII yourself before the call (for example, by calling the _CODESET program); parameters are not converted automatically.
16-bit:
On the 16-bit COBOL system, the EBCDIC feature does not currently support programs compiled to object format.
MF/370 only supports EBCDIC.
CICS OS/2 Option only supports ASCII.
OS/2 Database Manager only supports ASCII, unless you have the Host Compatibility Option from Micro Focus.

7.9.3 Collating Sequences for ASCII and EBCDIC

The collating sequences for ASCII and EBCDIC are different, so if you convert from EBCDIC to ASCII, comparisons might no longer give the same results. In particular, the relative order of digits, upper-case letters and lower-case letters differs. In ASCII, digits (hex 30 to 39) come first, followed by upper-case letters (hex 41 to 5A), followed by lower-case letters (hex 61 to 7A).

In EBCDIC, lower-case letters sort first (hex 81 to 89, 91 to 99, A2 to A9), followed by upper-case letters (hex C1 to C9, D1 to D9, E2 to E9), followed by digits (hex F0 to F9). EBCDIC special characters are scattered throughout the range.
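To see the effect of the two orderings, you can sort the same characters by their ASCII byte values and by their EBCDIC byte values. This Python sketch assumes Python's cp037 codec as a representative EBCDIC code page; the letter and digit positions it produces are the ones listed above, although special characters can sit at different positions in other EBCDIC code pages.

```python
def ascii_order(items):
    """Sort strings by their ASCII byte values."""
    return sorted(items, key=lambda s: s.encode("ascii"))


def ebcdic_order(items):
    """Sort strings by their EBCDIC byte values (assumption: code page cp037)."""
    return sorted(items, key=lambda s: s.encode("cp037"))


sample = ["b", "7", "B", "a", "A", "1"]
print(ascii_order(sample))   # ['1', '7', 'A', 'B', 'a', 'b']: digits, upper case, lower case
print(ebcdic_order(sample))  # ['a', 'b', 'A', 'B', '1', '7']: lower case, upper case, digits
```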

For example, in the following code, the results are different, depending on whether you are executing in an ASCII or EBCDIC environment.

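```
       identification division.
       program-id. collate-demo.
       data division.
       working-storage section.
       01 item-1    pic x value "A".
       01 item-2    pic x value "a".
       procedure division.
      * ASCII:  "A" is x"41" and "a" is x"61", so item-1 < item-2.
      * EBCDIC: "A" is x"C1" and "a" is x"81", so item-1 > item-2.
           if item-1 < item-2
               display "ASCII"
           else
               display "EBCDIC"
           end-if
           stop run.
```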

When a program collating sequence is in effect, alphanumeric comparisons (including group comparisons) that contain numeric data are likely to produce meaningless results.

For example, suppose field-a is PIC X and contains x"4F" (the ASCII character value for the letter "O"), and field-b is PIC X and contains x"5F" (the underscore character). Without an EBCDIC collating sequence, field-a is less than field-b.

With an EBCDIC collating sequence, however, x"4F" is mapped onto the EBCDIC "O", which is x"D6", and the underscore character, x"5F", is mapped onto the EBCDIC character x"6D". With NATIVE(EBCDIC) set, field-a is therefore greater than field-b.

Some of the areas affected by using different collating sequences are IF statements, SORT statements, writing data to indexed files, and any areas where you assume data to contain the EBCDIC hex values (for example, assuming space = X"40").
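The field-a and field-b example can be reproduced by mapping each byte through an ASCII-to-EBCDIC table before comparing, which is in effect what an EBCDIC collating sequence does. The Python sketch below uses Python's cp037 codec as that table (an assumption; your mainframe code page may place special characters differently) and also shows that a space is x"40" only in EBCDIC.

```python
def compare(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1, comparing raw byte values (native ASCII order)."""
    return (a > b) - (a < b)


def to_ebcdic(data: bytes) -> bytes:
    """Map ASCII bytes to their EBCDIC equivalents (assumption: cp037 mapping)."""
    return data.decode("latin-1").encode("cp037")


def compare_ebcdic_sequence(a: bytes, b: bytes) -> int:
    """Compare as if under an EBCDIC collating sequence."""
    return compare(to_ebcdic(a), to_ebcdic(b))


field_a = b"\x4f"  # ASCII "O"
field_b = b"\x5f"  # ASCII "_"
print(compare(field_a, field_b))                  # -1: field-a < field-b in ASCII
print(compare_ebcdic_sequence(field_a, field_b))  # 1: x"D6" > x"6D" in EBCDIC
print(to_ebcdic(b" "), b" ")                      # b'@' b' ': space is x"40" only in EBCDIC
```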

7.9.4 Replacing the Default ASCII/EBCDIC Converter

DOS, Windows and OS/2:
On DOS, Windows and OS/2, preconfigured converters are available for converting ASCII and EBCDIC data for Japanese, Traditional Chinese, Simplified Chinese, Korean and German. To convert ASCII and EBCDIC data correctly in these environments, you need to replace the default converter, the _CODESET module, with the relevant preconfigured converter.

DOS, Windows and OS/2:
For example, on DOS, Windows and OS/2, to replace the default _CODESET module with the supplied preconfigured _cs937 _CODESET module for Traditional Chinese support:

Back up your original _CODESET modules as follows:

```
copy utils.lbr utils437.lbr
copy _codeset.obj _cs437.obj
copy codeset.dll _cs437.dll      (16-bit systems)
copy cdeset32.dll _cs437.dll     (32-bit Windows and OS/2)
```

Copy the _cs937 modules as follows:

```
copy _cs937.gnt _codeset.gnt
copy _cs937.obj _codeset.obj
copy _cs937.dll codeset.dll      (16-bit systems)
copy _cs937.dll cdeset32.dll     (32-bit Windows and OS/2)
```

Update the contents of utils.lbr with the replacement module:
```
run library _codeset.gnt utils.lbr = utils.lbr
```

7.9.4.1 Preconfigured ASCII/EBCDIC Converters

DOS, Windows and OS/2:
This table shows the character sets that are supported with preconfigured versions of the ASCII/EBCDIC converter, on DOS, Windows and OS/2.

The digits used in the names of the support modules represent the value of the Coded Character Set Identifier (CCSID) assigned by IBM's Character Data Representation Architecture (CDRA). For example, the CCSID of the Japanese (Katakana) character set is 9122, and that of the Korean character set is 933.

Character Set	PC Encoding	SBCS CS/CP	DBCS CS/CP	Modules
Traditional Chinese	BIG-5 or IBM 5550	01175/00037	00935/00835	_cs937.gnt, .obj and .dll
Simplified Chinese	GB	01174/00836	00937/00837	_cs935.gnt, .obj and .dll
Korean	KS-code	01173/00833	00934/00834	_cs933.gnt, .obj and .dll
German	ASCII	00697/00273	-	_csger.gnt, .obj and .dll
Japanese (Katakana)	Shift-JIS	00332/00290	01001/00300	_cs9122.gnt, .obj and .dll
Japanese (Katakana) Extended	Shift-JIS	01172/00290	01001/00300	_cs930.gnt, .obj and .dll
Japanese (Latin) Extended	Shift-JIS	01172/01027	01001/00300	_cs939.gnt, .obj and .dll

7.9.4.2 Files Supplied for the _CODESET Program

Mapping files are supplied for each of the preconfigured ASCII/EBCDIC converters. These mapping files were used to create the preconfigured converters, and they show the internal mapping between ASCII and EBCDIC for each of the character sets.

For example, the _cs939.gnt, .obj and .dll modules were created using the files map939.a2e and map939.e2a. These files show the internal mapping between ASCII and EBCDIC for the _cs939 character set.

The DBCS code page support module, mftrnsdt.gnt, is also supplied, together with an additional Shift-JIS data file, dbcs.e2j.

7.9.5 Converting ASCII/EBCDIC Data Using the _CODESET Program

For normal application needs, you use the CHARSET directive to convert ASCII and EBCDIC. However, under some circumstances, you might want to call the _CODESET program yourself to convert them.

7.9.5.1 Invoking the _CODESET Program

To invoke the _CODESET program, use the following CALL statement:

Syntax:

```
call "_CODESET" using function-code, text-length, text-string
```

Parameters:

`function-code`	pic 9(2) comp-x.
`text-length`	pic 9(9) comp-x.
`text-string`	pic x(n).

On Entry:

function-code

Indicates the conversion to do:

0	convert EBCDIC to ASCII
1	convert ASCII to EBCDIC
2	use the full 256-byte EBCDIC to ASCII conversion table, and ignore text-length
3	use the full 256-byte ASCII to EBCDIC conversion table, and ignore text-length
4	convert pure DBCS EBCDIC to pure DBCS PC encoding
5	convert pure DBCS PC encoding to pure DBCS EBCDIC

text-length Length of text-string.

text-string Text string to convert.

On Exit:

text-string Converted text.

Comments:

Note that the _CODESET program cannot be called from a program that is compiled with the reserved word directive for one of the mainframe COBOL compilers (for example, OSVS for OS/VS COBOL and VSC2 for VS COBOL II), because its parameters are COMP-X data items, which are not supported on the mainframe.

UNIX:
On UNIX, use the _CODESET program to convert single-byte data only, with a function-code of 0 or 1.

DOS, Windows and OS/2:
On DOS, Windows and OS/2, specifying a function-code of 0 or 1 converts DBCS characters in text-string only if the DBCS characters are enclosed within Shift-Out and Shift-In characters. Shift-Out has the value 14 (decimal) and Shift-In has the value 15 (decimal).

DOS, Windows and OS/2:
On DOS, Windows and OS/2, specifying a function-code of 4 or 5 converts strings containing only DBCS characters without the need to include Shift-Out and Shift-In characters in text-string.

DOS, Windows and OS/2:
The DBCS PC encoding schemes supported are:

Shift-JIS in a Japanese environment
either BIG-5 or IBM 5550 in a Traditional Chinese environment
GB in a Simplified Chinese environment
KS-Code in a Korean environment
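The following Python sketch shows one way to picture function codes 0 and 1 on DOS, Windows and OS/2: single-byte text is converted through a 256-byte table, and each run of bytes enclosed by Shift-Out and Shift-In is handed to a separate DBCS converter. It is not the _CODESET implementation. It assumes Python's cp037 codec as the single-byte table, keeps the Shift-Out and Shift-In bytes in place so that the text length does not change (an assumption), and takes the DBCS converter as a hypothetical callback, because the real DBCS tables live in the preconfigured modules. Python's `len(text)` plays the role of text-length.

```python
from typing import Callable, Optional

SO, SI = 0x0E, 0x0F  # Shift-Out (decimal 14) and Shift-In (decimal 15)

DbcsConverter = Callable[[int, bytes], bytes]  # hypothetical: (function_code, run) -> run


def convert_sbcs(function_code: int, chunk: bytes) -> bytes:
    """Single-byte conversion (assumption: cp037 stands in for the _CODESET table)."""
    if function_code == 0:  # EBCDIC to ASCII
        return chunk.decode("cp037").encode("latin-1")
    return chunk.decode("latin-1").encode("cp037")  # ASCII to EBCDIC


def codeset(function_code: int, text: bytes,
            dbcs: Optional[DbcsConverter] = None) -> bytes:
    """Sketch of function codes 0 and 1 with SO/SI-delimited DBCS runs."""
    if function_code not in (0, 1):
        raise ValueError("only function codes 0 and 1 are sketched here")
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == SO:
            end = text.find(bytes([SI]), i + 1)
            if end == -1:
                # Assumption: an unterminated Shift-Out is treated as single-byte text.
                out += convert_sbcs(function_code, text[i:])
                break
            run = text[i + 1:end]
            out.append(SO)
            out += dbcs(function_code, run) if dbcs else run  # unchanged without a converter
            out.append(SI)
            i = end + 1
        else:
            end = text.find(bytes([SO]), i)
            if end == -1:
                end = len(text)
            out += convert_sbcs(function_code, text[i:end])
            i = end
    return bytes(out)


if __name__ == "__main__":
    mixed = "AB".encode("cp037") + bytes([SO, 0x81, 0x82, SI]) + "C".encode("cp037")
    print(codeset(0, mixed))  # b'AB\x0e\x81\x82\x0fC': DBCS run left as-is, SO/SI kept
```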

7.9.5.2 Configuring _CODESET with the Codecomp Utility

16-bit and UNIX:
On the 16-bit and UNIX systems, the Codecomp utility is available for reconfiguring the _CODESET program for single-byte characters.

The Codecomp utility can perform two functions:

Create ASCII-to-EBCDIC and EBCDIC-to-ASCII mapping files from the current _CODESET source.
Modify the _CODESET source, using as input one or more text files that contain the required ASCII-to-EBCDIC and EBCDIC-to-ASCII mappings.

DOS, Windows and OS/2:

To modify the _CODESET source on DOS, Windows and OS/2, you need to copy the file codeset.cpy from $COBDIR to your working directory before using Codecomp.

UNIX:

To modify the _CODESET source on UNIX you need to copy the file codeset.cpy from $COBDIR/src/codeset to your working directory before using Codecomp.

16-bit:
To use the Codecomp utility on 16-bit systems, enter:

```
codecomp [/p] name-1 [name-2]
```

UNIX:
To use the Codecomp utility on UNIX, enter:

```
codecomp [-p] name-1 [name-2]
```

Where:

/p or -p Specifies that the current mapping tables are to be output to file. Without this option _CODESET is configured using the specified mapping file or files.

name-1, name-2

If the p option is specified, the names of the mapping files to receive the current configuration.

If the p option is not specified, the mapping files containing the configuration.

The file extensions specify the direction of mapping within the file as follows:

.a2e ASCII to EBCDIC mapping

.e2a EBCDIC to ASCII mapping

If you specify one file only (without the p option), the inverse of the table in this file is implicitly taken as the second table. In this case, the second column of the table in the single input file must contain all values from 0 to 255.
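Deriving the second table from a single file amounts to inverting a 256-entry lookup table, which works only when every target value occurs exactly once. The Python sketch below shows that check and the inversion. It illustrates the rule rather than Codecomp's own code, and uses Python's cp037 codec only to obtain a sample ASCII-to-EBCDIC table.

```python
from typing import List, Sequence


def invert_table(table: Sequence[int]) -> List[int]:
    """Return the reverse of a 256-entry mapping table.

    table[source] is the target value for each source value 0..255. The
    inverse exists only if the targets (the second column of the mapping
    file) include every value from 0 to 255."""
    if len(table) != 256 or sorted(table) != list(range(256)):
        raise ValueError("second column must contain every value from 0 to 255")
    inverse = [0] * 256
    for source, target in enumerate(table):
        inverse[target] = source
    return inverse


if __name__ == "__main__":
    # Sample ASCII-to-EBCDIC table (assumption: cp037 mapping of bytes 0..255).
    a2e = list(bytes(range(256)).decode("latin-1").encode("cp037"))
    e2a = invert_table(a2e)
    print(hex(a2e[0x41]), hex(e2a[0xC1]))  # 0xc1 0x41
```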

7.9.5.2.1 Format of the Mapping Files for Codecomp

The format for a mapping file is as follows:

```
$ identifying-string
; date/time
;
00    corresponding-value
01    ....
02    ....
.     ....
.     ....
```

When creating or editing a mapping file, follow these rules:

Enter one mapping per line with space(s) and/or tab(s) separating the columns. The entries do not have to be in any ordered sequence and blank lines are allowed.
identifying-string is mandatory. It is denoted by the dollar sign ($) as the first non-space character on the line, and it must be the first non-comment line in the file. If the file does not conform to this, an error occurs and Codecomp terminates with no change to _CODESET.
Include a comment on a line by starting it with a semicolon (;).
Specify a complete mapping (that is, every value from 0 to 255 must be mapped to some value). If the mapping is incomplete, the _CODESET source is not changed.
An attempt to change mappings hard coded into run-time systems and native code generators causes a WARNING message to be displayed although reconfiguration of _CODESET proceeds. (These changes occur only in _CODESET, not in the other systems.)
Map the currency-sign, decimal-point, and comma characters in such a way that there are no clashes; that is, map these three characters to three distinct characters. If you do not, you receive a warning that numeric de-editing will not work, and configuration of _CODESET continues.

The following mappings are valid:

1:1	legal
many:1	legal only if 2 input files specified
1:many	illegal
many:many	illegal
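The rules above can be checked before you run Codecomp. The following Python sketch parses a mapping file and rejects the cases the rules describe as errors: a missing or misplaced identifying-string, an incomplete mapping, 1:many entries, and many:1 entries when only one input file is used. It is not part of the COBOL system, and it assumes that both columns hold two-digit hexadecimal values (00 to FF), which the rules do not state explicitly.

```python
from typing import Dict, Tuple


def parse_mapping_file(text: str) -> Tuple[str, Dict[int, int]]:
    """Parse a mapping file into (identifying-string, {source: target}).

    Assumption: the source and target columns are hexadecimal values."""
    ident = None
    mapping: Dict[int, int] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and comments are allowed anywhere
        if ident is None:
            if not line.startswith("$"):
                raise ValueError("the $ identifying-string must be the first non-comment line")
            ident = line[1:].strip()
            continue
        parts = line.split()
        if len(parts) < 2:
            raise ValueError(f"expected two columns: {raw!r}")
        source, target = int(parts[0], 16), int(parts[1], 16)
        if not (0 <= source <= 255 and 0 <= target <= 255):
            raise ValueError(f"value out of range: {raw!r}")
        if mapping.get(source, target) != target:
            raise ValueError(f"1:many mapping for {source:02X} is illegal")
        mapping[source] = target
    if ident is None:
        raise ValueError("missing $ identifying-string")
    if len(mapping) != 256:
        raise ValueError("incomplete mapping: every value from 0 to 255 must be mapped")
    return ident, mapping


def check_many_to_one(mapping: Dict[int, int], input_files: int) -> None:
    """many:1 mappings are legal only when two input files are specified."""
    if input_files == 1 and len(set(mapping.values())) != 256:
        raise ValueError("many:1 mapping needs two input files")
```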

7.9.5.2.2 Example of Configuring _CODESET Using Codecomp

When you have modified the _CODESET source with the new mapping files, you then need to recompile it, and put the compiled version in the appropriate library or system library, and if appropriate produce a new .dll file.

To configure the _CODESET program using Codecomp:

Back up your existing _CODESET modules. On the 16-bit system, these are your utils.lbr and codeset.dll. On UNIX systems, this is your COBOL system library, libcobol.a.
Create mapping files from the _CODESET source. Copy the file codeset.cpy from $COBDIR (on DOS, Windows or OS/2) or $COBDIR/src/codeset (on UNIX), then enter the command:
16-bit:
```
codecomp /p map.a2e map.e2a
```
UNIX:
```
codecomp -p map.a2e map.e2a
```
where the current mappings from ASCII to EBCDIC are in map.a2e and EBCDIC to ASCII in map.e2a.
Edit and modify the mapping files as required.
Modify the source, codeset.cpy, by entering:
```
codecomp map.a2e map.e2a
```
Rename codeset.cbl to _CODESET.cbl.

16-bit:
On 16-bit systems, recompile _CODESET and add the compiled version to the library as follows:

Compile the _CODESET source to generated code format, by entering:
```
cobol _codeset.cbl omf(gnt);
```
Copy _codeset.gnt into the lbr subdirectory of your COBOL system directory, and make the lbr subdirectory your current directory.

Place _codeset.gnt into utils.lbr, by entering:

```
run library _codeset.gnt utils.lbr = utils.lbr
```

Delete _codeset.gnt to tidy up.

16-bit:
On the 16-bit system, recompile _CODESET and produce a .dll file as follows:

Compile the _CODESET source to object format, by entering:
```
cobol _codeset.cbl obj deffile;
```

Edit the _codeset.def file produced, by changing the line:

```
library _codeset initinstance terminstance
```

to:

```
library codeset initinstance terminstance
```

Produce codeset.dll, by entering:

```
cbllink -d -o codeset.dll _codeset.obj _codeset.def
```

Copy codeset.dll into your $COBDIR\exedll directory, replacing the old version.

16-bit:
On 16-bit systems, you can also create a statically or dynamically linked executable from the compiled _codeset.obj file. To do this, you need to link in the DBCS support module mftrnsdt.obj.

UNIX:
On UNIX systems, compile and create a statically linked version of _CODESET, as follows:

Compile the _CODESET source to object format, by entering:
```
cob -xc _CODESET.cbl
```
Put this object file into the COBOL system library, by ensuring you are logged in as root and entering:
```
ar rv $COBDIR/coblib/libcobol.a _CODESET.o
```

7.9.6 Configuring the _CODESET Program Manually

You can manually reconfigure the _CODESET program for single-byte characters, by amending and recompiling the source file and building the new module into the COBOL system.

The source and copyfile are provided in the src subdirectory of your COBOL system directory. You change the 256-byte conversion tables in the source. When amending these conversion tables, you must use the binary value of the source character plus 1 to index into the table. The character at this index position is the target character.

When you have modified the _CODESET source, you need to recompile it, create an executable version in some form, and package the executable version appropriately. Do this in the same way as if you had used the Codecomp utility to modify _CODESET.
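Because COBOL tables are indexed from 1, the entry for source character value n is at position n + 1. The following Python sketch, which indexes from 0 internally, makes that offset explicit when reading and amending a 256-byte conversion table. The sample table comes from Python's cp037 codec and the amended value is arbitrary; neither is taken from the supplied _CODESET source, and the function names are hypothetical.

```python
def cobol_position(source_value: int) -> int:
    """1-based table position (as in the COBOL source) for a source character value."""
    return source_value + 1


def lookup(table: bytes, source_value: int) -> int:
    """Return the target character value; Python indexes from 0, so subtract 1."""
    return table[cobol_position(source_value) - 1]


def amend(table: bytes, source_value: int, target_value: int) -> bytes:
    """Return a copy of the table with the entry for source_value replaced."""
    updated = bytearray(table)
    updated[cobol_position(source_value) - 1] = target_value
    return bytes(updated)


if __name__ == "__main__":
    # Sample ASCII-to-EBCDIC table (assumption: cp037 mapping).
    a2e = bytes(range(256)).decode("latin-1").encode("cp037")
    print(cobol_position(ord("A")), hex(lookup(a2e, ord("A"))))  # 66 0xc1
    changed = amend(a2e, ord("!"), 0x4F)  # arbitrary example value
    print(hex(lookup(a2e, ord("!"))), hex(lookup(changed, ord("!"))))  # 0x5a 0x4f
```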
