Chapter 7: Internationalization Support

7.1 Introduction

This chapter describes the internationalization support provided by Server Express.

7.2 National Language Support (NLS)

NLS provides a means of adapting your program to the character code set, collating sequence, and editing symbols associated with a particular country, without your program knowing these in advance. This facility is also useful in English-speaking countries for handling codepages, collating upper-case and lower-case letters correctly and selecting the appropriate currency symbol.

When you run a program compiled with the NLS directive, it behaves according to the language, country and character set specified in the language environment variable.

7.3 Double-Byte Character Set (DBCS) Support

DBCS support enables you to create native language applications that use double-byte character strings, for countries such as Japan, China and Korea.

Server Express includes some support for handling double-byte characters. A fully DBCS-enabled and localized version of Server Express is available in Japan.

You can create DBCS applications using existing tools and syntax, such as ACCEPT/DISPLAY with PIC X. However, Server Express also supports additional COBOL syntax used in DBCS environments. This syntax is described in your Language Reference - Additional Topics. To enable this support, use the DBCS, NCHAR or JAPANESE Compiler directive.

The library routine CBL_GET_OS_INFO enables your application to detect the character encoding for Far Eastern countries. See the chapter Library Routines.

7.3.1 DBCS Transparency Support

DBCS transparency support enables users of your applications to enter, store, manipulate, display and print single-byte and double-byte character strings in their native language. In addition, you can use tools such as Animator, Editor and Screens to create and maintain application programs that support your native language.

Editing operations such as backspace, delete and insert, as well as cursor movement, are by byte and not by character. For example, a delete operation needs two keystrokes to delete both bytes of a DBCS character. After the deletion of the first byte (half a DBCS character) the remaining characters may appear corrupted, but if you press the delete key a second time the other byte of the character is deleted with no corruption at all.

However, with some tools and on some terminal environments, editing operations and cursor movement are by DBCS character and not by single bytes.

DBCS transparency support is restricted in the following tools, which are described in your Utilities Handbook:

7.3.2 Double-Byte Line Draw and Graphics Characters

The routines CBL_GET_SCR_LINE_DRAW and CBL_GET_SCR_GRAPHICS support the line draw and graphics characters used in all the environments supported by Object COBOL. See the chapter Generic Line Drawing in your Programmer's Guide to User Interfaces.

Some demonstration programs supplied with this system contain IBM PC encoded line draw characters. Under some environments, you need to amend these characters in the COBOL source before using these programs.

7.3.2.1 Additional Syntax to Support DBCS

Object COBOL supports the following items, which you can use in DBCS applications in addition to the standard PIC X support:

Note: When performing a Format 4 or Format 5 ACCEPT statement into an identifier of class DBCS, you must enter double-byte data only. See the chapter Double-Byte Character Set Support in your Language Reference - Additional Topics.

7.3.3 Support for Translation between ASCII and EBCDIC

Object COBOL provides the CHARSET Compiler directive and the _CODESET program for converting between ASCII and EBCDIC. It also provides some preconfigured _CODESET programs supporting ASCII and EBCDIC codepages for several Far Eastern languages and German.

7.4 Enabling Internationalization

Server Express relies on UNIX to provide the appropriate national language tables for the language, country and character set that you specified. These are known as locales.

7.5 Setting Up the Environment for NLS and DBCS

Before you run a program that uses NLS or that uses DBCS support on UNIX, you must define your locale in the language environment variable. The locale is the combination of language, territory and codepage.

The locale might already be defined for you. If it is not, you define it as follows:

Refer to your operating system documentation for valid territory and codepage values.

Example:

The following example on SCO UNIX or AIX specifies the language as French, the territory as France, and the codeset as ISO 8859-1:
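As a minimal sketch of the naming convention, the Python fragment below composes a locale name in the conventional language_territory.codeset form and passes it to a child process through LANG. Python is used purely for illustration and is not part of Server Express. The helper names `make_locale` and `run_with_lang` are hypothetical, and the spelling of the codeset part (`ISO8859-1` here) is an assumption that varies between operating systems.

```python
import os
import subprocess
import sys


def make_locale(language, territory, codeset):
    """Compose a POSIX-style locale name: language_territory.codeset."""
    return f"{language}_{territory}.{codeset}"


def run_with_lang(command, lang):
    """Run a command with LANG set only in the child's environment."""
    env = dict(os.environ, LANG=lang)
    done = subprocess.run(command, env=env, capture_output=True,
                          text=True, check=True)
    return done.stdout.strip()


french = make_locale("fr", "FR", "ISO8859-1")

# The child process simply reports the LANG value it inherited.
child = [sys.executable, "-c", "import os; print(os.environ['LANG'])"]
seen_by_child = run_with_lang(child, french)
```

Setting the variable only for the child process mirrors the requirement that the locale be defined before the program starts.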

7.6 Creating Programs That Use NLS

To use the NLS facility in your program, you must specify the NLS Compiler directive when you compile your program. By default, the NLS directive is not set. Throughout the rest of this chapter, a program compiled using the NLS directive is referred to as NLS-enabled.

You can NLS-enable any program in your application. Thus some programs within an application might use NLS facilities, while others might not. See the section Mixing Programs with and without NLS later in this chapter for details.

7.7 Running Your NLS-Enabled Program

The run-time system initializes the NLS facility only once during an application's run; that is, when it encounters the first NLS-enabled program. It uses the LANG environment variable to determine the language environment to set up for this program. The run-time system uses the same language environment for any subsequent programs that are NLS-enabled. See the section Mixing NLS-Enabled and Non-NLS-Enabled Programs later in this chapter for details.

If an error occurs during initialization, for example the language specified in the LANG environment variable is not supported, the run-time system issues an error message and terminates its run. Full details on NLS error messages are in your Error Messages.

You must not use indexed files with variable length records when the NLS directive is set.

You cannot use indexed files created with one collating sequence in a file language environment that uses a different collating sequence. If you do and you specified file status bytes in your program, the COBOL system returns the file status 9/45. If you did not specify file status bytes in your program, the run-time system issues an error message.

When you run an NLS-enabled program, the language specified in the LANG environment variable defines the behavior of:

Other language-dependent features, such as the symbols used to denote the decimal point and the currency character, also appear in the format of the specified language. However, in languages that define the currency sign as trailing, this has no effect and the usual COBOL rules are observed.

Certain NLS definitions have characters other than the ASCII characters 0-9 defined as numerics. You cannot use the ACCEPT syntax to enter such characters into numeric picture strings, nor can they be used in numeric operations. In all NLS operations in the COBOL environment, a numeric item must be formed only from the ASCII digits 0-9, with or without the ASCII operational signs "+" or "-". There is no means of automatically converting the NLS representation to the ASCII equivalent.
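The rule above can be sketched in Python as a simple check. This is an illustration only, not the run-time system's test: the function name `is_ascii_numeric` is hypothetical, and the sketch assumes the operational sign, when present, is a leading character.

```python
import re

# Optional ASCII sign followed by one or more ASCII digits 0-9.
# re.ASCII keeps the pattern from matching digits outside ASCII.
ASCII_NUMERIC = re.compile(r"[+-]?[0-9]+", re.ASCII)


def is_ascii_numeric(text):
    """Return True only if text is formed from ASCII digits and an optional sign."""
    return ASCII_NUMERIC.fullmatch(text) is not None


# Arabic-Indic digits one, two, three: numerics in some locales, but not ASCII.
locale_digits = "\u0661\u0662\u0663"

plain_ok = is_ascii_numeric("-123")
locale_ok = is_ascii_numeric(locale_digits)
```

The contrast with `str.isdigit()`, which accepts the locale digits, shows why a locale's own notion of "numeric" is not sufficient here.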

It is possible to enter European modifying characters into numeric ACCEPT fields. These are accepted as zero.

Note that the values assigned to figurative constants, for example LOW-VALUES, are not changed by using NLS features.

Unpredictable results occur in the following circumstances when you use the NLS facility:

You can also use the Adis Flip Case Control key when using NLS characters. However, if you attempt to convert a European character to upper case using this key, the character will be replaced by spaces if it has no upper case equivalent.

7.7.1 String Comparisons

For NLS-enabled programs, the run-time system invokes the CBL_NLS_COMPARE routine to compare strings. This routine is also called for alphanumeric comparisons if the program uses the intrinsic functions Max, Min, Ord-max or Ord-min on alphanumeric operands.

During a MOVE operation of one alphanumeric item to another which is longer, padding with spaces occurs. Similar padding is also implied before the comparison of two such items. In both cases, an ASCII space is assumed.
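A minimal Python sketch of the implied padding follows. It models only the padding step with an ASCII space (x"20"), not the locale-aware comparison itself, and the function names are hypothetical.

```python
ASCII_SPACE = b" "  # x"20"


def pad_to_same_length(left, right):
    """Pad the shorter operand on the right with ASCII spaces."""
    width = max(len(left), len(right))
    return left.ljust(width, ASCII_SPACE), right.ljust(width, ASCII_SPACE)


def padded_equal(left, right):
    """Compare two byte strings after the implied space padding."""
    padded_left, padded_right = pad_to_same_length(left, right)
    return padded_left == padded_right


same = padded_equal(b"ABC", b"ABC  ")
different = padded_equal(b"ABC", b"ABD")
```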

7.7.2 Class Condition Tests

For NLS-enabled programs, the run-time system invokes locale-specific tests when it needs to carry out class condition tests. These tests determine if a string of information is in ALPHABETIC, ALPHABETIC-UPPER, ALPHABETIC-LOWER or NUMERIC format. The numeric test always tests that all characters are in the range of ASCII 0-9.

7.7.3 Key Comparisons in Indexed Sequential Files

The run-time system uses the CBL_NLS_COMPARE routine for key comparisons associated with NLS files.

If the logical filename of an indexed sequential file is preceded by the five characters "%NLS%", the file is treated as being keyed according to the collating sequence specified in the LANG environment variable. This applies whether "%NLS%" precedes the logical filename before the filename is resolved using environment variables or after it is resolved. This also applies only if the program that OPENs the file was compiled with the NLS directive set.

If the logical filename is preceded by "%NLS%" after it is resolved using environment variables, but the program that OPENs the file was compiled without the NLS directive set, the OPEN fails and returns a run-time system error.

If your file contains variable length records, the NLS collating sequence is not used.
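The rules in this section can be summarized as a small decision function. The Python sketch below is a simplified model under stated assumptions: it looks at a single filename (the text above applies the rule both before and after resolution through environment variables), and the function name, return values and exception type are all hypothetical.

```python
NLS_PREFIX = "%NLS%"


def key_collation(filename, program_nls_enabled, variable_length_records=False):
    """Return which collating sequence applies to an indexed file's keys."""
    if not filename.startswith(NLS_PREFIX):
        return "default"
    if not program_nls_enabled:
        # A %NLS% file opened by a program compiled without NLS fails to open.
        raise RuntimeError("OPEN fails: program was not compiled with NLS")
    if variable_length_records:
        # The NLS collating sequence is not used for variable length records.
        return "default"
    return "locale"


nls_file = key_collation("%NLS%custfile", program_nls_enabled=True)
plain_file = key_collation("custfile", program_nls_enabled=True)
```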

For details on assigning logical filenames to be resolved using environment variables, see the Filenames chapter in your File Handling.

7.7.4 SORT and MERGE Comparisons

If an NLS-enabled program performs a SORT or MERGE operation, it automatically uses NLS key comparisons.

The logical filename of the work file must be preceded by the characters "%NLS%" in the ASSIGN statement, and the program that opens it must be NLS-enabled. This is as described in the previous section for indexed sequential files.

The run-time system invokes the CBL_NLS_COMPARE routine for all key comparisons used in SORT or MERGE operations.

7.7.5 Case Conversion

If an NLS-enabled program performs a case conversion, either by using the intrinsic functions Upper-case and Lower-case, or by calling the library routines CBL_TOUPPER and CBL_TOLOWER, the run-time system invokes a routine to change the case of national characters correctly.

7.7.6 Collating Sequence Operations

For NLS-enabled programs, a collating sequence appropriate to the locale is used. If the program performs a collating sequence operation by using the intrinsic functions Char and Ord, then this special collating sequence is used.

7.7.7 Editing and De-editing Moves

If an NLS-enabled program performs an editing or de-editing move, then the decimal point and thousands separator appropriate to the national language locale are used. Also, the currency symbol used is the first character of the currency symbol of the locale territory.

7.7.8 Intrinsic Functions Numval and Numval-c

If an NLS-enabled program attempts to convert a number or a monetary value in a display item to a numeric value by using the intrinsic functions Numval or Numval-c, then the run-time system uses the decimal point and thousands separator appropriate to the locale. Also, if no second argument is supplied for Numval-c, then the currency symbol used is the currency symbol of the locale territory.
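To illustrate the effect of locale-specific separators on such a conversion, here is a deliberately simplified Python stand-in. It is not an implementation of Numval or Numval-c (it ignores signs, currency symbols and the other forms those functions accept), and the function name `numval_sketch` and its parameters are hypothetical.

```python
from decimal import Decimal


def numval_sketch(text, decimal_point=".", thousands_sep=","):
    """Convert a displayed number using the given locale separators."""
    # Remove grouping separators first, then normalize the decimal point.
    cleaned = text.strip().replace(thousands_sep, "")
    cleaned = cleaned.replace(decimal_point, ".")
    return Decimal(cleaned)


# The same digits read under two different separator conventions.
english_style = numval_sketch("1,234.56")
continental_style = numval_sketch("1.234,56", decimal_point=",", thousands_sep=".")
```

Both calls yield the same value, which is the point of taking the separators from the locale rather than hard-coding them.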

7.7.9 User Interfaces

If you require information concerning the language environment you are using, you can access the NLS routines supported by your COBOL system. Full details on all of these routines can be found later in this chapter.

You also can access the UNIX operating system routines, such as nl_langinfo(), though these routines are not portable to other environments. Full details on these routines can be found in your operating system documentation.
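As a sketch of what such operating system queries return, Python's standard `locale` module wraps the same C library routines on UNIX, including `nl_langinfo()`. The example pins the locale to "C" so that the result is reproducible; in practice a program would call `setlocale(LC_ALL, "")` to pick up the LANG setting, and the values returned would then depend on the locales installed on the system.

```python
import locale

# Use the fixed "C" locale so the values below do not depend on LANG.
locale.setlocale(locale.LC_ALL, "C")

# Query individual items through nl_langinfo() (UNIX only).
codeset = locale.nl_langinfo(locale.CODESET)
radix = locale.nl_langinfo(locale.RADIXCHAR)

# localeconv() returns the numeric and monetary conventions as a dictionary.
conventions = locale.localeconv()
decimal_point = conventions["decimal_point"]
```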

7.8 Mixing NLS-Enabled and Non-NLS-Enabled Programs

If an application comprises more than one program, you need to set the NLS Compiler directive only for those programs that you want to be NLS-enabled. Both NLS-enabled and non-NLS-enabled programs can call NLS-enabled and non-NLS-enabled programs.

An NLS-enabled program has no particular language or locale associated with it. It uses the application locale, which is initialized along with the NLS facility when the first NLS-enabled program in the application is called.

Once the application locale is set, it is used for all subsequently called NLS-enabled programs. You cannot change the language environment after this, even if you change the setting of the LANG environment variable. You must therefore ensure that all NLS-enabled programs use the same language.

The library routines for NLS can be called from any NLS-enabled program. Programs in the application that are not NLS-enabled can also call these routines, but only after the National Language Support module has been loaded and initialized by calling an NLS-enabled program.

You can pass parameters from an NLS-enabled program to a non-NLS-enabled program. However, parameters that depend on the language environment in which they are created retain their format regardless of the language environment in which they are used. If you attempt to use a parameter created in an NLS-enabled program in a non-NLS-enabled program, the result might not be as you expect. We recommend that you do not attempt to pass such parameters to programs other than those that have the same language environment.

7.9 Converting Between ASCII and EBCDIC

Normally you use data that is in ASCII format, although with Server Express you can also use data that is in EBCDIC format. Storing data in EBCDIC format enhances the testing of programs targeted to run on the mainframe and eases data exchange between the mainframe and your system.

If you want to use EBCDIC data on your system, make sure your programs are in ASCII format, and then compile them with the CHARSET(EBCDIC) directive. You can then use the resulting compiled programs with unconverted data files.

By default, Object COBOL supports ASCII/EBCDIC conversions of a standard US codepage based on PC codepage 437. Additional preconfigured modules are available for converting various Far Eastern languages and German. To convert EBCDIC correctly in these environments, you need to replace the default module with the preconfigured module for the relevant character set.

7.9.1 Converting ASCII/EBCDIC Data Using the CHARSET Directive

You can convert your data between the ASCII and EBCDIC character sets by using the CHARSET directive. Compile your program with the CHARSET directive as follows:

The CHARSET directive also sets the related directives NATIVE and SIGN. If you have CHARSET(EBCDIC) set, you also have the behavior of SIGN(EBCDIC) and NATIVE(EBCDIC).

Both the called and the calling programs must be compiled with the same setting for the CHARSET directive.

Compiling with the EBCDIC option results in a Data Division that looks exactly the same as it would when compiling with a mainframe compiler running in an EBCDIC environment. This means that DISPLAY type values are in EBCDIC, and COMP values are binary fields, as normal. Similarly, any alphanumeric literal in the Procedure Division (for example, MOVE "HELLO" TO FRED or MOVE ALL "X" TO FRED) is in EBCDIC. The procedural code expects all DISPLAY type data, including numeric data, to be in EBCDIC.
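To make the difference in stored bytes concrete, the Python sketch below encodes the same literal in ASCII and in EBCDIC. It assumes Python's `cp037` codec as a representative EBCDIC codepage; the conversion tables actually used by the COBOL system may differ for some characters, although letters, digits and space have the values shown.

```python
EBCDIC = "cp037"  # assumption: a representative EBCDIC codepage

hello_ascii = "HELLO".encode("ascii")
hello_ebcdic = "HELLO".encode(EBCDIC)

# DISPLAY-type numeric data and spaces also differ between the two.
digits_ebcdic = "123".encode(EBCDIC)
space_ebcdic = " ".encode(EBCDIC)

print(hello_ascii.hex(), hello_ebcdic.hex())
```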

When reading a file, if portions of the record represent characters while other portions are binary, it is necessary to have a suitable record definition in the COBOL program, just as it is on the mainframe. The intention is that data accessible to the user (such as file records and Working-Storage) should always appear exactly as on the mainframe.

The only concessions to an ASCII environment when using the EBCDIC support are as follows:

7.9.2 Current Limitations on ASCII and EBCDIC

7.9.3 Collating Sequences for ASCII and EBCDIC

The collating sequences for ASCII and EBCDIC are different. If you convert from EBCDIC to ASCII, comparisons might no longer give the same results, because digits, upper-case letters and lower-case letters are ordered differently. In ASCII, the digits (hexadecimal 30 to 39) come before upper-case letters (hexadecimal 41 to 5A), which are followed by lower-case letters (hexadecimal 61 to 7A).

In EBCDIC, lower-case letters sort first (hexadecimal 81 to 89, 91 to 99, A2 to A9), followed by upper-case letters (hexadecimal C1 to C9, D1 to D9, E2 to E9), followed by digits (hexadecimal F0 to F9). EBCDIC has special characters throughout the range.
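The two orderings can be demonstrated with a short Python sketch that sorts the same items by their encoded byte values. It assumes the `cp037` codec as a representative EBCDIC codepage, and the helper name `collate` is hypothetical.

```python
def collate(items, encoding):
    """Sort strings by the byte values they have in the given encoding."""
    return sorted(items, key=lambda item: item.encode(encoding))


items = ["b", "B", "2", "a", "A", "1"]

ascii_order = collate(items, "ascii")    # digits, upper case, lower case
ebcdic_order = collate(items, "cp037")   # lower case, upper case, digits
```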

For example, in the following code, the results are different, depending on whether you are executing in an ASCII or EBCDIC environment.

When a program collating sequence is in effect, alphanumeric comparisons (including group comparisons) that contain numeric data are likely to produce meaningless results.

Without an EBCDIC collating sequence, field-a is less than field-b. With an EBCDIC collating sequence, however, the x"4F" is mapped onto the EBCDIC "O", which is x"D6". The underscore character, x"5F", is mapped onto the EBCDIC character x"6D". With the NATIVE(EBCDIC) Compiler directive, field-a is greater than field-b.
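The comparison described above can be checked with a few lines of Python. The field contents are an assumption inferred from the hexadecimal values quoted (field-a holding "O", field-b holding an underscore), and `cp037` stands in for the EBCDIC collating sequence.

```python
field_a = "O"   # x"4F" in ASCII
field_b = "_"   # x"5F" in ASCII

ascii_a, ascii_b = field_a.encode("ascii"), field_b.encode("ascii")
ebcdic_a, ebcdic_b = field_a.encode("cp037"), field_b.encode("cp037")

a_less_in_ascii = ascii_a < ascii_b
a_greater_in_ebcdic = ebcdic_a > ebcdic_b
```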

Some of the areas affected by using different collating sequences are IF statements, SORT statements, writing data to indexed files, and any areas where you assume data to contain the EBCDIC hexadecimal values (for example, assuming space = X"40").

7.9.4 Converting ASCII/EBCDIC Data Using the CODESET Program

For normal applications, you use the CHARSET directive to convert ASCII and EBCDIC. However, under some circumstances, you might want to call the _CODESET program yourself to convert them.

Syntax:

Parameters:

On Entry:

On Exit:

Comments:

Note that the _CODESET program cannot be called from a program that is compiled with the reserved word directives for one of the mainframe COBOL compilers (for example, OSVS for OS/VS COBOL and VSC2 for VS COBOL II) because it uses COMP-X data types, which are not supported on the mainframe.

Use the _CODESET program for converting single-byte data only, using a function-code of 0 or 1.

7.9.5 Configuring _CODESET with the Codecomp Utility

The Codecomp utility enables you to reconfigure the _CODESET program for single-byte characters.

_CODESET operates by calling a program that contains the mapping configuration. This program has the name CSnnnn.ext, where nnnn is a numeric identifier, and ext indicates the type of executable. You can create your own CSnnnn.ext files and specify that mapping information should be obtained from them. You specify which files to use using the MFCODESET environment variable. The steps required are:

Use the Codecomp utility to specify the mapping required. This creates a .cpy file that is used in the CSnnnn.ext file.
Recompile the CSnnnn.ext file.

The Codecomp utility:

Creates mapping files of the currently used EBCDIC to ASCII translation scheme
Specifies new translation schemes

7.9.5.1 Creating Mapping Files

You can use the Codecomp utility to create mapping files of the currently used EBCDIC to ASCII translation scheme. The mapping files created are based on the current setting of the MFCODESET environment variable. To create the mapping files, enter:

codecomp -p name-1 [name-2]

where the parameters are:

`-p`	Specifies that the current mapping tables are to be output to file.
`name-1, name-2`	The names of the mapping files to receive the current configuration. Filename extensions specify which file contains which mapping table. Give one of these files the extension .a2e for ASCII to EBCDIC mapping, the other .e2a for EBCDIC to ASCII mapping.

If you do not specify at least one file, an error is returned. All command-line errors result in a help message specifying the correct format for the command-line.

7.9.5.2 Specifying New Translation Schemes

You can use the Codecomp utility to specify new translation schemes. To do this, enter:

codecomp name-1 [name-2]

where the parameters are:

name-1, name-2

The mapping files containing the mapping configuration. Filename extensions specify which file contains which mapping table. Give the file that contains the ASCII to EBCDIC mapping the extension .a2e. Give the file that contains the EBCDIC to ASCII mapping the .e2a extension.

If you specify only one file, the inverse of the table in this file is implicitly taken as the second table. In this case, the second column of the table in the single input file must contain all values from 0 to 255.
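The requirement on the second column follows from how an inverse table is built: every target value must occur exactly once, otherwise some value has no unique source. A minimal Python sketch of that reasoning is shown below. The function name `invert_table` is hypothetical and this is not Codecomp's own code.

```python
def invert_table(forward):
    """Build the inverse of a 256-entry single-byte mapping table."""
    if len(forward) != 256:
        raise ValueError("table must have exactly 256 entries")
    if sorted(forward) != list(range(256)):
        # A missing or repeated target value means no unique inverse exists.
        raise ValueError("targets must contain every value from 0 to 255 once")
    inverse = [0] * 256
    for source, target in enumerate(forward):
        inverse[target] = source
    return inverse


# Example: an identity table with two entries swapped.
forward = list(range(256))
forward[0x4F], forward[0x5F] = forward[0x5F], forward[0x4F]
inverse = invert_table(forward)
```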

If you do not specify at least one file, an error is returned. All command-line errors result in a help message specifying the correct format for the command-line.

The result of using this command is a copyfile called codeset.cpy that contains the required EBCDIC to ASCII translations. This .cpy file is used by the CSnnnn.ext file. You create a CSnnnn.ext file for a specific locale. For example, to create a translation table for country X, you could create a file called CS2001.cbl, and compile it.

A template translation file is supplied by Micro Focus; this is called CSnnnn.cbl, and can be found in the directory $COBDIR/src/codeset. To create your own translation tables, you:

Copy CSnnnn.cbl to a file with the name you require; for example:
```
cp $COBDIR/src/codeset/CSnnnn.cbl CS2001.cbl
```
Compile your CSnnnn.cbl to create an executable file; for example:
```
cob -u CS2001.cbl 
```
This creates the file CS2001.gnt.
Copy your executable file to the directory $COBDIR/dynload.

You can then specify which translation tables to use through the MFCODESET environment variable. This has the format:

MFCODESET=nnnn

where nnnn is a value from 2000 to 9999. Values below 2000 are reserved for Micro Focus use.

MFCODESET directs _CODESET to call the program that contains the correct mapping tables. For example, if you have created the file CS2001.gnt and copied it to $COBDIR/dynload, then:

MFCODESET=2001

would direct _CODESET to use the translation tables in the executable CS2001.gnt.
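The naming rule can be sketched as a small Python function that turns an MFCODESET value into the expected program name. This is an illustration of the convention only: the function name `codeset_program` is hypothetical, the `gnt` extension is taken from the example above, and the sketch deliberately handles only user-defined values from 2000 to 9999.

```python
def codeset_program(environ, ext="gnt"):
    """Return the CSnnnn program name selected by MFCODESET, or None if unset."""
    value = environ.get("MFCODESET")
    if value is None:
        return None  # unset: this sketch makes no claim about the default table
    number = int(value)
    if not 2000 <= number <= 9999:
        raise ValueError("user-defined tables use values from 2000 to 9999")
    return f"CS{number}.{ext}"


program = codeset_program({"MFCODESET": "2001"})
```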

7.9.5.3 Format of the Mapping Files for Codecomp

The format for a mapping file is as follows:

     $ identifying-string 
     ; date/time 
     ;             00        corresponding-value
                   01              .... 
                   02              .... 
                   .               .... 
                   .               ....

When creating or editing a mapping file, follow these rules:

Enter one mapping per line with space(s) and/or tab(s) separating the columns. The entries do not have to be in any ordered sequence and blank lines are allowed.
The identifying-string line is mandatory. It is denoted by a dollar sign ($) as the first non-space character on the line, and it must be the first non-comment line in the file. If it is not, an error occurs and Codecomp terminates with no change to _CODESET.
Include a comment on a line by starting it with a semicolon (;).
Specify a complete mapping (that is all values from 0 to 255 are mapped to some value). If you do not, codeset.cpy is not created.
Map the currency-sign, decimal-point, and comma characters in such a way that there are no clashes; that is, map these three characters to three distinct characters. If you do not, you receive a warning that numeric de-editing will not work, but configuration continues.

The following mappings are valid:

1:1	legal
many:1	legal only if 2 input files specified
1:many	illegal
many:many	illegal
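The rules above can be sketched as a small validating parser in Python. This is an illustrative sketch, not Codecomp's own logic: the function name `parse_mapping` is hypothetical, and it assumes that both columns hold two-digit hexadecimal values, which the format description does not state explicitly.

```python
def parse_mapping(text):
    """Parse mapping-file text into (identifying_string, 256-entry table)."""
    ident = None
    table = {}
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and comment lines are ignored
        if line.startswith("$"):
            if ident is not None or table:
                raise ValueError(f"line {number}: '$' line must come first, once")
            ident = line[1:].strip()
            continue
        if ident is None:
            raise ValueError(f"line {number}: missing '$' identifying string")
        source_text, target_text = line.split()
        source, target = int(source_text, 16), int(target_text, 16)
        if not (0 <= source <= 255 and 0 <= target <= 255):
            raise ValueError(f"line {number}: value outside 0 to 255")
        if table.get(source, target) != target:
            raise ValueError(f"line {number}: 1:many mapping for {source:02X}")
        table[source] = target
    missing = [value for value in range(256) if value not in table]
    if missing:
        raise ValueError(f"incomplete mapping: {len(missing)} values unmapped")
    return ident, [table[value] for value in range(256)]


# Example input: an identity mapping with a comment and a blank line.
sample_lines = ["; sample", "$ identity table", ""]
sample_lines += [f"{value:02X}\t{value:02X}" for value in range(256)]
ident, table = parse_mapping("\n".join(sample_lines))
```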

7.9.5.4 Example of Configuring _CODESET Using Codecomp

To configure the _CODESET program to use different translation tables:

Create mapping files from the _CODESET source, by entering:
```
codecomp -p map.a2e map.e2a
```
where the current mappings from ASCII to EBCDIC are in map.a2e and EBCDIC to ASCII in map.e2a.
Edit and modify the mapping files as required.
Modify the source, codeset.cpy, by entering:
```
codecomp map.a2e map.e2a 
```
Create a value for your new locale-specific translations, and copy the CSnnnn.cbl source file to a file with a name that matches that value. For example, if you decide that value 2001 is to be used for country X, copy CSnnnn.cbl to CS2001.cbl:
```
cp $COBDIR/src/codeset/CSnnnn.cbl CS2001.cbl 
```
Create an executable version of CS2001.cbl. For this example, create a generated file:
```
cob -u CS2001.cbl
```
Copy CS2001.gnt to your $COBDIR/dynload directory.
Set the value of MFCODESET to the number part of the executable filename; for example:
```
MFCODESET=2001
```

7.9.6 Configuring the _CODESET Program Manually

You can manually reconfigure the _CODESET program for single-byte characters, by amending and recompiling the source file and building the new module into the COBOL system.

The source and copyfile are provided in the src subdirectory within your COBOL system directory. You change the 256-byte conversion tables in the source. When amending these conversion tables, you must use the binary value of the source character+1 to index into the table. The character at this index point is the target character.
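The "+1" arises because COBOL table positions start at 1 while byte values start at 0. The Python sketch below illustrates the arithmetic with an identity table in which one entry is changed. The function names are hypothetical and the table is not the one shipped in the _CODESET source.

```python
def cobol_table_position(source_byte):
    """Return the 1-based table position for a source byte value (0 to 255)."""
    if not 0 <= source_byte <= 255:
        raise ValueError("source byte must be in the range 0 to 255")
    return source_byte + 1


def translate_byte(table, source_byte):
    """Look up the target character for a source byte."""
    # Python sequences start at 0, so the 1-based position is converted back.
    return table[cobol_table_position(source_byte) - 1]


# Start from an identity table, then map source x"4F" to target x"D6".
table = bytearray(range(256))
table[cobol_table_position(0x4F) - 1] = 0xD6

position = cobol_table_position(0x4F)
target = translate_byte(table, 0x4F)
```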

When you have modified the _CODESET source, you need to recompile it, create an executable version in some form, and package the executable version appropriately. Do this in the same way as if you had used the Codecomp utility to modify _CODESET.

ISO646 National Variants:	French, German, Swedish, Danish, Italian, Spanish, Portuguese
ISO8859/1:	West European languages
IBM Personal Computer 8-bit Character Set:	West European languages

`function-code`	pic 9(2) comp-x.
`text-length`	pic 9(9) comp-x.
`text-string`	pic x(n).
