
Chapter 4: Double-Byte Character Set Support

Many of the world's languages use sets of characters that run into the thousands. Most computers use 8-bit bytes, and assign a different 8-bit code to represent each character; this scheme can represent no more than 256 different characters.

Ideally a COBOL programmer should not need to be aware of the internal code used to represent characters. However, in practice some features of the internal code can affect the source programmer, and this limitation to 256 different characters is one of the most restricting of these.

For this reason the Double-Byte Character Set (DBCS) is provided. In this scheme each character is represented by a 16-bit code, each character occupying a pair of adjacent bytes. This scheme can represent thousands of different characters.

The assignment of DBCS character codes to characters varies from country to country.

The 8-bit code used by your COBOL system is the American Standard Code for Information Interchange (ASCII). In this chapter this will be referred to as the Single-Byte Character Set (SBCS).

Double-Byte Character Set support is sensitive to the DBCS Compiler directive.

See also the chapter Micro Focus Extensions for Double-Byte Character Support, primarily for Japanese language support.

4.1 DBCS Data

The DBCS Compiler directive makes your COBOL compiler recognize two data categories in which data is stored in DBCS. It does not prevent the use of other data categories; thus you can still use those data categories in which data is stored in SBCS.

Provided you have the necessary hardware support, DBCS data items used in input and output will be recognized, and their data displayed and accepted correctly, on devices such as screens, keyboards, and printers.

4.2 Roman Script in DBCS

The character set that can be represented by SBCS is based on the Roman alphabet plus some other characters. In some countries the DBCS character codes also include codes for many of these characters.

On some hardware the character displayed is visibly different according to whether the character is stored in SBCS or DBCS; for example on some screens the DBCS code for a letter causes it to be printed larger than does its SBCS code.

4.3 Multivendor Integration Architecture Support

Programs written to the NTT Multivendor Integration Architecture (MIA) specification are accepted by the COBOL compiler when the DBCS and CURRENCY-SIGN"92" directives are used.

4.4 Source Programs

DBCS characters can be used in literals (since literals are data), in comments and comment-entries, and in user-defined words. Otherwise the DBCS directive does not change the range of characters that can be used in source programs - the program is still written using the COBOL character set (see the chapter Concepts of the COBOL Language in your Language Reference).

4.5 Language Extensions

There are extensions to the PICTURE and USAGE clauses to define items that are to contain DBCS data. A new format of literal is required for DBCS data.

There are additional rules for various options, clauses and statements to define the behavior of DBCS data.

Except where otherwise stated, all the rules and features of COBOL remain applicable when DBCS is in use. The following sections give only the additional rules and formats pertaining to DBCS.

4.6 Comments and Comment-entries

SBCS and DBCS characters can be mixed freely in comments and comment-entries.

4.7 User-defined Words

Either SBCS or DBCS characters can be used in user-defined words for: Alphabet-name, Class-name, Condition-name, Data-name/Identifier, Record-name, File-name, Index-name, Mnemonic-name, Paragraph-name, Section-name, and Symbolic-character.

SBCS and DBCS characters can be freely mixed in user-defined words. Where a character exists in both the DBCS and SBCS character sets, its DBCS and SBCS representations will not be regarded as equivalent. See the section Roman Script in DBCS earlier in this chapter.

4.8 Spaces

Spaces in data of class DBCS will be represented by the DBCS code for space. A space character represented by a 2-byte code is referred to as a DBCS space.

The values assigned to a DBCS space are sensitive to the DBCS and DBSPACE Compiler directives.

4.9 Data Items

4.9.1 DBCS Data Items

There is a class of data additional to the classes described in the chapter Concepts of the COBOL Language in your Language Reference: it is called DBCS. It includes two data categories: DBCS and DBCS edited.

A data item of class DBCS is described by using the USAGE DISPLAY-1 clause. An item with this clause can have only the characters "G" and "B" in its PICTURE character-string. A "G" represents a DBCS character position; "B" is an editing character, and indicates a position that will always have a DBCS space inserted in editing. An item whose PICTURE character-string is all "G"s is of category DBCS; an item whose PICTURE character-string contains both "G"s and "B"s is of category DBCS edited.

Note that each "G" or "B" represents one 2-byte character position. Except where otherwise stated, the length of the data item for all purposes is the number of "G"s and "B"s in its PICTURE character-string.

For reference modification, the leftmost-character-position and length specify the number of DBCS characters, not bytes.

Data items of class DBCS can be used wherever data items of class alphanumeric can be used, subject to rules and exceptions given in the appropriate places in this chapter.
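
As an illustration of the rules above, the following minimal sketch declares one item of each category and applies reference modification to the DBCS item. The program name, data names, and Japanese sample data are hypothetical; the sketch assumes that the DBCS Compiler directive is in effect and that the source file is saved in a 2-byte code page such as Shift-JIS.

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSITEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Category DBCS: 4 character positions (8 bytes).
       01  WS-NAME       PIC G(4)  USAGE DISPLAY-1.
      * Category DBCS edited: 5 character positions (10 bytes).
       01  WS-NAME-ED    PIC GGBGG USAGE DISPLAY-1.
       PROCEDURE DIVISION.
           MOVE G"東京大阪" TO WS-NAME
      * (3:2) selects the third and fourth DBCS characters,
      * not bytes 3 and 4.
           DISPLAY WS-NAME(3:2)
           STOP RUN.
```

With this sample data the reference-modified item is expected to hold the two characters that follow the first two.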

4.9.2 Mixed Data Items

DBCS characters can be included in data stored in data items of category alphanumeric. In such data, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.

On input and output both the SBCS and the DBCS codes will be recognized. The first byte of a DBCS code is never a valid SBCS code; hence the two can be used together without confusion. But in operations within the program the data will be treated as ordinary alphanumeric data. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.

The length of the data item for all purposes is its length in bytes.

4.10 Literals

4.10.1 DBCS Literals

In addition to the nonnumeric and numeric literals described in the chapter Concepts of the COBOL Language in your Language Reference, there is a third type of literal: the DBCS literal.

A DBCS literal is a character-string delimited at both ends by quotation marks or apostrophes, with the beginning delimiter preceded by a "G". It can consist of any characters in the computer's DBCS character set. It can be up to 28 DBCS characters in length. It cannot be continued across lines.

Whether quotation marks or apostrophes are used, the presence of that delimiter within a DBCS literal can be represented by two contiguous occurrences. The presence of the character that is not serving as the delimiter is represented by a single occurrence. The value of a DBCS literal in the object program is the string of characters itself, except:

  1. The initial G and the delimiters are excluded, and

  2. Each embedded pair of contiguous delimiter characters represents a single character.

4.10.1.1 Category of DBCS Literals

All DBCS literals can be used wherever nonnumeric literals can be used, subject to rules and exceptions given in the appropriate places in this chapter.

4.10.2 Mixed Literals

DBCS characters can be included in nonnumeric literals. A nonnumeric literal that includes DBCS characters is called a mixed literal. In such a literal, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.

On output both the SBCS and the DBCS codes will be recognized. The first byte of a DBCS code is never a valid SBCS code; hence the two can be used together without confusion. But in operations within the program the literal will be treated as an ordinary nonnumeric literal. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.

A nonnumeric literal is of category alphanumeric, not DBCS, regardless of whether it includes DBCS characters.

A mixed literal cannot be continued across lines.

This restriction has been removed.

4.10.3 Figurative Constants

If a figurative constant is used where only a DBCS literal is allowed (according to the rules concerning classes and categories given in the appropriate places in this chapter), it is a DBCS literal. Each space in this literal is a DBCS space.

Only the figurative constant SPACE(S) can be a DBCS literal.
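
A minimal sketch of SPACES acting as a DBCS literal, using hypothetical names and assuming the DBCS Compiler directive is in effect:

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSSPC.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TITLE      PIC G(6) USAGE DISPLAY-1.
       PROCEDURE DIVISION.
      * The receiving item is of class DBCS, so SPACES is a DBCS
      * literal here and fills the item with six DBCS spaces.
           MOVE SPACES TO WS-TITLE
           IF WS-TITLE = SPACES
               DISPLAY "TITLE IS BLANK"
           END-IF
           STOP RUN.
```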

4.10.4 The "N" Literal

Another format of literal, equivalent to the DBCS literal, is used in COBOL/370 and the MIA COBOL specification.

General Format

Syntax Rules

  1. An N-literal can contain no more than 18 DBCS characters, and cannot be split over two lines.

  2. An N-literal can contain only double-byte characters from your computer's Double-Byte Character Set.

  3. Any double-byte quotation marks used in the literal should be written twice. For example, in order to express a double-byte quotation mark in the literal, you should write:

    N"ABC""DEF"

  4. N-literal specification and behavior can be modified in exactly the same way as for G-literals, using the APOST Compiler directive to replace a quotation mark (double quote) by an apostrophe (single quote) character.

General Rules

  1. The N-literal can be used in conjunction with ALL to make a figurative constant (see the chapter Concepts of the COBOL Language in your Language Reference).

  2. All characters must be double-byte characters.

4.11 Program Structure

4.11.1 The END PROGRAM Header

Syntax Rules

  1. Program-name must not contain DBCS characters.

    This restriction is removed.

4.12 Identification Division in the DBCS Module

4.12.1 The PROGRAM-ID Paragraph

Syntax Rules

4.13 Environment Division in the DBCS Module

4.13.1 The SOURCE-COMPUTER Paragraph

Syntax Rules

  1. Source-computer-name can contain DBCS characters.

4.13.2 The OBJECT-COMPUTER Paragraph

Syntax Rules

  1. Object-computer-name can contain DBCS characters.

4.13.3 The SPECIAL-NAMES Paragraph

Syntax Rules

  1. In the CURRENCY SIGN clause, literal-6 must not be a DBCS literal and must not be "G" or "N".

  2. In the ALPHABET clause, literal-1, -2, and -3 must not be DBCS literals.

  3. In the CLASS clause, literal-4 and -5 must not be DBCS literals.

4.13.4 The FILE-CONTROL Paragraph

Syntax Rules

  1. In the ASSIGN clause, literal-1 must not be a DBCS literal and external-file-reference must not contain DBCS characters.

    This restriction is removed.

4.14 Data Division in the DBCS Module

4.14.1 The JUSTIFIED Clause

General Rules

  1. The JUSTIFIED clause can be used with DBCS data items.

4.14.2 The PICTURE Clause

General Rules

There are two additional categories of data that can be described with a PICTURE clause: DBCS and DBCS edited. Both these categories must be described as USAGE IS DISPLAY-1.

These categories need not be described as USAGE DISPLAY-1.

DISPLAY-1 is optional for PIC N, but not for PIC G items.

4.14.3 Rules for DBCS Data

  1. Its PICTURE character-string can contain only the symbol "G" or "N".

  2. Its contents can be any characters in the DBCS character set.

4.14.4 Rule for DBCS Edited Data

Its PICTURE character string can contain any combination of the symbols "G" and "B".

4.14.4.1 Symbols Used

The functions of these symbols are as follows:

G - Each "G" represents a character position which can contain only a DBCS character or a DBCS space.
B - Each "B" represents a character position into which the DBCS space character will be inserted.
N - Each "N" represents a character position which can contain only a DBCS character or a DBCS space.

Note that each "G", "B", or "N" represents a single double-byte character position.

4.14.5 Editing Rules

The type of editing that can be performed on an item depends on the category to which the item belongs. Table 4-1 Function Names Support (see the chapter Program Definition in your Language Reference) is extended with the following information:

Table 5-1: Editing Types for Data Categories

Category       Type of Editing
DBCS           None
DBCS Edited    Simple insertion "B" only

4.14.5.1 Fixed Insertion Editing

When used in an SBCS item, "B" (space) represents an SBCS space. When used in a DBCS item it represents a DBCS space.

4.14.6 The REDEFINES Clause

Syntax Rules

  1. If either data-name-1 or data-name-2 is of class DBCS, then both must be of class DBCS.

    This restriction is removed.

4.14.7 The RENAMES Clause

Syntax Rules

  1. If either data-name-1 or data-name-2 is of class DBCS, then both must be of class DBCS. No THROUGH clause is allowed.

4.14.8 The USAGE Clause

General Format

The General Format is extended by the addition of the following:

Syntax Rules

  1. The PICTURE character-string of a DISPLAY-1 item can contain only "G"s and "B"s. An item whose PICTURE character-string contains "G"s must have a USAGE IS DISPLAY-1 clause.

    Note that therefore the USAGE IS DISPLAY-1 clause makes an item's class DBCS.

    USAGE IS DISPLAY-1 is not required for items whose PICTURE clause contains "G"s or "N"s.

    Whenever a PICTURE clause contains a "G" the associated item is considered to be of class DBCS.

  2. The BLANK WHEN ZERO clause cannot be used with group or elementary items described as USAGE IS DISPLAY-1. The SYNCHRONIZED clause is ignored.

  3. The usage of a Screen Section data item may be implicitly or explicitly defined as USAGE DISPLAY-1.

General Rules

  1. The USAGE IS DISPLAY-1 clause indicates that the format of the data is DBCS.

4.14.9 The VALUE Clause

Syntax Rules

  1. In a data description entry, if the category of the item is DBCS the literal in the VALUE clause must be of category DBCS. A DBCS literal is allowed only if the category of the item is DBCS or if the category of the item is DBCS edited.

  2. A DBCS literal in the VALUE clause must not exceed the size given by the PICTURE character-string.

4.14.10 CONDITION-NAME Rules

  1. In a condition-name entry, if the associated data item is of category DBCS, all literals in the VALUE clause must be of category DBCS. DBCS literals are allowed only if the category of the associated data item is DBCS.
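
The VALUE clause and condition-name rules above can be sketched as follows. The names and sample data are hypothetical; the sketch assumes the DBCS Compiler directive is in effect and a 2-byte source code page such as Shift-JIS.

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSVAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * The G-literal has 2 characters, matching PIC G(2).
       01  WS-CITY       PIC G(2) USAGE DISPLAY-1 VALUE G"東京".
      * Condition-name values for a DBCS item are DBCS literals.
           88  CITY-IS-TOKYO          VALUE G"東京".
           88  CITY-IS-OSAKA          VALUE G"大阪".
       PROCEDURE DIVISION.
           IF CITY-IS-TOKYO
               DISPLAY "TOKYO"
           END-IF
           STOP RUN.
```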

4.15 Procedure Division in the DBCS Module

4.15.1 Conditional Expressions

4.15.1.1 Relation Conditions

Data items and literals of class DBCS can be used in a relation condition with any relational operator. Each operand must be either a group item or of class DBCS. No conversion, editing, or de-editing is done; no distinction is made between items of category DBCS and items of category DBCS edited.

The operation performed is a nonnumeric comparison. Since there is in general no collating sequence between the characters in a DBCS character set, the collating sequence used is based on the numeric values of the bit patterns representing the characters, interpreted as if they were binary numbers.

Note that if the DBCS character codes include codes for characters in the SBCS character set, there is no guarantee that this collating sequence will order them the same as in SBCS.

Where a character exists in both the DBCS and SBCS character sets, its DBCS and SBCS representations will not be regarded as equivalent. See the section Roman Script in DBCS in this chapter.

The PROGRAM COLLATING SEQUENCE clause has no effect on comparisons involving data items of class DBCS or DBCS literals.

If the operands are of unequal size, comparison proceeds as though the shorter operand were extended on the right by enough DBCS spaces to make them the same size.
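
The padding rule for operands of unequal size can be illustrated with the following sketch. The names and sample data are hypothetical, and the DBCS Compiler directive is assumed to be in effect.

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSCMP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SHORT      PIC G(2) USAGE DISPLAY-1.
       01  WS-LONG       PIC G(4) USAGE DISPLAY-1.
       PROCEDURE DIVISION.
           MOVE G"東京" TO WS-SHORT
      * WS-LONG receives two characters plus two DBCS spaces.
           MOVE G"東京" TO WS-LONG
      * WS-SHORT is compared as if extended on the right with
      * two DBCS spaces, so the items should compare equal.
           IF WS-SHORT = WS-LONG
               DISPLAY "EQUAL"
           ELSE
               DISPLAY "NOT EQUAL"
           END-IF
           STOP RUN.
```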

4.15.1.2 Class Condition

The class tests for DBCS items are DBCS and KANJI.

DBCS and KANJI class tests are not supported.

4.15.2 Move Operation

All statements that involve moving data between items and/or literals of class DBCS obey the rules given for such moves under the General Rules for The MOVE Statement later in this chapter.

4.15.3 The ACCEPT Statement

Syntax Rules

  1. In Format 2, identifier must not be of class DBCS.

General Rules

  1. In Format 1, DBCS data can be entered only if identifier is of class DBCS.

4.15.4 The CALL Statement

Syntax Rules

  1. Identifier-1 must not be a data item of class DBCS; literal-1 must not be a DBCS literal.

    This restriction is removed.

  2. The program-name given by identifier-1 or literal-1 must not contain DBCS characters.

    This restriction is removed.

General Rules

  1. Each item of class DBCS referenced in the USING clause must correspond to an item of class DBCS in the USING clause of the Procedure Division header of the called program.

4.15.5 The CANCEL Statement

Syntax Rules

  1. Identifier-1 must not be a data item of class DBCS; literal-1 must not be a DBCS literal.

    This restriction is removed.

  2. The program-name given by identifier-1 or literal-1 must not contain DBCS characters.

    This restriction is removed.

4.15.6 The INITIALIZE Statement

General Format

The General Format is extended by the options DBCS, NATIONAL or NATIONAL-EDITED as additional alternatives to ALPHABETIC, ALPHANUMERIC, et cetera.

General Rules

  1. Specifying the DBCS, NATIONAL or NATIONAL-EDITED option causes data items of categories DBCS and DBCS-EDITED to be initialized.

4.15.7 The INSPECT Statement

Syntax Rules

  1. All the identifiers and literals except identifier-2 must be of class DBCS if any one of them is of class DBCS.

General Rules

  1. The count maintained in identifier-2 is of DBCS characters, not bytes.
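
A minimal sketch of character-based counting with INSPECT, using hypothetical names and sample data and assuming the DBCS Compiler directive is in effect:

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSINS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TEXT       PIC G(6) USAGE DISPLAY-1.
       01  WS-COUNT      PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
           MOVE G"東京都東京駅" TO WS-TEXT
      * All operands other than the tally are of class DBCS.
           INSPECT WS-TEXT TALLYING WS-COUNT
               FOR CHARACTERS BEFORE INITIAL G"都"
      * Two DBCS characters precede the search character, so
      * the tally counts 2 characters rather than 4 bytes.
           DISPLAY WS-COUNT
           STOP RUN.
```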

4.15.8 The MOVE Statement

Syntax Rules

  1. All the identifiers and literals must be of class DBCS if any one of them is.

    Items of class DBCS, category alphanumeric, and category alphabetic can be mixed freely.

  2. In Format 2, the MOVE CORRESPONDING format, if either of a pair of matching items is of class DBCS then they both must be.

General Rules

  1. If the sending field is of class DBCS and a receiving field is of category DBCS Edited, editing is carried out on that receiving field. For any other combination of fields of class DBCS, no conversion, editing, or de-editing is done.

    This rule is relaxed as follows. If a receiving field is of category DBCS edited, it is edited. For any other combination that includes a field of class DBCS, no conversion, editing, or de-editing is done.

  2. If a receiving field is a different size from the sending field, the data is stored in that receiving field and is either truncated on the right or padded on the right with DBCS spaces.
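
The editing, truncation, and padding behavior described in these rules can be sketched as follows. The names and sample data are hypothetical, and the DBCS Compiler directive is assumed to be in effect.

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSMOV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SRC        PIC G(4)  USAGE DISPLAY-1.
       01  WS-EDITED     PIC GGBGG USAGE DISPLAY-1.
       01  WS-SMALL      PIC G(2)  USAGE DISPLAY-1.
       01  WS-LARGE      PIC G(6)  USAGE DISPLAY-1.
       PROCEDURE DIVISION.
           MOVE G"東京大阪" TO WS-SRC
      * DBCS edited receiver: a DBCS space is inserted at the
      * "B" position, between the second and third characters.
           MOVE WS-SRC TO WS-EDITED
      * Smaller receiver: only the first two characters fit.
           MOVE WS-SRC TO WS-SMALL
      * Larger receiver: two DBCS spaces are added on the right.
           MOVE WS-SRC TO WS-LARGE
           DISPLAY WS-EDITED
           DISPLAY WS-SMALL
           DISPLAY WS-LARGE
           STOP RUN.
```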

4.15.9 The SEARCH Statement

Syntax Rules

  1. In Format 2, if identifier-1 is of class DBCS, then the ASCENDING/DESCENDING KEY item defined in its OCCURS clause must be of class DBCS also.

4.15.10 The STOP Statement

Syntax Rules

  1. Literal must not be a DBCS literal.

    This restriction is removed.

4.15.11 The STRING Statement

Syntax Rules

  1. All the identifiers and literals except identifier-4 must be of class DBCS if any one of them is of class DBCS.

General Rules

  1. The relative position indicated by identifier-4 gives the position in DBCS characters.
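
A minimal sketch of STRING with DBCS operands and a pointer, using hypothetical names and sample data and assuming the DBCS Compiler directive is in effect:

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSSTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-OUT        PIC G(6) USAGE DISPLAY-1.
      * The pointer is an ordinary numeric item, not class DBCS.
       01  WS-PTR        PIC 9(2) VALUE 1.
       PROCEDURE DIVISION.
           MOVE SPACES TO WS-OUT
           STRING G"東京" G"大阪" DELIMITED BY SIZE
               INTO WS-OUT WITH POINTER WS-PTR
           END-STRING
      * Four DBCS characters were transferred, so the pointer
      * should now be 5 (a character position), not 9.
           DISPLAY WS-PTR
           STOP RUN.
```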

4.15.12 The UNSTRING Statement

Syntax Rules

  1. All of identifier-1, -2, -3, literal-1, and -2 must be of class DBCS if any one of them is of class DBCS.

General Rules

  1. The relative position indicated by identifier-7 gives the position in DBCS characters.

  2. The count maintained in identifier-6 is of DBCS characters.
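
The UNSTRING rules above can be sketched as follows. The names and sample data are hypothetical; the delimiter is the double-byte ideographic comma, and the DBCS Compiler directive is assumed to be in effect.

```cobol
      * Assumes the DBCS Compiler directive is in effect.
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBCSUNS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-IN         PIC G(5) USAGE DISPLAY-1.
       01  WS-PART1      PIC G(4) USAGE DISPLAY-1.
       01  WS-PART2      PIC G(4) USAGE DISPLAY-1.
       01  WS-CNT1       PIC 9(2) VALUE 0.
       01  WS-PTR        PIC 9(2) VALUE 1.
       PROCEDURE DIVISION.
           MOVE G"東京、大阪" TO WS-IN
      * Sending item, delimiter, and receivers are class DBCS.
           UNSTRING WS-IN DELIMITED BY G"、"
               INTO WS-PART1 COUNT IN WS-CNT1
                    WS-PART2
               WITH POINTER WS-PTR
           END-UNSTRING
      * WS-CNT1 counts the 2 characters before the delimiter;
      * WS-PTR is likewise a position in DBCS characters.
           DISPLAY WS-CNT1
           DISPLAY WS-PTR
           STOP RUN.
```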


Copyright © 2000 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names used herein are protected by international law.
