Segmentation | Micro Focus Extensions for Double-Byte Character Support |
Many of the world's languages use sets of characters that run into the thousands. Most computers use 8-bit bytes, and assign a different 8-bit code to represent each character; this scheme can represent no more than 256 different characters.
Ideally a COBOL programmer should not need to be aware of the internal code used to represent characters. In practice, however, some features of the internal code can affect the source programmer, and this limitation to 256 different characters is one of the most restrictive.
For this reason the Double-Byte Character Set (DBCS) is provided. In this scheme each character is represented by a 16-bit code, each character occupying a pair of adjacent bytes. This scheme can represent thousands of different characters.
The assignment of DBCS character codes to characters varies from country to country.
The 8-bit code used by your COBOL system is the American Standard Code for Information Interchange (ASCII). In this chapter this will be referred to as the Single-Byte Character Set (SBCS).
Double-Byte Character Set support is sensitive to the DBCS Compiler directive.
See also the chapter Micro Focus Extensions for Double-Byte Character Support, primarily for Japanese language support.
The DBCS Compiler directive makes your COBOL compiler recognize two data categories in which data is stored in DBCS. It does not prevent the use of other data categories; thus you can still use those data categories in which data is stored in SBCS.
Provided you have the necessary hardware support, DBCS data items used in input and output will be recognized, and their data displayed and accepted correctly, on devices such as screens, keyboards, and printers.
The character set that can be represented by SBCS is based on the Roman alphabet plus some other characters. In some countries the DBCS character codes also include codes for many of these characters.
On some hardware a character is displayed visibly differently according to whether it is stored in SBCS or DBCS; for example, on some screens the DBCS code for a letter causes it to be displayed larger than its SBCS code does.
Programs written to the NTT Multivendor Integration Architecture (MIA) specification are accepted by the COBOL compiler when the DBCS and CURRENCY-SIGN"92" directives are used.
DBCS characters can be used in literals (since literals are data), in comments and comment-entries, and in user-defined words. Otherwise the DBCS directive does not change the range of characters that can be used in source programs; the program is still written using the COBOL character set (see the chapter Concepts of the COBOL Language in your Language Reference).
There are extensions to the PICTURE and USAGE clauses to define items that are to contain DBCS data. A new format of literal is required for DBCS data.
There are additional rules for various options, clauses and statements to define the behavior of DBCS data.
Except where otherwise stated, all the rules and features of COBOL remain applicable when DBCS is in use. The following sections give only the additional rules and formats pertaining to DBCS.
SBCS and DBCS characters can be mixed freely in comments and comment-entries.
Either SBCS or DBCS characters can be used in user-defined words for: Alphabet-name, Class-name, Condition-name, Data-name/Identifier, Record-name, File-name, Index-name, Mnemonic-name, Paragraph-name, Section-name, and Symbolic-character.
SBCS and DBCS characters can be freely mixed in user-defined words. Where a character exists in both the DBCS and SBCS character sets, its DBCS and SBCS representations will not be regarded as equivalent. See the section Roman Script in DBCS earlier in this chapter.
Spaces in data of class DBCS will be represented by the DBCS code for space. A space character represented by a 2-byte code is referred to as a DBCS space.
The values assigned to a DBCS space are sensitive to the DBCS and DBSPACE Compiler directives.
There is a class of data additional to the classes described in the chapter Concepts of the COBOL Language in your Language Reference: it is called DBCS. It includes two data categories: DBCS and DBCS edited.
A data item of class DBCS is described by using the USAGE DISPLAY-1 clause. An item with this clause can have only the characters "G" and "B" in its PICTURE character-string. A "G" represents a DBCS character position; "B" is an editing character, and indicates a position that will always have a DBCS space inserted in editing. An item whose PICTURE character-string is all "G"s is of category DBCS; an item whose PICTURE character-string contains both "G"s and "B"s is of category DBCS edited.
Note that each "G" or "B" represents one 2-byte character position. Except where otherwise stated, the length of the data item for all purposes is the number of "G"s and "B"s in its PICTURE character-string.
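The category and length rules above can be modelled in a few lines. The following is a minimal Python sketch, not part of the COBOL system: the function names are hypothetical, the `G(n)` repetition factor is assumed from ordinary PICTURE notation, and a string of only "B"s is rejected because the text does not assign it a category.

```python
import re


def expand_picture(picture: str) -> str:
    """Expand repetition factors, for example 'G(3)B' becomes 'GGGB'."""
    return re.sub(
        r"([GB])\((\d+)\)",
        lambda m: m.group(1) * int(m.group(2)),
        picture.upper(),
    )


def describe_dbcs_picture(picture: str) -> dict:
    """Classify a DISPLAY-1 PICTURE string and report its length."""
    symbols = expand_picture(picture)
    kinds = set(symbols)
    if kinds == {"G"}:
        category = "DBCS"
    elif kinds == {"G", "B"}:
        category = "DBCS edited"
    else:
        # Assumption: anything else (empty, all "B", other symbols) is
        # treated as invalid in this sketch.
        raise ValueError(f"not a DBCS PICTURE string: {picture!r}")
    # Each symbol is one 2-byte character position.
    return {
        "category": category,
        "characters": len(symbols),
        "bytes": 2 * len(symbols),
    }
```

For instance, `describe_dbcs_picture("GGBG(2)")` reports category DBCS edited with a length of 5 characters, which occupy 10 bytes of storage.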
For reference modification, the leftmost-character-position and length specify the number of DBCS characters, not bytes.
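To illustrate how character-based reference modification maps onto storage, here is a small Python sketch. It assumes the item is held as a byte string of 2-byte codes and uses Shift-JIS purely as an example encoding; `dbcs_refmod` is a hypothetical helper, not a COBOL facility.

```python
def dbcs_refmod(item: bytes, leftmost: int, length: int) -> bytes:
    """Model item(leftmost:length) for a class DBCS item.

    Both arguments count DBCS characters (1-based), not bytes.
    """
    if len(item) % 2:
        raise ValueError("a DBCS item occupies an even number of bytes")
    n_chars = len(item) // 2
    if leftmost < 1 or length < 1 or leftmost + length - 1 > n_chars:
        raise ValueError("reference modification out of range")
    start = (leftmost - 1) * 2          # convert character offset to bytes
    return item[start:start + length * 2]


# Example data: five double-byte characters in Shift-JIS (an assumption).
item = "日本語処理".encode("shift_jis")
middle = dbcs_refmod(item, 2, 3)        # characters 2 through 4
```

Here `middle` holds 6 bytes, the three characters starting at character position 2.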
Data items of class DBCS can be used wherever data items of class alphanumeric can be used, subject to rules and exceptions given in the appropriate places in this chapter.
DBCS characters can be included in data stored in data items of category alphanumeric. In such data, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.
On input and output both the SBCS and the DBCS codes will be recognized. The first byte of a DBCS code is never a valid SBCS code; hence the two can be used together without confusion. But in operations within the program the data will be treated as ordinary alphanumeric data. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.
The length of the data item for all purposes is its length in bytes.
In addition to the nonnumeric and numeric literals described in the chapter Concepts of the COBOL Language in your Language Reference, there is a third type of literal: the DBCS literal.
A DBCS literal is a character-string delimited at both ends by quotation marks or apostrophes, with the beginning delimiter preceded by a "G". It can consist of any characters in the computer's DBCS character set. It can be up to 28 DBCS characters in length. It cannot be continued across lines.
Whichever of the quotation mark and the apostrophe serves as the delimiter, an occurrence of that character within a DBCS literal is represented by two contiguous occurrences. The character that is not serving as the delimiter is represented by a single occurrence. The value of a DBCS literal in the object program is the string of characters itself, except that the delimiters are excluded and each embedded pair of contiguous delimiter characters represents a single occurrence of that character.
All DBCS literals can be used wherever nonnumeric literals can be used, subject to rules and exceptions given in the appropriate places in this chapter.
DBCS characters can be included in nonnumeric literals. A nonnumeric literal that includes DBCS characters is called a mixed literal. In such a literal, SBCS characters are represented by SBCS codes and DBCS characters by DBCS codes. Each space character is represented by the SBCS code for space.
On output both the SBCS and the DBCS codes will be recognized. The first byte of a DBCS code is never a valid SBCS code; hence the two can be used together without confusion. But in operations within the program the literal will be treated as an ordinary nonnumeric literal. It is the programmer's responsibility to ensure that the two halves of a DBCS code do not get separated.
A nonnumeric literal is of category alphanumeric, not DBCS, regardless of whether it includes DBCS characters.
A mixed literal cannot be continued across lines.
This restriction has been removed.
If a figurative constant is used where only a DBCS literal is allowed (according to the rules concerning classes and categories given in the appropriate places in this chapter), it is a DBCS literal. Each space in this literal is a DBCS space.
Only the figurative constant SPACE(S) can be a DBCS literal.
Another format of literal, equivalent to the DBCS literal, is used in COBOL/370 and the MIA COBOL specification. In this format the beginning delimiter is preceded by "N" instead of "G", for example:
N"ABC""DEF"
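The delimiter rule for DBCS literals can be checked with a short Python sketch. It assumes the literal is available as source text and that each Python character stands in for one character of the literal; `dbcs_literal_value` is a hypothetical name, and the sketch accepts either a "G" or an "N" prefix.

```python
def dbcs_literal_value(source: str) -> str:
    """Return the value of a G"..." or N"..." literal written as source text."""
    if len(source) < 3 or source[0].upper() not in ("G", "N"):
        raise ValueError("expected a literal prefixed by G or N")
    delim = source[1]
    if delim not in ('"', "'") or source[-1] != delim:
        raise ValueError("literal must be delimited at both ends")
    body = source[2:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == delim:
            # Inside the literal the delimiter must appear doubled.
            if i + 1 < len(body) and body[i + 1] == delim:
                out.append(delim)
                i += 2
                continue
            raise ValueError("unpaired delimiter inside literal")
        out.append(ch)                  # the other quote character stands alone
        i += 1
    return "".join(out)
```

Applied to the example above, the doubled quotation mark collapses to one, so the value is the seven characters `ABC"DEF`. Written with apostrophes as delimiters, the same value needs only a single quotation mark.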
This restriction is removed.
This restriction is removed.
or "N".
This restriction is removed.
There are two additional categories of data that can be described with a PICTURE clause: DBCS and DBCS edited. Both these categories must be described as USAGE IS DISPLAY-1.
These categories need not be described as USAGE DISPLAY-1.
DISPLAY-1 is optional for PIC N, but not for PIC G items.
or "N".
Its PICTURE character string can contain any combination of the symbols "G" and "B".
The functions of these symbols are as follows:
G - | Each "G" represents a character position which can contain only a DBCS character or a DBCS space. |
B - | Each "B" represents a character position into which the DBCS space character will be inserted. |
N - | Each "N" represents a character position which can contain only a DBCS character or a DBCS space. |
Note that each "G", "B", or "N" represents a single double-byte character position.
The type of editing that can be performed on an item depends on the category to which the item belongs. Table 4-1 Function Names Support (see the chapter Program Definition in your Language Reference) is extended with the following information:
Table 5-1: Editing Types for Data Categories
Category | Type of Editing |
---|---|
DBCS | None |
DBCS Edited | Simple insertion "B" only |
When used in an SBCS item, "B" (space) represents an SBCS space. When used in a DBCS item it represents a DBCS space.
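Simple insertion with "B" in a DBCS edited item can be pictured as follows. This Python sketch makes several assumptions that the text does not state: the DBCS space is taken to be the Shift-JIS code 0x8140 (the real value depends on the DBCS and DBSPACE directives), source characters fill the "G" positions from left to right, unused "G" positions are padded with DBCS spaces, and excess source characters are dropped. The name `edit_dbcs` is hypothetical.

```python
DBCS_SPACE = b"\x81\x40"   # assumption: Shift-JIS double-byte space


def edit_dbcs(source: bytes, symbols: str) -> bytes:
    """Place 2-byte source characters into an expanded G/B PICTURE string."""
    chars = iter(source[i:i + 2] for i in range(0, len(source), 2))
    out = []
    for sym in symbols:
        if sym == "B":
            out.append(DBCS_SPACE)              # insertion position
        elif sym == "G":
            out.append(next(chars, DBCS_SPACE))  # next character, or padding
        else:
            raise ValueError(f"unexpected PICTURE symbol {sym!r}")
    return b"".join(out)


edited = edit_dbcs("日本語".encode("shift_jis"), "GGBGG")
```

The result is 10 bytes long: two characters, an inserted DBCS space, the third character, and one DBCS space of padding.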
This restriction is removed.
The General Format is extended by the addition of the following:
Note that the USAGE IS DISPLAY-1 clause therefore makes an item's class DBCS.
USAGE IS DISPLAY-1 is not required for items whose PICTURE clause contains "G"s or "N"s.
Whenever a PICTURE clause contains a "G" the associated item is considered to be of class DBCS.
Data items and literals of class DBCS can be used in a relation condition with any relational operator. Each operand must be either a group item or of class DBCS. No conversion, editing, or de-editing is done; no distinction is made between items of category DBCS and items of category DBCS edited.
The operation performed is a nonnumeric comparison. Since there is in general no collating sequence between the characters in a DBCS character set, the collating sequence used is based on the numeric values of the bit patterns representing the characters, interpreted as if they were binary numbers.
Note that if the DBCS character codes include codes for characters in the SBCS character set, there is no guarantee that this collating sequence will order them the same as in SBCS.
Where a character exists in both the DBCS and SBCS character sets, its DBCS and SBCS representations will not be regarded as equivalent. See the section Roman Script in DBCS in this chapter.
The PROGRAM COLLATING SEQUENCE clause has no effect on comparisons involving data items of class DBCS or DBCS literals.
If the operands are of unequal size, comparison proceeds as though the shorter operand were extended on the right by enough DBCS spaces to make them the same size.
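The comparison rules for class DBCS operands can be sketched in Python as a byte-wise comparison after padding. The sketch assumes that the codes are stored most significant byte first, so that byte order matches numeric order, and that the DBCS space is the Shift-JIS code 0x8140; both are assumptions for illustration, and `compare_dbcs` is a hypothetical name.

```python
DBCS_SPACE = b"\x81\x40"   # assumption: Shift-JIS double-byte space


def compare_dbcs(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 for a < b, a = b, a > b under the DBCS rules."""
    width = max(len(a), len(b))
    # Extend the shorter operand on the right with DBCS spaces.
    pa = a + DBCS_SPACE * ((width - len(a)) // 2)
    pb = b + DBCS_SPACE * ((width - len(b)) // 2)
    # Python compares bytes by unsigned value, position by position,
    # which is the bit-pattern ordering described above.
    return (pa > pb) - (pa < pb)


short = "日本".encode("shift_jis")
padded = "日本\u3000".encode("shift_jis")   # same text plus one DBCS space
```

With this model `compare_dbcs(short, padded)` yields 0, because the shorter operand is extended with a DBCS space before the comparison.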
The class tests for DBCS items are DBCS and KANJI.
DBCS and KANJI class tests are not supported.
All statements that involve moving data between items and/or literals of class DBCS obey the rules given for such moves under the General Rules for The MOVE Statement later in this chapter.
This restriction is removed.
This restriction is removed.
This restriction is removed.
This restriction is removed.
The General Format is extended by the options DBCS, NATIONAL or NATIONAL-EDITED as additional alternatives to ALPHABETIC, ALPHANUMERIC, et cetera.
The NATIONAL or NATIONAL-EDITED option causes data items of categories DBCS and DBCS-EDITED to be initialized.
Items of class DBCS, category alphanumeric, and category alphabetic can be mixed freely.
This rule is relaxed as follows. If a receiving field is of category DBCS edited, it is edited. For any other combination that includes a field of class DBCS, no conversion, editing, or de-editing is done.
This restriction is removed.
Copyright © 1999 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names
used herein are protected by international law.