Double-Byte Character Handling

Asian character sets contain large numbers of ideographic characters, each representing an entire or partial word or concept, and may also contain interspersed phonetic characters. Such a character set can therefore consist of tens of thousands of characters. Because one 8-bit byte can hold only 256 unique codes, these languages require at least two bytes per character to accommodate the full range.
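The arithmetic behind the two-byte requirement can be checked with a short sketch. It assumes Shift-JIS (Python's "shift_jis" codec) as a representative double-byte encoding. This is an illustration only, since the runtime may be configured for a different scheme. The helper name encoded_length is hypothetical.

```python
# Sketch: one byte gives 256 codes, two bytes give 65,536.
# Assumption: Shift-JIS stands in for a typical double-byte encoding.

ONE_BYTE_CODES = 2 ** 8    # 256 distinct values in one 8-bit byte
TWO_BYTE_CODES = 2 ** 16   # 65,536 distinct values in two bytes

def encoded_length(ch: str, encoding: str = "shift_jis") -> int:
    """Number of bytes a single character occupies in the given encoding."""
    return len(ch.encode(encoding))

if __name__ == "__main__":
    print(ONE_BYTE_CODES, TWO_BYTE_CODES)
    # ASCII letter, half-width katakana, and two kanji
    for ch in ["A", "ｱ", "日", "本"]:
        print(repr(ch), encoded_length(ch))
```

The figure of 65,536 is an upper bound. Real encodings reserve byte ranges (for example, to keep ASCII bytes unambiguous), so fewer two-byte codes are actually usable. Note also that Shift-JIS keeps ASCII and half-width katakana as single bytes.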

Most double-byte characters occupy two full character positions on the screen, one for each byte. Because of this byte-to-position correspondence, double-byte data can be entered into and displayed from USAGE DISPLAY data items, and most COBOL applications can accept and store it without modification.

Problems can arise when double-byte data is displayed on the screen. For example, during an ACCEPT, one byte of a double-byte character may be deleted or overwritten. When a window is displayed, the edge of the window might cover one byte of a double-byte character. In these circumstances, the pairing of bytes can change, and the resulting codes may represent entirely different characters. On most machines, such mispaired or orphaned bytes confuse the operating system's display driver. To overcome these potential problems, the runtime system must follow two rules:
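The re-pairing problem can be reproduced in a few lines, assuming Shift-JIS. The sketch deletes a single byte, as an editor unaware of double-byte data might. In Shift-JIS, "日本" is stored as 93 FA 96 7B. Dropping FA leaves the lead byte 93 to pair with 96, which was the lead byte of 本, so a different kanji appears, followed by the stray byte 7B shown as "{". The helper delete_byte is hypothetical.

```python
# Sketch: a naive one-byte delete re-pairs the bytes that follow it.
# Assumption: Shift-JIS encoding.

def delete_byte(data: bytes, index: int) -> bytes:
    """Remove exactly one byte, ignoring character boundaries."""
    return data[:index] + data[index + 1:]

if __name__ == "__main__":
    original = "日本"
    raw = original.encode("shift_jis")      # four bytes, two per kanji
    damaged = delete_byte(raw, 1)           # drop the trail byte of the first kanji
    garbled = damaged.decode("shift_jis", errors="replace")
    print(raw.hex(" "), "->", damaged.hex(" "))
    print(original, "->", garbled)
```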

  1. Always display both bytes of a double-byte character together (never display only part of a double-byte character).
  2. Always overwrite, or change the attributes of, both bytes of a double-byte character together (never overwrite, or change the attributes of, only part of a double-byte character).

These rules must be obeyed when an ACCEPT handles cursor movement, cursor placement, text selection, delete, backspace, and character overtyping.
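The following sketch shows how cursor movement, backspace, delete, and overtyping might honor the rules on a fixed-width field. It assumes Shift-JIS. When an edit removes or splits a double-byte character, the freed bytes become spaces, so the field length never changes. The function names are hypothetical and do not describe the runtime's internal code. Text selection would reuse char_start the same way, snapping both ends of the selection to character boundaries.

```python
# Sketch: ACCEPT-style editing that never splits a double-byte character.
# Assumption: Shift-JIS; the field is a fixed-width bytearray, like PIC X(n).

def is_lead(b: int) -> bool:
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def char_len(field: bytearray, pos: int) -> int:
    """Length (1 or 2) of the character starting at pos."""
    return 2 if is_lead(field[pos]) and pos + 1 < len(field) else 1

def char_start(field: bytearray, pos: int) -> int:
    """First byte of the character containing pos (0 <= pos < len(field))."""
    i = 0
    while i + char_len(field, i) <= pos:   # skip characters ending before pos
        i += char_len(field, i)
    return i

def cursor_right(field: bytearray, pos: int) -> int:
    if pos >= len(field):
        return len(field)
    return pos + char_len(field, pos)      # jump over both bytes if double

def cursor_left(field: bytearray, pos: int) -> int:
    return char_start(field, pos - 1) if pos > 0 else 0

def backspace(field: bytearray, pos: int) -> int:
    """Delete the whole character before pos; pad the field with spaces."""
    if pos == 0:
        return 0
    start = char_start(field, pos - 1)
    n = pos - start
    del field[start:pos]
    field.extend(b" " * n)
    return start

def delete(field: bytearray, pos: int) -> int:
    """Delete the whole character at pos; pad the field with spaces."""
    if pos >= len(field):
        return pos
    n = char_len(field, pos)
    del field[pos:pos + n]
    field.extend(b" " * n)
    return pos

def overtype(field: bytearray, pos: int, new: bytes) -> int:
    """Overwrite at pos with one encoded character; return the new cursor."""
    end = pos + len(new)
    if end > len(field):
        return pos                         # no room: reject the keystroke
    old_end = pos                          # end of the last old character touched
    while old_end < end:
        old_end += char_len(field, old_end)
    field[pos:end] = new
    # Blank any leftover half of a partly overwritten double-byte character.
    field[end:old_end] = b" " * (old_end - end)
    return end
```

Padding with spaces is only one possible choice. It mirrors how a fixed-length USAGE DISPLAY item behaves, but an implementation could equally shift the following text.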

The rules must also be followed when the edges of windows are displayed, to avoid covering parts of double-byte characters.

To implement these rules, the runtime system needs to know which of several double-byte character encoding schemes is being used. It gets this information from the value of the configuration variable CODE-SYSTEM. See Appendix H for a detailed discussion of this variable.
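The encoding scheme matters because the same byte value can be a lead byte in one scheme and an ordinary character in another. The sketch below selects a lead-byte test by scheme name. The dictionary keys are hypothetical labels, not the actual values of CODE-SYSTEM (those are documented in Appendix H), and the byte ranges are approximate lead-byte ranges. EUC-JP is left out because its 0x8E and 0x8F prefixes introduce sequences that a single lead-byte test does not describe.

```python
from typing import Callable, Dict

# Sketch: pick a lead-byte test according to the configured encoding scheme.
# Assumption: keys are illustrative labels, NOT real CODE-SYSTEM values;
# byte ranges are approximate lead-byte ranges for each encoding.

LEAD_BYTE_TESTS: Dict[str, Callable[[int], bool]] = {
    "shift-jis": lambda b: 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC,
    "euc-kr":    lambda b: 0xA1 <= b <= 0xFE,
    "big5":      lambda b: 0x81 <= b <= 0xFE,
}

def lead_byte_test(scheme: str) -> Callable[[int], bool]:
    """Return the lead-byte predicate for a scheme, or raise ValueError."""
    try:
        return LEAD_BYTE_TESTS[scheme]
    except KeyError:
        raise ValueError(f"unsupported encoding scheme: {scheme!r}") from None

if __name__ == "__main__":
    # 0xC7 is a single-byte half-width katakana in Shift-JIS,
    # but a lead byte in EUC-KR and Big5.
    for scheme in LEAD_BYTE_TESTS:
        test = lead_byte_test(scheme)
        print(scheme, test(0x93), test(0xC7))
```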