Chapter 12: Miscellaneous Facilities

This chapter covers a number of file handling topics. These include:

Compiler directives
Run-time switches
Including the Callable File Handler in your applications
Operating system considerations
Multiple-reel files
Buffering
Sparse and duplicate keys in indexed files
Data and key compression

12.1 Compiler Directives

If your program handles data files, there are several Compiler directives that you should be aware of. Used properly, they can simplify your code, set up default language behavior and give you access to important language extensions, such as ANSI'85.

These directives include:

ANS85
ASSIGN
CALLFH
DATACOMPRESS
IDXFORMAT
FILETYPE
KEYCOMPRESS
MS
OPTIONAL-FILE
RECMODE
RM
SEQUENTIAL

12.1.1 ANS85

The ANS85 Compiler directive enables ANSI'85 COBOL language extensions and gives you ANSI'85 file status codes.

If you specify the NOANS85 directive, you do not get ANSI'85 file status codes. Additionally, you cannot use ANSI'85 reserved words.

If you specify ANS85"SYNTAX", you can use ANSI'85 syntax but your program will return ANSI'74 file status codes.

For example, if you compile a program with NOANS85, the file status code for "File not found" is 9/013 (first status byte = ASCII 9, second status byte = decimal 13), the Micro Focus run-time system error code for this problem. If you omit the NOANS85 directive, the file status for "File not found" is 3/5 (first status byte = ASCII 3, second status byte = ASCII 5), the ANSI'85 file status code for this problem.
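
The two encodings described above can be illustrated with a short sketch. It is written in Python purely for illustration and is not part of this COBOL system; the function name `format_file_status` is hypothetical. It assumes only what the text states: an extended status has ASCII 9 in the first byte and a binary value in the second, while an ANSI'85 status has ASCII characters in both bytes.

```python
def format_file_status(status: bytes) -> str:
    """Render a two-byte file status in the a/b notation used in this chapter."""
    if len(status) != 2:
        raise ValueError("a file status is exactly two bytes")
    first = chr(status[0])
    if first == "9":
        # Extended status: the second byte holds a binary error number.
        return f"9/{status[1]:03d}"
    # ANSI status: both bytes are ASCII characters.
    return f"{first}/{chr(status[1])}"


print(format_file_status(b"9\x0d"))  # 9/013, as returned under NOANS85
print(format_file_status(b"35"))     # 3/5, as returned under ANS85
```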

12.1.2 ASSIGN

The ASSIGN Compiler directive can be set to either EXTERNAL or DYNAMIC as follows:

ASSIGN"DYNAMIC"	(the default)
ASSIGN"EXTERNAL"

The ASSIGN Compiler directive specifies whether the default file assignment is dynamic or external.

If ASSIGN"EXTERNAL" is specified as a directive, and the SELECT statement is coded like this:

     select fd-name
         assign to var1

then COBOL treats the SELECT as if it were coded like this:

     select fd-name
         assign to external var1

12.1.3 CALLFH

On 32-bit COBOL systems, the Callable File Handler is used for all file handling and so you do not need to use the CALLFH Compiler directive in order to specify use of the Callable File Handler.

On 16-bit COBOL systems, the Callable File Handler is used, by default, for all indexed file handling. However, if you want to use the Callable File Handler to process files other than indexed files, or if you want to use Fileshare, you must compile your program using the CALLFH Compiler directive.

12.1.4 DATACOMPRESS and KEYCOMPRESS

If you have files with large areas that are unused or reserved for future use, you can save disk space by compressing the files.

Two directives are available for compressing data files: DATACOMPRESS and KEYCOMPRESS. These directives must be specified when you create the file.

For indexed files and record sequential files, you can suppress repeated characters through use of the DATACOMPRESS directive.

Note: Compressed files are variable-length record format. Specifying data compression on relative files does not save space because the maximum record length is always written out.

For indexed files, you can compress keys through use of the KEYCOMPRESS directive.

These directives can be specified through a $SET statement, enabling compression on selected files.

For more information on DATACOMPRESS and KEYCOMPRESS, see the section Data and Key Compression.

12.1.5 IDXFORMAT

The IDXFORMAT Compiler directive enables you to specify the type of indexed file you want to create. For example, you can create C-ISAM files using IDXFORMAT"1" and BTRIEVE files using IDXFORMAT"5" or IDXFORMAT"6".

For handling files with many alternate keys, use IDXFORMAT"4". This improves performance for delete and rewrite operations and enables a larger number of duplicate keys.

12.1.6 MS

DOS, Windows and OS/2:
The MS Compiler directive offers Microsoft COBOL compatibility in several areas. For example, it enables Microsoft file status codes to be generated. You can get Microsoft COBOL Version 1.0 support by specifying MS"1" and Version 2.0 support by specifying MS"2". For more information, see the chapter Microsoft COBOL V1.0 and V2.0 Syntax Support in your Language Reference - Additional Topics.

12.1.7 OPTIONAL-FILE

The OPTIONAL-FILE Compiler directive is specified as either:

OPTIONAL-FILE	(the default)
NOOPTIONAL-FILE

With the OPTIONAL-FILE directive, SELECT statements for files opened I-O or EXTEND are treated as if OPTIONAL were coded in the SELECT clause. This means that if a non-existent file is opened I-O or EXTEND:

With NOOPTIONAL-FILE, you get an error message.
With OPTIONAL-FILE, the file is created, as if it were opened for OUTPUT, closed, and then opened I-O or EXTEND.

12.1.8 RECMODE

The RECMODE Compiler directive specifies whether the default RECORDING MODE for a file is fixed or variable format as follows:

RECMODE"F"	(the default)
RECMODE"V"

If RECMODE"V" is specified, you can omit the RECORDING MODE IS VARIABLE clause in your SELECT statement; however, you must specify RECORDING MODE IS FIXED, if that is what you want.

12.1.9 RM

The RM Compiler directive offers Ryan-McFarland COBOL compatibility in several areas. For example, it enables Ryan-McFarland file status codes to be generated. For more information, see your Compatibility Guide and the chapter Ryan-McFarland COBOL V2.0 Syntax Support in your Language Reference - Additional Topics.

12.1.10 SEQUENTIAL

The SEQUENTIAL Compiler directive determines whether, by default, files with an ORGANIZATION IS SEQUENTIAL clause in the SELECT statement are record sequential or line sequential.

The directive can be set to RECORD, LINE, ANSI or ADVANCING as follows:

	SEQUENTIAL"RECORD"	(the default)
	SEQUENTIAL"LINE"
	SEQUENTIAL"ANSI"
	SEQUENTIAL"ADVANCING"

This directive can be useful if you are converting from other COBOL dialects where ORGANIZATION SEQUENTIAL is equivalent to this COBOL system's ORGANIZATION LINE SEQUENTIAL.

If you compile with SEQUENTIAL"LINE", and code your SELECT statement like this:

     select fd-name
         assign to ...
         organization is sequential

then the file is a line sequential file, rather than a record sequential file, which is otherwise the default.

12.2 Run-time Switches

There are four run-time switches which affect file handling:

Switch	Description
G *(16-bit only)*	File buffer switch: specifies the file buffer size on the 16-bit COBOL system but only for files that are not handled by the Callable File Handler.
L2	Record terminator switch: specifies the record terminator in line sequential files.
N	Null switch: enables null characters to be inserted into line sequential files.
T	Tab switch: enables tab characters to be inserted into line sequential files.

12.2.1 File Buffer Switch (G)

COBOL uses file buffers to hold data during input/output operations. Whenever a file is opened, a file buffer is created by COBOL.

The buffering rules differ for non-shared and shared files.

For non-shared files, files opened for sequential access use 4K buffers; files opened for random access use 512-byte buffers.

The file buffer switch can be used if lack of memory is a problem, or if you want file updates to occur more frequently. This switch defaults to off (-G), in which case all files use 4K buffers. When the switch is set on (+G), all files (including those opened for sequential access) use 512-byte buffers.

When you use the smaller buffers, file updates occur more frequently, because the buffers fill up faster. Naturally, this can affect performance.

For shared files, all files use 512-byte buffers, regardless of the setting of the G run-time switch.

12.2.2 Record Terminator Switch (L2)

For line sequential files, by default, the run-time system treats a line-feed character (x"0A") as a record terminator on all environments. However, most editors on DOS, Windows, and OS/2 treat a carriage-return character followed by a line-feed character (x"0D0A") as the record terminator.

In versions of this COBOL system before V3.1, x"0A" was treated as the record terminator for UNIX, but x"0D" was treated as the terminator for DOS, Windows, and OS/2.

If you are creating cross-platform applications, or require compatibility with programs created using an earlier version of COBOL, you should use the record terminator switch to specify the character to be used as the record terminator for your environment.

When -L2 (the default) is used, x"0A" is treated as the record terminator.

When +L2 is used, x"0D0A" is treated as the record terminator.
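
The effect of a mismatched setting can be seen in a small sketch. It is written in Python for illustration only; `split_line_sequential` is a hypothetical helper that models just the choice of terminator described above, not the run-time system's full line sequential processing.

```python
def split_line_sequential(data: bytes, l2_on: bool = False) -> list:
    """Split raw file contents into records using the terminator selected by L2."""
    terminator = b"\r\n" if l2_on else b"\n"
    records = data.split(terminator)
    # A file that ends with a terminator yields one empty trailing piece.
    if records and records[-1] == b"":
        records.pop()
    return records


sample = b"ONE\r\nTWO\r\n"                         # file written with x"0D0A"
print(split_line_sequential(sample, l2_on=True))   # [b'ONE', b'TWO']
print(split_line_sequential(sample, l2_on=False))  # [b'ONE\r', b'TWO\r']
```

In this model, reading an x"0D0A" file with the x"0A" terminator leaves a stray carriage return at the end of each record, which is the kind of difference the switch is intended to remove.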

12.2.3 Null Switch (N)

The null switch enables insertion of a null character (x"00") before data characters whose value is less than x"20" in line sequential files. If you want to include non-ASCII data in a file, you must enable null insertion.

By default, the null switch is set on (+N).

If a file is created by an application running with -N, the file must be read back by an application running the same way; the same is true for files created by applications running with +N. You also need the -N switch if your program sends control characters directly to a printer (for example, to switch the printer into 132 character mode) and the printer is not recognizing the control characters.
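
The following sketch models null insertion. It is written in Python for illustration only and both function names are hypothetical. The writing side follows the description above; the reading side, which drops each inserted null and keeps the byte that follows it, is an assumption about how the data is restored.

```python
def insert_nulls(record: bytes) -> bytes:
    """Prefix every byte below x'20' with x'00', as described for +N."""
    out = bytearray()
    for byte in record:
        if byte < 0x20:
            out.append(0x00)
        out.append(byte)
    return bytes(out)


def remove_nulls(stored: bytes) -> bytes:
    """Assumed inverse: drop each x'00' prefix and keep the byte after it."""
    out = bytearray()
    i = 0
    while i < len(stored):
        if stored[i] == 0x00 and i + 1 < len(stored):
            i += 1  # skip the inserted null; the next byte is data
        out.append(stored[i])
        i += 1
    return bytes(out)


record = b"A\x1bB"                     # contains an escape character
stored = insert_nulls(record)
print(stored)                          # b'A\x00\x1bB'
print(remove_nulls(stored) == record)  # True
```

The sketch also suggests why a printer might not recognize control characters sent with +N: each one arrives preceded by an extra x"00" byte.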

12.2.4 Tab Switch (T)

The tab switch is set off (-T) by default. When it is set on (+T), extra spaces in line sequential files are compressed to tab characters (x"09"), which saves space in the file.

If you write a file with +T set, you must use +T when reading the file; if you write a file with -T set, you must use -T when reading the file.

12.3 Including the Callable File Handler Module

If your application uses the Callable File Handler, you need to include the Callable File Handler module when you create your executable file(s).

16-bit:
If you are going to use the run-time environment on the 16-bit COBOL system, and therefore run your program in .int or .gnt format, you must put the file extfh.gnt into one of your application library (.lbr) files.

16-bit:
If you are working with .obj files in the 16-bit COBOL system, your link step looks like this:

link myprog+extfh+...any other submodules needed;

32-bit:
On 32-bit systems, if your application uses the Callable File Handler, it is automatically included when you build your application:

UNIX:

cob -x myprog

32-bit Windows and OS/2:

cbllink myprog

12.4 Operating System Considerations

There are many operating system specific considerations which affect file handling. Some common ones are presented in this section, including some tips on avoiding and resolving memory problems related to file handling.

12.4.1 Power Failures

If a power outage or a system reboot occurs while an application is executing, the integrity of files that were being processed by the application when the failure occurred cannot be guaranteed.

12.4.2 File Handles

A file handle is an operating system mechanism for controlling the way a file is used by the system. Every file that your application has open requires at least one file handle. On UNIX this is called a file descriptor.

DOS:
Under DOS, there is a FILES parameter in config.sys. This parameter specifies the maximum number of files that can be open at any one time during your DOS session. The maximum value for FILES is 255, although different releases of DOS can allow fewer file handles. DOS itself uses five file handles.

OS/2:
Under OS/2, the FILES parameter in config.sys specifies the maximum number of files that can be open simultaneously.

UNIX:
Under UNIX, the maximum number of files that can be open at any one time, excluding the standard input, output and error files, is dependent on the configuration of your UNIX system. See your Release Notes for configuration details for your system. The open mode of a file can influence the maximum number of files that can be open at any one time. Files opened for INPUT are limited to the maximum number of files per process which has been configured for your UNIX system. Files with EXCLUSIVE access (for example, files opened for OUTPUT) acquire file locks and are thus limited to the maximum number of file locks per process which has been configured for your UNIX system. These two limits are not necessarily the same.

When you are running COBOL, the FILES parameter is set by default to 100 on DOS and Windows, and 255 on OS/2. Some users require more file handles, and some fewer.

16-bit:
On 16-bit systems, you can alter the maximum number of files allowed to be open at one time using the /F run-time switch. See your Object COBOL User Guide for details.

32-bit:
On 32-bit COBOL systems, you can specify the maximum number of open files with the tunable max_file_handles. See your Object COBOL User Guide for details.

Some of the factors that determine how many file handles are needed are described below.

12.4.2.1 File Handles for Indexed Files

Because indexed files actually comprise two files, the data file and the index file, two file handles are required for each open indexed file.

12.4.2.2 File Handles During Overlays

Whenever an overlay is loaded, a file handle is used. Overlays are produced whenever you specify them by dividing your program into sections numbered 50 and above, and can be produced when the Procedure Division of a program is greater than 64K.

Statements that can cause overlays to be loaded include CALL, GOTO and PERFORM.

Note: When overlays are loaded from .lbr files (application library files), the file handle used is the one for the .lbr file.

12.4.2.3 File Handles During Sorts

When a COBOL SORT is performed, a work file is created. A file handle is used for this work file.

12.4.2.4 File Handles During Animation

During animation, a certain number of Animator control files are opened. Each of these requires a file handle. This means that your application can run out of file handles during animation, but not when running standalone.
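
The rules in the subsections above can be combined into a rough lower bound on the number of file handles an application needs. The sketch below is in Python for illustration only; `estimate_file_handles` and its parameters are hypothetical. It ignores Animator control files, because their number is not stated here, and any handles used by the operating system itself.

```python
def estimate_file_handles(non_indexed_files: int,
                          indexed_files: int,
                          sorts_in_progress: int = 0,
                          overlay_loading: bool = False) -> int:
    """Return a lower bound on the file handles needed at one moment."""
    handles = non_indexed_files            # at least one handle per open file
    handles += 2 * indexed_files           # data file plus index file
    handles += sorts_in_progress           # one work file per SORT
    handles += 1 if overlay_loading else 0 # handle used to load an overlay
    return handles


# Three sequential files, two indexed files and one SORT in progress.
print(estimate_file_handles(3, 2, sorts_in_progress=1))  # 8
```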

12.4.3 Network File Handling Limits

Some networks have limits built in that may be more restrictive than those imposed by your operating system. If you are running on a network, consult your network documentation to find out the maximum number of files that you can have open at one time.

12.4.4 Memory Requirements

16-bit:
Under the 16-bit run-time system with DOS, Windows and OS/2, the memory required for open files is calculated as the buffer size + 96 bytes for the header record. The buffer size is, by default, 4096 bytes for sequential access files and 512 bytes for all other files. If the G run-time switch is turned on, then all files use 512-byte buffers. Screen input and output and direct output to a printer are excluded from this calculation.
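
As a worked example of this calculation, the sketch below adds up the memory for a set of open files. It is in Python for illustration only; `open_file_memory` is a hypothetical name, and the figures are those given in the paragraph above.

```python
HEADER_BYTES = 96           # header record per open file
SEQUENTIAL_BUFFER = 4096    # default buffer for sequential access files
SMALL_BUFFER = 512          # buffer for all other files, or all files with +G


def open_file_memory(sequential_files: int, other_files: int,
                     g_switch_on: bool = False) -> int:
    """Return the bytes needed for the buffers and headers of the open files."""
    seq_buffer = SMALL_BUFFER if g_switch_on else SEQUENTIAL_BUFFER
    return (sequential_files * (seq_buffer + HEADER_BYTES)
            + other_files * (SMALL_BUFFER + HEADER_BYTES))


print(open_file_memory(2, 3))                    # 10208
print(open_file_memory(2, 3, g_switch_on=True))  # 3040
```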

UNIX:
On UNIX, by default, a maximum of 3 Mbytes of memory can be used by a SORT or MERGE, although a minimum of 55% of the original calculated requirement is generally sufficient. If this value is considered too high for your environment, the environment variable COBSW=-s can be used to restrict the memory used. If, however, you have more than 3 Mbytes of memory to deal with larger amounts of data, you can set COBSW=-s to enable SORT/MERGE to use this extra memory. In general, performance is better if the amount of memory that SORT/MERGE has at its disposal is greater than the size of the data to be sorted or merged.

12.4.4.1 DOS Memory Management and Fragmentation

DOS:
If you are not using the Run-time Environment, and you are not running under OS/2 or Windows, you need to be aware of the possibility of memory fragmentation.

Fragmentation can occur as follows. In a called subroutine that you eventually cancel, you open and close some files. These files might require a new 32K buffer space to be allocated, non-contiguous (somewhere else in memory) with the previous 32K buffer space. When you then call another subroutine, there may not be enough memory between the two file buffer spaces to load the subroutine, and so you are wasting memory.

To avoid this problem, ensure that all of your file buffer spaces are contiguous. To do this, open dummy files in your main program with corresponding FD and SELECT clauses to all the files that are required by your application. Then close the files. In this way, all true OPENs that occur in your subroutines use the file buffer space you have already allocated, and do not, therefore, prevent you from loading programs.

12.4.5 File Size

UNIX:
On UNIX, the maximum size of any file you can create is limited by the system's ulimit setting. You may find that the default limit supplied with your UNIX system is rather small, but this value can be increased by a superuser.

12.4.6 Opening Files on UNIX

UNIX:
On UNIX systems, the COBOL OPEN OUTPUT statement requires two calls to the operating system; the first to open the file and the second to obtain an exclusive lock. This means that a small time lapse exists between these two calls and so it is possible for a second user to access the file between the time that the file is created and the time that it is locked exclusively. If this happens, the second user will find the file empty and an attempt to read the file will result in an "at end" error. The user performing the OPEN OUTPUT will receive a "file locked" error because the exclusive lock call fails.

12.5 Multiple-Reel Files

You can specify sequential files as multiple reel (or multiple unit) files. This means that a sequential file can be held on more than one:

Removable disk
Cartridge tape (UNIX)
Tape reel (UNIX)
Disk partition (UNIX)

You must specify the file as a multiple reel file in the SELECT clause of the Environment Division.

You cannot specify a sequential file as multiple reel if it has variable-length records, since the file header record (see below) stores only one record length.

Whenever you specify a sequential file as a multiple reel file, you are prompted to load the appropriate reel of the file. This applies also to the first reel of the file, even though it may already be loaded. The prompt is:

PLEASE LOAD VOLUME nnnn OF FILE filename FOR access
ENTER NEW DEVICE (IF REQUIRED) AND <CR> WHEN READY

where the parameters are:

`nnnn`	The four-digit reel number of the reel to be loaded within the range 0001 to 9999.
`filename`	The filename, as specified in the SELECT clause in the source program.
`access`	INPUT, OUTPUT or I/O as specified in the source program.

Ensure that the relevant disk or reel is loaded (media for output must already be formatted), and enter:

DOS, Windows and OS/2:
The single character drive identifier. If you do not specify a drive, this COBOL system assumes the default drive or the device identifier in the SELECT clause of your program.
UNIX:
The filename.

The system only accepts input in response to this prompt. The system clears the input buffer each time this prompt is displayed so you cannot type ahead. If you load the wrong volume of a file, or if the header information is in some way corrupt, an error is returned.

UNIX:
On UNIX systems, when you have entered the relevant parameters for the first prompt, another prompt is displayed as follows:

PLEASE ENTER CAPACITY OF DEVICE IN 1024 BYTE BLOCKS

Enter the capacity of the device, in 1024-byte blocks.

On all operating systems, if you decide not to continue, you must terminate using the appropriate keys (for example Ctrl+Break on DOS, Windows and OS/2). This ensures that all the files are closed and that all the information is saved, as though you had executed a STOP RUN statement.

The prompt to load a reel is displayed whenever:

A multiple reel file is opened
A CLOSE REEL statement is executed
The reel becomes full while writing to a multiple reel file (this is a forced reel-swap on WRITE)
"End of reel" is true for a multiple reel file that is opened for INPUT (or I/O on READ). This is true provided that a continuation reel was created when the file was written.

Note: Although you can specify REWRITE operations on a file opened for I-O, we do not recommend that you do so. If the record you are rewriting is at the end of a reel, the preceding READ statement has forced a reel-swap, so the rewrite fails.

12.5.1 Multiple Reel File Header Record

Multiple reel files have a block of header information that is 256 bytes long. This header occupies the first 256 bytes of each reel and contains information that describes the reel. It also contains 44 bytes which are reserved. Under DOS, Windows and OS/2, you can use this reserved space for your own internal identifier but you cannot alter this reserved space under UNIX.

DOS, Windows and OS/2:
On DOS, Windows and OS/2 systems, the following routine moves a string to the reserved area:

call x"A8" using your-label

where the parameter is:

your-label A PIC X(44) field containing the information to be put into the header.

Only ASCII printable characters are allowed in this area. Once this routine has been used, each subsequent OPEN OUTPUT on a multiple reel file has your string in its header.

Note: Call x"A8" is not supported for files accessed by the Callable File Handler. This includes indexed and external files as well as all files accessed from a program compiled with the CALLFH directive.

A multiple reel file header has the following structure:

Bytes	Content
0-49	Multiple reel header start identification.
50-69	Filename. This is the name of the file as specified in the SELECT clause.
70-75	Date of file creation in the form yymmdd (year, month, day in ASCII digits). If your system does not return the date, this part of the header contains ASCII zeros.
76-83	Time of file creation in the form hhmmsscc (hours, minutes, seconds, hundredths of a second in ASCII digits). If your system does not return the time, this part of the header contains ASCII zeros.
84-127	Reserved area for your own use (see above).
128-131	Reel number. This is a four-digit ASCII value showing the reel number in the range 0001 through 9999.
132	Continuation flags. A one-byte value that shows how the reel ends. The value is ASCII "Y", "A" or "N" as follows:
	Y	This reel is followed by a continuation reel. A CLOSE REEL statement was used to change the reels.
	A	This reel is followed by a continuation reel. The reels were changed automatically when this reel became full.
	N	This reel has no continuation. It is the last reel of the file.
133	Reserved. Currently contains the ASCII value "N".
134-145	Reel length. A 12-digit ASCII value which indicates how many bytes of information are on this reel. This part of the header contains zeros if your system cannot determine the reel size.
146-151	Record size. This is a six-digit ASCII value that shows the record length of records in this file.
152-157	Block size. This is a six-digit ASCII value that has the same value as the record size area of the header.
158-239	Reserved area containing ASCII spaces.
240-255	Multiple reel header end identification.
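
The header layout above can be read with a short sketch such as the following. It is written in Python for illustration only and is not a utility supplied with this COBOL system; `ReelHeader` and `parse_reel_header` are hypothetical names. It assumes that the filename and the reserved user area are padded with spaces, which the layout does not state, and it does not check the start and end identification areas because their contents are not described here.

```python
from dataclasses import dataclass

HEADER_LENGTH = 256


@dataclass
class ReelHeader:
    filename: str
    created_date: str   # yymmdd
    created_time: str   # hhmmsscc
    user_area: str
    reel_number: int
    continuation: str   # "Y", "A" or "N"
    reel_length: int
    record_size: int
    block_size: int


def parse_reel_header(header: bytes) -> ReelHeader:
    if len(header) != HEADER_LENGTH:
        raise ValueError("a reel header is exactly 256 bytes")

    def field(first: int, last: int) -> str:
        # Byte ranges in the layout are inclusive, so add 1 for slicing.
        return header[first:last + 1].decode("ascii")

    return ReelHeader(
        filename=field(50, 69).rstrip(),     # assumes space padding
        created_date=field(70, 75),
        created_time=field(76, 83),
        user_area=field(84, 127).rstrip(),   # assumes space padding
        reel_number=int(field(128, 131)),
        continuation=field(132, 132),
        reel_length=int(field(134, 145)),
        record_size=int(field(146, 151)),
        block_size=int(field(152, 157)),
    )


# Build a made-up header for the example; identification areas are left blank.
sample = bytearray(b" " * HEADER_LENGTH)
sample[50:70] = b"payroll.dat".ljust(20)
sample[70:76] = b"950314"
sample[76:84] = b"09301500"
sample[84:128] = b"MY LABEL".ljust(44)
sample[128:132] = b"0002"
sample[132:133] = b"N"
sample[133:134] = b"N"
sample[134:146] = b"000000102400"
sample[146:152] = b"000080"
sample[152:158] = b"000080"

header = parse_reel_header(bytes(sample))
print(header.reel_number, header.continuation, header.record_size)  # 2 N 80
```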

12.6 File Buffering

With file buffering, records are written to a buffer (a block of memory) until the buffer is full, at which time its contents are written to disk. This method reduces the number of accesses to disk, consequently speeding up the program.

Similarly, when reading records from disk, a block is read from the file into a memory buffer and records are extracted from the buffer.

When the file is closed, any data that has not already been written to disk is written. The COBOL system then asks the operating system to close the file.

12.6.1 File Buffering on DOS, Windows and OS/2

All sequentially accessed data files written by this COBOL system can be buffered. The Callable File Handler buffers indexed files using pre-defined buffers. The size of the buffers can be changed using the environment variables EXTFHBUF and IDXDATBUF.

EXTFHBUF specifies the size of the buffer to be used for the index .idx file. It has a minimum size of 4096 bytes, a default size of 16384 and a maximum size of 65535 bytes. It can be changed using the operating system command:

set extfhbuf=buffer-size

where the parameter is:

buffer-size The size of the buffer to use, in bytes, in multiples of 4096.

The environment variable IDXDATBUF specifies the size of the buffer used for the data .dat file. The default is zero, in which case records are not buffered. Data buffering generally improves the speed of sequential accesses to an indexed file only if the data was written in key order. You can get a file into key order using the reorganization facility of Rebuild. See the chapter Maintaining Files for further details. Data buffering is enabled by the operating system command:

set idxdatbuf=buffer-size

where the parameter is:

buffer-size The size of the buffer to use, in bytes, in multiples of 4096.

12.6.2 File Buffering on UNIX

On UNIX, variable length sequential files are buffered by default. Fixed length sequential files are buffered when the environment variable COBEXTFHBUF is set.

COBEXTFHBUF controls the size of the buffer to be used as follows:

COBEXTFHBUF=buffer-size
export COBEXTFHBUF

where the parameter is:

buffer-size The buffer size in bytes.

It is not possible to change the buffer sizes used for C-ISAM files.

12.6.3 Index Caching

The Callable File Handler uses a global buffer to improve speed when reading/writing to the index portion of an indexed file. The size of this buffer is 16K by default but it can be changed using the EXTFHBUF environment variable. All indexed files share this same buffer.

To increase performance, particularly when creating/updating large indexed files, or files with a large number of alternate keys, each file can have its own index buffers allocated using either the EXTFHBUF option as part of filename mapping, or by setting the appropriate bytes in the FCD. See the chapter Callable File Handler for information about the FCD. A file can have anything from one buffer, which all its keys share, up to one buffer for each key in the file.

For example, a file (test.dat) has eight keys:

To specify one 16K buffer for the file, either set up the file mapping parameter:
```
test.dat=test.dat (EXTFHBUF 16 1)
```
or move 16 to offset 96 in the FCD and move 1 to offset 97.
To specify that each key has its own buffer, either set up the file mapping parameter:
```
test.dat=test.dat (EXTFHBUF 16 8)
```
or move 16 to offset 96 in the FCD and move 8 to offset 97.

If fewer buffers are allocated than the number of keys for the file, keys must share buffers.

For example, four buffers for a file with eight keys:

keys 1 and 5 share buffer 1
keys 2 and 6 share buffer 2
keys 3 and 7 share buffer 3
keys 4 and 8 share buffer 4.

This buffering requires extra memory equal to number of buffers * buffer size.
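
The sharing pattern and the memory cost can be sketched as follows. The sketch is in Python for illustration only and both function names are hypothetical. The round-robin rule is inferred from the single example above (keys 1 and 5 share buffer 1, and so on) and is an assumption about how keys are distributed in other cases. The buffer size is taken to be in Kbytes, as in the 16K examples.

```python
def buffer_for_key(key_number: int, buffer_count: int) -> int:
    """Return the 1-based buffer assumed to serve a 1-based key number."""
    return (key_number - 1) % buffer_count + 1


def index_buffer_memory(buffer_count: int, buffer_size_kb: int) -> int:
    """Return the extra memory in bytes: number of buffers * buffer size."""
    return buffer_count * buffer_size_kb * 1024


# Eight keys sharing four buffers, as in the example above.
for key in range(1, 9):
    print("key", key, "uses buffer", buffer_for_key(key, 4))

print(index_buffer_memory(8, 16))  # 131072 bytes for (EXTFHBUF 16 8)
```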

12.7 Indexed File Keys

The following sections describe sparse keys and how to handle files with a large number of duplicate keys.

12.7.1 Sparse Keys

A sparse key is a key for which no index entry is stored for a given key value. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the key part of the record contains only space characters.

Normally, when the Callable File Handler stores an indexed record, the key value is stored in the index file for each key defined. If an alternate key is defined as a sparse key and the key value in the record is the specified sparse value, the key is not stored. However, enough information is stored for the record to be read via the normal primary key path.

Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your disk savings.

Only alternate keys that allow duplicates can be sparse.
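
The effect of a sparse key on the index can be modelled in a few lines. The sketch below is in Python for illustration only; `build_alternate_index` is a hypothetical name and the dictionary is a stand-in for the real index file, whose structure is not described here.

```python
def build_alternate_index(records, suppress_value):
    """Map each alternate key value to the primary keys of its records.

    Records whose alternate key equals suppress_value get no index entry;
    they remain reachable through the primary key.
    """
    index = {}
    for primary_key, alt_key in records:
        if alt_key == suppress_value:
            continue  # sparse value: nothing is stored for this record
        index.setdefault(alt_key, []).append(primary_key)
    return index


sparse_value = "A" * 20
records = [
    (1, sparse_value),
    (2, "SMITH".ljust(20)),
    (3, sparse_value),
    (4, "SMITH".ljust(20)),
]
index = build_alternate_index(records, sparse_value)
print(len(index))            # 1
print(list(index.values()))  # [[2, 4]]
```

Only two of the four records produce alternate index entries in this model, which is where the disk saving comes from.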

12.7.1.1 Using Sparse Keys

Sparse keys are defined using the SUPPRESS clause in the SELECT...ALTERNATE KEYS statement (see your Language Reference for details).

UNIX:
On UNIX systems, sparse keys are not supported for C-ISAM files.

Example

 input-output section.
 file-control.
     select out-file
         assign to "outfile"
         organization is indexed
         access mode is dynamic
         record key is out-key
         alternate record key is alt-key
             with duplicates
             suppress when all "A"
*
* Hence, if we write a record whose alternate key has value all
* "A", the actual key value is not stored in the index file
*
         file status is file-status.

 data division.
 file section.
 fd out-file.
 01 out-rec.
     03 out-key               pic 9(10).
     03 alt-key               pic x(20).
 working-storage section.
 01 file-status               pic xx.

 procedure division.
     open output out-file
     if file-status not = "00"
*         < code to abort >
     end-if
     perform varying out-key from 1 by 1
         until out-key > 10
         move all "A" to alt-key
         write out-rec
             invalid key
*                < code to abort >
         end-write
     end-perform
     close out-file
     stop run.

12.7.2 Duplicate Keys

You can define keys in indexed files to allow duplicate values. However, we do not recommend that you allow duplicates on primary keys. This is because, with duplicates allowed, you cannot uniquely identify records in a file. See the READ, REWRITE and DELETE statements in your Language Reference.

To enable duplicate keys, you specify the phrase WITH DUPLICATES in the ALTERNATE RECORD KEY clause of the SELECT statement.

When you use duplicate keys, you should be aware that a maximum of 65535 duplicate keys is allowed for every individual key in a standard file. Each time you specify a duplicate key, 1 is added to its occurrence number. Because the occurrence number is used to ensure that duplicate key records are read in the order in which they were created, the occurrence number of a deleted record cannot be reused. The duplicate key maximum can therefore be reached even when fewer than 65535 duplicates remain in the file.
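
The way the limit can be reached through deletions alone is shown in the model below. It is written in Python for illustration only; `DuplicateKeyCounter` is a hypothetical class that follows the behavior described above and does not represent the file handler's actual bookkeeping.

```python
MAX_OCCURRENCE = 65535


class DuplicateKeyCounter:
    """Tracks occurrence numbers for one duplicated key value."""

    def __init__(self):
        self.last_occurrence = 0
        self.live = set()   # occurrence numbers of records still in the file

    def add(self) -> int:
        if self.last_occurrence >= MAX_OCCURRENCE:
            raise OverflowError("duplicate key maximum reached")
        self.last_occurrence += 1
        self.live.add(self.last_occurrence)
        return self.last_occurrence

    def delete(self, occurrence: int) -> None:
        # The number is retired; it is never handed out again.
        self.live.remove(occurrence)


counter = DuplicateKeyCounter()
for _ in range(MAX_OCCURRENCE):
    counter.delete(counter.add())   # write a duplicate, then delete it

try:
    counter.add()
    exhausted = False
except OverflowError:
    exhausted = True

print(len(counter.live), exhausted)  # 0 True
```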

To overcome this, a different file type is available: IDXFORMAT"4". You select this file format by specifying the IDXFORMAT"4" Compiler directive before the SELECT statement for individual files, or for all files in the program. IDXFORMAT"4" format files allow a maximum of 4,294,967,297 duplicate keys. Each record in the data file is followed by a system record holding the number of duplicate keys for that record. This makes a REWRITE or DELETE operation on a record with many duplicates much faster, but it also makes the data records of such files larger than those of default-format files.

12.8 Data and Key Compression

Records and keys in files created using the Callable File Handler can be compressed so they take up less physical disk space. You can enable data compression by using the Callable File Handler to call compression routines from your program.

12.8.1 Data Compression

Data compression enables you to compress the data in a sequential or indexed file. Two compression mechanisms are provided with this COBOL system: run-length encoding (type 1) and extended run-length encoding (type 3).

When a file is defined with run-length encoding, any string of repeating characters is stored as a single character with a repetition count.
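
The general idea of run-length encoding is sketched below. The sketch is in Python for illustration only, with hypothetical function names; it shows the technique in its simplest form and is not the on-disk format used by type 1 or type 3 compression.

```python
def rle_encode(data: bytes) -> list:
    """Collapse each run of repeated bytes into a (byte, count) pair."""
    runs = []
    for byte in data:
        if runs and runs[-1][0] == byte:
            runs[-1] = (byte, runs[-1][1] + 1)
        else:
            runs.append((byte, 1))
    return runs


def rle_decode(runs) -> bytes:
    """Rebuild the original bytes from (byte, count) pairs."""
    return b"".join(bytes([byte]) * count for byte, count in runs)


record = b"SMITH" + b" " * 25          # a 30-byte field padded with spaces
runs = rle_encode(record)
print(len(runs))                       # 6
print(runs[-1])                        # (32, 25)
print(rle_decode(runs) == record)      # True
```

The 25 trailing spaces collapse into a single pair, which is why records with large unused areas benefit most.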

You enable data compression using the DATACOMPRESS Compiler directive.

16-bit:
On the 16-bit COBOL system, when compressing data in a sequential file, you must specify the CALLFH Compiler directive.

Specifying data compression for a fixed structure sequential file changes it into a variable structure sequential file. See the chapter File Structures for further information.

The compression used by a file is determined by the last processed DATACOMPRESS directive when the SELECT statement for the file is processed. Consequently, the compression type can be set for an individual file by using a line of the form:

$SET DATACOMPRESS

immediately before its SELECT statement. Remember to turn it off with a $SET NODATACOMPRESS before any other files are processed.

Note: We recommend that you do not use the REWRITE statement on compressed sequential files. This is because a REWRITE operation will only succeed if the length of the compressed new record is the same as the length of the compressed old record.

12.8.2 Key Compression

Key compression is a technique that can be applied to the keys of an indexed file. There are three types of compression available:

Compression of trailing spaces
Compression of identical leading characters
Compression of duplicate alternate key values.

Any combination of these can be used on any key, though the compression of duplicates is only appropriate to alternate keys with duplicates enabled.

Key compression is specified using the KEYCOMPRESS Compiler directive.

The compression used by a file is determined by the last processed KEYCOMPRESS directive when the SELECT statement for the file is processed. Consequently, the compression type can be set for an individual file by using a line of the form:

$SET KEYCOMPRESS

immediately before its SELECT statement. Remember to turn compression off again with $SET NOKEYCOMPRESS before any other files are processed.

For details on the KEYCOMPRESS Compiler directive, see your Object COBOL User Guide.

12.8.2.1 Compression of Trailing Spaces

When a key is defined with compression of trailing spaces, trailing spaces in a key value are not stored in the file. However, information is stored so that the key can be correctly located.

For example, assume you have a prime or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being spaces. Without compression, all 30 characters of the key are stored. With compression of trailing spaces, the key only occupies 11 bytes in the index file (10 bytes for the characters of the key and 1 byte as a count of the trailing spaces).
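The arithmetic in this example can be checked with a short Python sketch. The layout is an assumption made for illustration (one count byte placed before the significant characters); the text above states only that one byte holds the count, not where the file handler puts it. The function names and key value are hypothetical.

```python
def compress_trailing_spaces(key):
    """Drop trailing spaces and record how many were removed."""
    stripped = key.rstrip(b" ")
    trailing = len(key) - len(stripped)
    # Assumed layout: one count byte, then the significant characters.
    return bytes([trailing]) + stripped

def expand_trailing_spaces(stored):
    """Restore the full-length key from its compressed form."""
    return stored[1:] + b" " * stored[0]

key = b"ACME CORP.".ljust(30)        # 10 significant characters in a 30-byte key
stored = compress_trailing_spaces(key)
restored = expand_trailing_spaces(stored)
```

Here the stored form is 11 bytes long, matching the figure given above, and expanding it yields the original 30-byte key.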

12.8.2.2 Compression of Leading Characters

When a key is defined with compression of leading characters, all leading characters that match leading characters in the preceding key are not stored in the index file. However, information is stored to enable the key to be correctly reconstructed.

For example, assume that records are written with the following key values in a key defined with compression of leading characters:

AXYZBBB BBCDEFG BBCXYZA BBCXYEF BEFGHIJ CABCDEF

The keys actually stored in the index file are:

AXYZBBB BBCDEFG XYZA    EF      EFGHIJ  CABCDEF
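The following Python sketch reproduces this example. It models the stored information as a pair (number of characters shared with the preceding key, remaining characters); that representation and the function names are assumptions for illustration, not the on-disk index format.

```python
def compress_leading(keys):
    """Return (shared prefix length, remaining characters) for each key."""
    stored = []
    previous = ""
    for key in keys:
        shared = 0
        limit = min(len(key), len(previous))
        while shared < limit and key[shared] == previous[shared]:
            shared += 1
        stored.append((shared, key[shared:]))
        previous = key
    return stored

def expand_leading(stored):
    """Rebuild each key from the preceding key and the stored remainder."""
    keys = []
    previous = ""
    for shared, remainder in stored:
        key = previous[:shared] + remainder
        keys.append(key)
        previous = key
    return keys

keys = ["AXYZBBB", "BBCDEFG", "BBCXYZA", "BBCXYEF", "BEFGHIJ", "CABCDEF"]
stored = compress_leading(keys)
remainders = [remainder for _, remainder in stored]
```

The remainders match the stored keys listed above, and the shared-prefix counts are what allow each key to be reconstructed from its predecessor.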

12.8.2.3 Compression of Duplicate Keys

When an alternate key is defined with compression of duplicates, only the first duplicate key is contained in the file. The rest are not stored, but information is stored to enable correct recreation of the keys.

For example, suppose you write a record with an alternate key value "ABC". If you have enabled compression of duplicate keys, and you write another record with the same key value, the file handler does not physically store the duplicate key value in the index file. However, the record is still available along the alternate key path.

12.8.3 Example of Using Data and Key Compression

In the following program, data compression is specified for transfile but not for masterfile. For key compression, suppression of trailing spaces and of leading characters that are the same as in the previous key is specified for keys t-rec-key and m-rec-key. Suppression of repetition of duplicate keys is also turned on for m-alt-key-1 and m-alt-key-2.

$set callfh"extfh"
$set datacompress"1" 
$set keycompress"6" 
     select transfile 
         assign to ... 
         key is t-rec-key. 
$set nokeycompress
$set nodatacompress

     select masterfile 
         assign to ... 
         organization is indexed 
$set keycompress"6"
         record key is m-rec-key

$set keycompress"7"
         alternate key is m-alt-key-1 with duplicates
         alternate key is m-alt-key-2. 
$set nokeycompress
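
The values 6 and 7 in this example can be read as sums of flags. The Python sketch below assumes the values 1 (duplicates), 2 (leading characters) and 4 (trailing spaces); these are not stated in this chapter but are consistent with the description above, so confirm them against the KEYCOMPRESS entry in your Object COBOL User Guide. The class and function names are hypothetical.

```python
from enum import IntFlag

class KeyCompress(IntFlag):
    # Assumed flag values; verify against the KEYCOMPRESS directive description.
    DUPLICATES = 1
    LEADING_CHARACTERS = 2
    TRAILING_SPACES = 4

def describe(value):
    """List the compression types selected by a KEYCOMPRESS value."""
    return sorted(flag.name for flag in KeyCompress if value & flag)

prime_key_types = describe(6)      # used for t-rec-key and m-rec-key
alternate_key_types = describe(7)  # used for m-alt-key-1 and m-alt-key-2
```

Under that assumption, 6 selects leading-character and trailing-space compression, and 7 adds compression of duplicates.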

12.8.4 Compression Routines

The routines that the Callable File Handler uses to compress data are stand-alone modules. This means that you can use them in your own applications, or alternatively make the Callable File Handler use your own data compression routine.

There can be up to 127 Micro Focus compression routines, and up to 127 user-supplied compression routines.

12.8.4.1 Micro Focus Compression Routines

Micro Focus routines are stored in modules called CBLDCnnn, where nnn is within the range 001 to 127. To use Micro Focus compression routines, set the byte at offset 78 in the FCD to a value between 001 and 127.

12.8.4.1.1 Micro Focus Compression Routine CBLDC001

The compression routine, CBLDC001, uses a form of run length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.

Note: This routine is not effective for use with files that contain significant occurrences of double-byte characters, including double-byte spaces, as these are not compressed.

CBLDC001 puts special emphasis on runs of spaces, binary zeros and character zeros (each of which can be reduced to a single character) and on runs of printable characters (which are reduced to two characters: a count followed by the repeated character).

In the compressed file, bytes have the following meanings (hex values shown):

20-7F	(most printable characters) normal ASCII meaning.
80-9F	1-32 spaces respectively.
A0-BF	1-32 binary zeros respectively.
C0-DF	1-32 character zeros respectively.
E0-FF	1-32 occurrences of the character following.
00-1F	1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code. This is used when characters in the range 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, one such character is expanded to two bytes; otherwise, no penalty is incurred by the compression.)
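
As an illustration, the table above can be turned into a small expansion routine. This Python sketch is a direct reading of the table and has not been checked against the output of CBLDC001 itself; the function name and the sample bytes are hypothetical.

```python
def cbldc001_expand(compressed):
    """Expand a byte string according to the code table above."""
    out = bytearray()
    i = 0
    while i < len(compressed):
        code = compressed[i]
        i += 1
        if 0x20 <= code <= 0x7F:
            out.append(code)                        # ordinary character
        elif 0x80 <= code <= 0x9F:
            out += b" " * (code - 0x80 + 1)         # 1-32 spaces
        elif 0xA0 <= code <= 0xBF:
            out += b"\x00" * (code - 0xA0 + 1)      # 1-32 binary zeros
        elif 0xC0 <= code <= 0xDF:
            out += b"0" * (code - 0xC0 + 1)         # 1-32 character zeros
        else:
            # 00-1F or E0-FF: a count, followed by the character to repeat.
            base = 0xE0 if code >= 0xE0 else 0x00
            if i >= len(compressed):
                raise ValueError("count byte without a following character")
            out += bytes([compressed[i]]) * (code - base + 1)
            i += 1
    return bytes(out)

# "AB", 5 spaces, 3 character zeros, 4 x "X", one literal 0x90, 2 binary zeros
sample = bytes([0x41, 0x42, 0x84, 0xC2, 0xE3, 0x58, 0x00, 0x90, 0xA1])
expanded = cbldc001_expand(sample)
```

In this sample, 9 compressed bytes expand to 17. The literal 0x90 byte is the one place where the compressed form is longer than the original.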

12.8.4.1.2 Micro Focus Compression Routine CBLDC003

Like CBLDC001, this routine uses run length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.

The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:

bit 15	Unset - single character
bit 14	Set - compressed sequence
	Unset - uncompressed sequence
bit 0-13	Compressed character(s) or count of uncompressed characters

The length of the character string depends on the header bits:

Bits 14 and 15 set	two repeating characters.
Only bit 14 set	one repeating character.
Otherwise	between 1 and 63 uncompressed characters.

12.8.4.2 Calling a Micro Focus Compression Routine

For data file compression the Callable File Handler calls the compression routine that you specify in the DATACOMPRESS Compiler directive.

To call a Micro Focus data compression routine use the syntax:

COBOL:

call "CBLDCnnn" using input-buffer,
                        input-buffer-size,
                        output-buffer,
                        output-buffer-size,
                        compression-type

C:

cbldcnnn(input_buffer, &input_buffer_size, output_buffer,
         &output_buffer_size, &compression_type);

where the parameters are:

`nnn`	The number of the data compression routine, in the range 001 to 127.
`input_buffer`	A PIC X(size) data item containing the data to compress or decompress; the maximum size is 65535.
`input_buffer_size`	A two-byte (int in C, PIC XX COMP-5 in COBOL) data item specifying the length of the data in the input buffer.
`output_buffer`	A PIC X(size) data item that receives the resulting data.
`output_buffer_size`	A two-byte (int in C, PIC XX COMP-5 in COBOL) data item. On entry to the routine, this data item must contain the size of the output buffer available; on exit, it contains the length of the data in the buffer.
`compression_type`	A one-byte (char in C, PIC X COMP-X in COBOL) data item that specifies whether the input data is to be compressed or decompressed: 0 - compress, 1 - decompress.

The RETURN-CODE special register indicates whether the operation succeeded or not. Compression or decompression fails only if the output buffer is too small to accept the results.

0 indicates success
1 indicates failure.

12.8.4.3 User-Supplied Compression Routines

User-supplied compression routines must be stored in modules called USRDCnnn, where nnn is within the range 128 to 255.

To call a user-supplied routine, use the same syntax as for calling a Micro Focus routine, but with the module name USRDCnnn in place of CBLDCnnn.

To make your compression routine available to your system, you must dynamically call or link the routine with your application.

UNIX:
Under UNIX, you dynamically call a data compression routine by compiling the routine and then rebuilding your run-time system using the command:

sh $COBDIR/src/rts/mkrts -d routine-name

where:

routine-name is the name of your compression routine.

You link the relevant data compression routine by creating an object module from the source of your compression routine and linking this to the run-time system using the command:

sh $COBDIR/src/rts/mkrts routine-name.o

where:

routine-name.o is the name of the object module produced from the source of your compression routine.

You can map calls to data compression routines in programs from previous UNIX COBOL systems to the new calls using the cob option:

-m CBL_DATA_COMPRESS_nnn=CBLDCnnn

Notes:

Your compression routines must not make any calls to the Callable File Handler, as this would result in a loop. If you need file access, use byte-stream file I/O.
Once you have enabled data compression for a file, you must always subsequently specify the same type of compression for that file. If you do not, you receive a run-time system error when you open the file.
Data compression has no effect on files other than those in indexed or record sequential format and is ignored at compile time for files that do not support it.
If you want to read a fixed-length sequential file that has had its data compressed, you must specify that the file is compressed in your program. You do this by specifying the DATACOMPRESS Compiler directive.
