Chapter 11: Miscellaneous Topics

This chapter covers a number of file handling topics. These include:

Compiler directives
Run-time switches
Including the Callable File Handler in your applications
Operating system considerations
Multiple-reel files
Buffering
Sparse and duplicate keys in indexed files
Data and key compression

11.1 Compiler Directives

If your program handles data files, there are several Compiler directives that you should be aware of. Used properly, they can simplify your code, set up default language behavior and give you access to important language extensions, such as ANSI'85.

These directives include:

ANS85
ASSIGN
CALLFH
COBFSTATCONV
DATACOMPRESS
FILETYPE
IDXFORMAT
KEYCOMPRESS
OPTIONAL-FILE
RECMODE
RM
SEQUENTIAL

11.1.1 ANS85

The ANS85 Compiler directive enables ANSI'85 COBOL language extensions and gives you ANSI'85 file status codes.

If you specify the NOANS85 directive, you do not get ANSI'85 file status codes. Additionally, you cannot use ANSI'85 reserved words.

If you specify ANS85"SYNTAX", you can use ANSI'85 syntax but your program will return ANSI'74 file status codes.

For example, if you compile a program with NOANS85, the file status code for "File not found" is 9/013 (first status byte = ASCII 9, second status byte = decimal 13), which is the Micro Focus run-time system error code for this problem. If you omit the NOANS85 directive, the file status for "File not found" is 3/5 (first status byte = ASCII 3, second status byte = ASCII 5), which is the ANSI'85 file status code for this problem.
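To make the two-byte layout concrete, the following Python sketch formats a file status in the x/y notation used in this section. It is illustrative only and is not part of the COBOL system; the function name format_file_status is hypothetical. It assumes, as in the example above, that a first byte of ASCII 9 means the second byte holds a binary error number, and that otherwise both bytes are ASCII characters.

```python
def format_file_status(status: bytes) -> str:
    """Render a two-byte file status in the 'x/y' notation used above."""
    if len(status) != 2:
        raise ValueError("a file status is exactly two bytes")
    first = chr(status[0])
    if first == "9":
        # Assumption: after ASCII 9, the second byte is a binary number.
        return f"9/{status[1]:03d}"
    # Otherwise both bytes are treated as ASCII characters.
    return f"{first}/{chr(status[1])}"


print(format_file_status(b"9\x0d"))  # 9/013
print(format_file_status(b"35"))     # 3/5
```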

11.1.2 ASSIGN

The ASSIGN Compiler directive can be set to either EXTERNAL or DYNAMIC as follows:

ASSIGN"DYNAMIC"	(the default)
ASSIGN"EXTERNAL"

The ASSIGN Compiler directive specifies whether the default file assignment is dynamic or external.

If ASSIGN"EXTERNAL" is specified as a directive, and the SELECT statement is coded like this:

     select fd-name
         assign to var1

then COBOL treats the SELECT as if it were coded like this:

     select fd-name
         assign to external var1

11.1.3 CALLFH

You can use the CALLFH directive to generate direct calls for all file I/O operations, using the Callable File Handler interface; this can be either the Micro Focus File Handler, Fileshare, or a file handler you have created. To use Fileshare, specify CALLFH(FHREDIR).

11.1.4 COBFSTATCONV

Use the COBFSTATCONV Compiler directive to specify that any status values returned for a file should be converted using a status conversion routine before being returned to a COBOL program.

The name of the status conversion routine is specified at run time by the CONVERTSTATUS configuration option, or the COBFSTATCONV environment variable.

11.1.5 DATACOMPRESS and KEYCOMPRESS

If you have files with large areas that are unused or reserved for future use, you can save disk space by compressing the files.

To compress data files, two directives are available with COBOL: DATACOMPRESS and KEYCOMPRESS. These directives must be specified when you create the file.

For indexed files and record sequential files, you can suppress repeated characters through use of the DATACOMPRESS directive.

Note: Compressed files are variable-length record format. Specifying data compression on relative files does not save space because the maximum record length is always written out.

For indexed files, you can compress keys through use of the KEYCOMPRESS directive.

These directives can be specified through a $SET statement, enabling compression on selected files.

For more information on DATACOMPRESS and KEYCOMPRESS, see the section Data and Key Compression.

11.1.6 IDXFORMAT

The IDXFORMAT Compiler directive enables you to specify the type of indexed file you want to create. For example, you can create C-ISAM files using IDXFORMAT"1" and BTRIEVE files using IDXFORMAT"5" or IDXFORMAT"6".

For handling files with many alternate keys, use IDXFORMAT"4" or IDXFORMAT"8". This improves performance for delete and rewrite operations and enables a larger number of duplicate keys.

To create files that might grow beyond 1 gigabyte (2 gigabytes for non-shared files) use IDXFORMAT"8".

11.1.7 OPTIONAL-FILE

The OPTIONAL-FILE Compiler directive is specified as one of:

OPTIONAL-FILE	(the default)
NOOPTIONAL-FILE

With the OPTIONAL-FILE directive, SELECT statements for files opened I-O or EXTEND are treated as if OPTIONAL were coded in the SELECT clause. This means that if a non-existent file is opened I-O or EXTEND:

With NOOPTIONAL-FILE, you get an error message.
With OPTIONAL-FILE, the file is created as if it were opened for OUTPUT, closed, and then opened I-O or EXTEND.
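The two behaviors can be modeled outside COBOL. The following Python sketch is only an analogy for the rule described above, not the run-time system's implementation; the helper name open_io and its optional_file parameter are hypothetical.

```python
import os
import tempfile


def open_io(path: str, optional_file: bool = True):
    """Model OPEN I-O on a file that may not exist (hypothetical helper)."""
    if not os.path.exists(path):
        if not optional_file:
            # NOOPTIONAL-FILE: opening a missing file is an error.
            raise FileNotFoundError(path)
        # OPTIONAL-FILE: create the file as if it had been opened
        # for OUTPUT and closed, then continue with the I-O open.
        open(path, "wb").close()
    return open(path, "r+b")


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "master.dat")
    open_io(missing, optional_file=True).close()
    print(os.path.exists(missing))  # True: the file was created
```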

11.1.8 RECMODE

The RECMODE Compiler directive specifies whether the default RECORDING MODE for a file is fixed or variable format as follows:

RECMODE"F"	(the default)
RECMODE"V"

If RECMODE"V" is specified, you can omit the RECORDING MODE IS VARIABLE clause in your SELECT statement; however, you must specify RECORDING MODE IS FIXED if that is what you want.

11.1.9 RM

The RM Compiler directive offers Ryan-McFarland COBOL compatibility in several areas. For example, it enables Ryan-McFarland file status codes to be generated. For more information, see your Compatibility Guide and the chapter Ryan-McFarland COBOL V2.0 Syntax Support in your Language Reference - Additional Topics.

11.1.10 SEQUENTIAL

The SEQUENTIAL Compiler directive determines whether, by default, files with an explicit or implicit ORGANIZATION IS SEQUENTIAL clause in the SELECT statement are record sequential or line sequential.

The directive can be set to RECORD, LINE, ANSI or ADVANCING as follows:

	SEQUENTIAL"RECORD" (the default)
	SEQUENTIAL"LINE"
	SEQUENTIAL"ANSI"
	SEQUENTIAL"ADVANCING"

This directive can be useful if you are converting from other COBOL dialects where ORGANIZATION SEQUENTIAL is equivalent to this COBOL system's ORGANIZATION LINE SEQUENTIAL.

If you compile with SEQUENTIAL"LINE", and code your SELECT statement like this:

     select fd-name
         assign to ...
         organization is sequential

then the file is a line sequential file, rather than a record sequential file, which is otherwise the default.

11.2 Operating System Considerations

Many operating-system-specific considerations affect file handling. This section presents some common ones, including some tips on avoiding and resolving memory problems related to file handling.

11.2.1 Power Failures

If a power outage or a system reboot occurs while an application is executing, the integrity of files that were being processed by the application when the failure occurred cannot be guaranteed.

11.2.2 File Handles

A file handle is an operating system mechanism for controlling the way a file is used by the system. Every file that your application has open requires at least one file handle. On UNIX this is called a file descriptor.

The maximum number of files that can be open at any one time, excluding the standard input, output and error files, is dependent on the configuration of your UNIX system. See your Release Notes for configuration details for your system. The open mode of a file can influence the maximum number of files that can be open at any one time. Files opened for INPUT are limited to the maximum number of files per process which has been configured for your UNIX system. Files with EXCLUSIVE access (for example, files opened for OUTPUT) acquire file locks and are thus limited to the maximum number of file locks per process which has been configured for your UNIX system. These two limits are not necessarily the same.

You can specify the maximum number of open files with the tunable max_file_handles. See your Server Express User's Guide for details.

Some of the factors that determine how many file handles are needed are described below.

11.2.2.1 File Handles for Indexed Files

For those indexed files that comprise two files, the data file and the index file, two file handles are required for each open indexed file. IDXFORMAT"8" files comprise only one file and therefore require only one handle.
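As a rough planning aid, the rule above can be turned into a small calculation. This Python sketch is an estimate under stated assumptions: the function name handles_needed and the (organization, single_file) description of each file are hypothetical, and the count ignores the extra handles used by sorts and by the Animator.

```python
def handles_needed(open_files):
    """Estimate handles for a list of (organization, single_file) pairs."""
    total = 0
    for organization, single_file in open_files:
        if organization == "indexed" and not single_file:
            total += 2  # separate data file and index file
        else:
            total += 1  # one physical file
    return total


files = [
    ("indexed", False),     # data file plus index file
    ("indexed", True),      # for example, an IDXFORMAT"8" file
    ("sequential", True),
]
print(handles_needed(files))  # 4
```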

11.2.2.2 File Handles During Sorts

When a COBOL SORT is performed, several work files might be created, each of which requires a file handle.

11.2.2.3 File Handles During Animation

During animation, a certain number of Animator control files are opened. Each of these requires a file handle. This means that your application can run out of file handles during animation, but not when running standalone.

11.2.3 Network File Handling Limits

Some networks have limits built in that may be more restrictive than those imposed by your operating system. If you are running on a network, consult your network documentation to find out the maximum number of files that you can have open at one time.

11.2.4 File Size

The maximum size of any file you can create is limited by the per-process file size limit, ulimit. You may find that the default limit is rather small, but a superuser can increase it.
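To see the operating system limits that currently apply to a process, you can query them directly. The following Python sketch uses the standard resource module, which is available on UNIX systems only. It reports the operating system's own limits on open file descriptors and on file size; these are not necessarily the same as the file lock limit or the max_file_handles tunable described earlier.

```python
import resource  # available on UNIX systems only


def describe(limit: int) -> str:
    """Format a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# RLIMIT_NOFILE: open file descriptors; RLIMIT_FSIZE: file size in bytes.
for name in ("RLIMIT_NOFILE", "RLIMIT_FSIZE"):
    soft, hard = resource.getrlimit(getattr(resource, name))
    print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```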

11.2.5 Opening Files

The COBOL OPEN OUTPUT statement requires two calls to the operating system; the first to open the file and the second to obtain an exclusive lock. This means that a small time lapse exists between these two calls and so it is possible for a second user to access the file between the time that the file is created and the time that it is locked exclusively. If this happens, the second user will find the file empty and an attempt to read the file will result in an "at end" error. The user performing the OPEN OUTPUT will receive a "file locked" error because the exclusive lock call fails.

11.3 Multiple-reel Files

You can specify sequential files as multiple-reel (or multiple-unit) files. This means that a sequential file can be held on more than one:

Removable disk
Cartridge tape
Tape reel
Disk partition

You must specify the file as a multiple-reel file in the SELECT clause of the Environment Division.

You cannot specify a sequential file as a multiple-reel file if it has variable-length records, since the file header record (see below) stores only one record length.

Whenever you specify a sequential file as a multiple-reel file, you are prompted to load the appropriate reel of the file. This applies also to the first reel of the file, even though it may already be loaded. The prompt is:

Please load volume nnnn of access file filename
Enter new device (if required) and <CR> when ready

where the parameters are:

`nnnn`	The four-digit reel number of the reel to be loaded within the range 0001 through 9999.
`filename`	The filename, as specified in the SELECT clause in the source program.
`access`	INPUT, OUTPUT or I/O as specified in the source program.

Ensure that the relevant disk or reel is loaded (media for output must already be formatted), and enter the filename.

If you load the wrong volume of a file that is open for input or I-O, or if the header information is in some way corrupt, an error is returned.

The prompt is displayed whenever:

A multiple reel file is opened
A CLOSE REEL statement is executed
The reel becomes full while writing to a multiple reel file (this is a forced reel swap on WRITE)
"End of reel" is true for a multiple-reel file that is opened for INPUT (or I/O on READ). This is true provided that a continuation reel was created when the file was written.

11.3.1 Multiple-reel File Header Record

Multiple-reel files have a block of header information that is 256 bytes long. This header occupies the first 256 bytes of each reel and contains information that describes the reel. It also contains 44 bytes which are reserved; you cannot alter this reserved space.

A multiple reel file header has the following structure:

Bytes	Content
0-49	Multiple-reel header start identification.
50-69	Filename. This is the name of the file as specified in the SELECT clause.
70-75	Date of file creation in the form yymmdd (year, month, day in ASCII digits). If your system does not return the date, this part of the header contains ASCII zeros.
76-83	Time of file creation in the form hhmmsscc (hours, minutes, seconds, hundredths of a second in ASCII digits). If your system does not return the time, this part of the header contains ASCII zeros.
84-127	Reserved area for your own use (see above).
128-131	Reel number. This is a four-digit ASCII value showing the reel number in the range 0001 through 9999.
132	Continuation flags. A one-byte value that shows how the reel ends. The value is ASCII "Y", "A" or "N" as follows:
	Y	This reel is followed by a continuation reel. A CLOSE REEL statement was used to change the reels.
	A	This reel is followed by a continuation reel. The reels were changed automatically when this reel became full.
	N	This reel has no continuation. It is the last reel of the file.
133	Reserved. Currently contains the ASCII value "N".
134-145	Reel length. A 12-digit ASCII value which indicates how many bytes of information are on this reel. This part of the header contains zeros if your system cannot determine the reel size.
146-151	Record size. This is a six-digit ASCII value that shows the record length of records in this file.
152-157	Block size. This is a six-digit ASCII value that has the same value as the record size area of the header.
158-239	Reserved area containing ASCII spaces.
240-255	Multiple-reel header end identification.
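The byte offsets in the header structure above can be applied mechanically. The following Python sketch slices a 256-byte header into named fields. The field names and the function parse_reel_header are hypothetical, the sample header is made up, and padding the filename with spaces is an assumption; the contents of the identification areas are not described here, so the sketch leaves them as placeholders.

```python
# (name, first byte, last byte) taken from the header structure above.
HEADER_FIELDS = [
    ("start_id", 0, 49),
    ("filename", 50, 69),
    ("creation_date", 70, 75),
    ("creation_time", 76, 83),
    ("user_area", 84, 127),
    ("reel_number", 128, 131),
    ("continuation", 132, 132),
    ("reserved_flag", 133, 133),
    ("reel_length", 134, 145),
    ("record_size", 146, 151),
    ("block_size", 152, 157),
    ("reserved", 158, 239),
    ("end_id", 240, 255),
]


def parse_reel_header(header: bytes) -> dict:
    """Slice a 256-byte multiple-reel header into named byte fields."""
    if len(header) != 256:
        raise ValueError("a multiple-reel header is 256 bytes long")
    return {name: header[lo:hi + 1] for name, lo, hi in HEADER_FIELDS}


# Build a made-up header for illustration only.
sample = bytearray(b" " * 256)
sample[50:70] = b"outfile".ljust(20)   # assumption: space-padded name
sample[128:132] = b"0001"
sample[132:133] = b"N"
sample[146:152] = b"000080"

fields = parse_reel_header(bytes(sample))
print(fields["filename"].rstrip(), fields["reel_number"], fields["continuation"])
```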

11.4 Indexed File Keys

The following sections describe sparse keys and how to handle files with a large number of duplicate keys.

11.4.1 Sparse Keys

A sparse key is a key for which no index entry is stored for a given key value. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the key part of the record contains only space characters.

Normally, when the Callable File Handler stores an indexed record, the key value is stored in the index file for each key defined. If an alternate key is defined as a sparse key and the key value in the record is the specified sparse value, the key is not stored. However, enough information is stored for the record to be read via the normal primary key path.

Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your disk savings.

Only alternate keys that allow duplicates can be sparse.

11.4.1.1 Using Sparse Keys

Sparse keys are defined using the SUPPRESS clause in the SELECT...ALTERNATE KEYS statement (see your Language Reference for details).

Sparse keys are not supported for C-ISAM files.

Example

 input-output section.
 file-control.
     select out-file
         assign to "outfile"
         organization is indexed
         access mode is dynamic
         record key is out-key
         alternate record key is alt-key
             with duplicates
             suppress when all "A"
*
* Hence, if we write a record whose alternate key
* has value all "A", the actual key value
* is not stored in the index file
*
     file status is file-status.

 data division.
 file section.
 fd out-file.
 01 out-rec.
     03 out-key               pic 9(10).
     03 alt-key               pic x(20).
 working-storage section.
 01 file-status               pic xx.

 procedure division.
     open output out-file
     if file-status not = "00"
         display "open failed, file status " file-status
         stop run
     end-if
     perform varying out-key from 1 by 1
         until out-key > 10
         move all "A" to alt-key
         write out-rec
             invalid key
                 display "write failed, file status " file-status
                 stop run
         end-write
     end-perform
     close out-file
     stop run.
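The effect of the SUPPRESS clause in this example can be modeled in a few lines. The following Python sketch is a toy model of an index, not the Callable File Handler's actual file format; the function build_indexes and the extra record with key 11 are hypothetical additions for illustration.

```python
def build_indexes(records, suppress_value):
    """Return (primary_index, alternate_index) for (key, alt_key) records."""
    primary, alternate = {}, []
    for position, (key, alt_key) in enumerate(records):
        primary[key] = position
        if alt_key != suppress_value:
            # Only non-suppressed values get an alternate index entry.
            alternate.append((alt_key, position))
    return primary, sorted(alternate)


# Mirror the COBOL example: ten records whose alternate key is all "A".
records = [(n, "A" * 20) for n in range(1, 11)]
records.append((11, "SMITH".ljust(20)))  # one ordinary alternate key

primary, alternate = build_indexes(records, suppress_value="A" * 20)
print(len(primary), len(alternate))  # 11 1
```

Every record remains reachable through the primary index; only the alternate index shrinks.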

11.4.2 Duplicate Keys

You can define keys in indexed files to allow duplicate values. However, we do not recommend that you allow duplicates on primary keys. This is because, with duplicates allowed, you cannot uniquely identify records in a file. See the READ, REWRITE and DELETE statements in your Language Reference.

To enable duplicate keys, specify the phrase WITH DUPLICATES in the ALTERNATE RECORD KEY clause of the SELECT statement.

When you use duplicate keys, be aware that a standard file allows a maximum of 65535 duplicates for each individual key. Each time you write a duplicate key value, 1 is added to its occurrence number. The occurrence number is used to ensure that duplicate key records are read in the order in which they were created, so the occurrence number of a deleted record cannot be reused. This means that the duplicate key maximum can be reached even when fewer than 65535 duplicates remain in the file.

To overcome this, a different file type is available: IDXFORMAT"4". You select this file format by specifying the IDXFORMAT"4" Compiler directive before the SELECT statement for individual files, or for all files in the program. IDXFORMAT"4" format files allow a maximum of 4,294,967,297 duplicate keys. Each record in the data file is followed by a system record holding the number of duplicate keys for that record. This makes a REWRITE or DELETE operation on a record with many duplicates much faster, but it also makes the data records of such files larger than those of files in the default format.
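The way a standard file can exhaust its occurrence numbers, even after records are deleted, can be shown with a toy model. The following Python sketch is not the file handler's implementation; the class DuplicateKeyCounter is hypothetical, numbering from 1 is an assumption, and a limit of 3 stands in for 65535 to keep the run short.

```python
class DuplicateKeyCounter:
    """Toy model of occurrence numbers for one duplicate key value."""

    def __init__(self, limit: int = 65535):
        self.limit = limit
        self.next_occurrence = 1  # assumption: numbering starts at 1
        self.live = set()

    def write(self) -> int:
        if self.next_occurrence > self.limit:
            raise OverflowError("duplicate key maximum reached")
        occurrence = self.next_occurrence
        self.next_occurrence += 1  # numbers only ever increase
        self.live.add(occurrence)
        return occurrence

    def delete(self, occurrence: int) -> None:
        self.live.discard(occurrence)  # the number is not reused


counter = DuplicateKeyCounter(limit=3)  # tiny limit for illustration
for _ in range(3):
    counter.delete(counter.write())
print(len(counter.live))  # 0
try:
    counter.write()
except OverflowError as exc:
    print(exc)  # duplicate key maximum reached
```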

11.5 Data and Key Compression

Records and keys in files created using the Callable File Handler can be compressed so they take up less physical disk space. You can enable data compression by using the Callable File Handler to call compression routines from your program.

11.5.1 Data Compression

Data compression enables you to compress the data in a sequential or indexed file. Two compression mechanisms are provided with this COBOL system: run-length encoding (type 1) and extended run-length encoding (type 3).

When a file is defined with run-length encoding, any string of repeating characters is stored as a single character with a repetition count.

You enable data compression using the DATACOMPRESS Compiler directive.

Specifying data compression for a fixed structure sequential file changes it into a variable structure sequential file. See the chapter File Structures for further information.

The compression used by a file is determined by the last processed DATACOMPRESS directive when the SELECT statement for the file is processed. Consequently, the compression type can be set for an individual file by using a line of the form:

$SET DATACOMPRESS

immediately before its SELECT statement. Remember to turn it off with a $SET NODATACOMPRESS before any other files are processed; otherwise the directive also applies to them.

Note: We recommend that you do not use the REWRITE statement on compressed sequential files. This is because a REWRITE operation will only succeed if the length of the compressed new record is the same as the length of the compressed old record.

11.5.2 Key Compression

Key compression is a technique that can be applied to the keys of an indexed file. There are four types of compression available:

Compression of trailing nulls
Compression of trailing spaces
Compression of identical leading characters
Compression of duplicate alternate key values

Any combination of these can be used on any key, though the compression of duplicates is only appropriate to alternate keys with duplicates enabled.

Key compression is specified using the KEYCOMPRESS Compiler directive.

The compression used by a file is determined by the last processed KEYCOMPRESS directive when the SELECT statement for the file is processed. Consequently, the compression type can be set for an individual file by using a line of the form:

$SET KEYCOMPRESS

immediately before its SELECT statement. Remember to turn it off with a $SET NOKEYCOMPRESS before any other files are processed; otherwise the directive also applies to them.

For details on the KEYCOMPRESS Compiler directive, see your Server Express User's Guide.

11.5.2.1 Compression of Trailing Nulls

When a key is defined with compression of trailing nulls, trailing nulls in a key value are not stored in the file.

For example, assume you have a prime or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being nulls. Without compression, all 30 characters of the key are stored, requiring 30 bytes. With compression of trailing nulls, only 11 bytes are required (10 bytes for the 10 characters of the key and 1 byte which is used to maintain a count of the trailing nulls).

11.5.2.2 Compression of Trailing Spaces

When a key is defined with compression of trailing spaces, trailing spaces in a key value are not stored in the file. However, information is stored so that the key can be correctly located.

For example, assume you have a prime or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being spaces. Without compression, all 30 characters of the key are stored. With compression of trailing spaces, the key only occupies 11 bytes in the index file (10 bytes for the characters of the key and 1 byte as a count of the trailing spaces).
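The arithmetic in the trailing-null and trailing-space examples is the same and can be checked with a short calculation. In the following Python sketch the function stored_key_length is hypothetical, and it assumes that the one-byte count is always stored; the text describes only the case where trailing padding is present.

```python
def stored_key_length(key: bytes, pad: bytes) -> int:
    """Bytes needed for a key when trailing `pad` bytes are compressed."""
    significant = key.rstrip(pad)
    # Assumption: one count byte is always stored with the key.
    return len(significant) + 1


key_with_spaces = b"ABCDEFGHIJ".ljust(30)          # 10 characters + 20 spaces
key_with_nulls = b"ABCDEFGHIJ".ljust(30, b"\x00")  # 10 characters + 20 nulls
print(stored_key_length(key_with_spaces, b" "))    # 11
print(stored_key_length(key_with_nulls, b"\x00"))  # 11
```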

11.5.2.3 Compression of Leading Characters

When a key is defined with compression of leading characters, all leading characters that match leading characters in the preceding key are not stored in the index file. However, information is stored to enable the key to be correctly reconstructed.

For example, assume that records are written with the following key values in a key defined with compression of leading characters:

AXYZBBB BBCDEFG BBCXYZA BBCXYEF BEFGHIJ CABCDEF

The keys actually stored in the index file are:

AXYZBBB BBCDEFG XYZA    EF      EFGHIJ  CABCDEF
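The stored values above can be reproduced with a short routine. The following Python sketch illustrates the technique only and does not represent the index file's actual layout; the function names are hypothetical, and keeping a count of shared characters next to each stored suffix is an assumption about the "information stored" to reconstruct the key.

```python
def compress_leading(keys):
    """Drop from each key the prefix it shares with the preceding key."""
    stored, previous = [], ""
    for key in keys:
        shared = 0
        while (shared < min(len(key), len(previous))
               and key[shared] == previous[shared]):
            shared += 1
        stored.append((shared, key[shared:]))
        previous = key
    return stored


def expand_leading(stored):
    """Rebuild the original keys from (shared_count, suffix) pairs."""
    keys, previous = [], ""
    for shared, suffix in stored:
        previous = previous[:shared] + suffix
        keys.append(previous)
    return keys


keys = ["AXYZBBB", "BBCDEFG", "BBCXYZA", "BBCXYEF", "BEFGHIJ", "CABCDEF"]
stored = compress_leading(keys)
print([suffix for _, suffix in stored])
# ['AXYZBBB', 'BBCDEFG', 'XYZA', 'EF', 'EFGHIJ', 'CABCDEF']
```

Calling expand_leading(stored) returns the original list, which shows that the shared count is enough to reconstruct each key.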

11.5.2.4 Compression of Duplicate Keys

When an alternate key is defined with compression of duplicates, only the first duplicate key is contained in the file. The rest are not stored, but information is stored to enable correct recreation of the keys.

For example, suppose you write a record with an alternate key value "ABC". If you have enabled compression of duplicate keys, and you write another record with the same key value, the file handler does not physically store the duplicate key value in the index file. However, the record is still available along the alternate key path.

11.5.3 Example of Using Data and Key Compression

In the following program, data compression is specified for transfile but not for masterfile. For key compression, suppression of trailing spaces and of leading characters that are the same as in the previous key is specified for keys t-rec-key and m-rec-key. Suppression of repetition of duplicate keys is also turned on for m-alt-key-1 and m-alt-key-2.

$set callfh"extfh"
$set datacompress"1" 
$set keycompress"6" 
     select transfile 
         assign to ... 
         key is t-rec-key. 
$set nokeycompress
$set nodatacompress

     select masterfile 
         assign to ... 
         organization is indexed 
$set keycompress"6"
         record key is m-rec-key

$set keycompress"7"
         alternate key is m-alt-key-1 with duplicates
         alternate key is m-alt-key-2. 
$set nokeycompress
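The values "6" and "7" in this example read naturally as sums of per-type settings. The following Python sketch decodes them under that reading. The individual values 1, 2 and 4 are assumptions chosen to be consistent with the example ("6" selects leading characters and trailing spaces, "7" adds duplicates); check them against the KEYCOMPRESS description in your Server Express User's Guide. The value for trailing nulls is not shown by the example and is omitted.

```python
# Assumed bit values, chosen to be consistent with the example above.
KEYCOMPRESS_BITS = {
    1: "duplicate keys",
    2: "leading characters",
    4: "trailing spaces",
}


def describe_keycompress(value: int):
    """List the compression types selected by a KEYCOMPRESS value."""
    return [name for bit, name in KEYCOMPRESS_BITS.items() if value & bit]


print(describe_keycompress(6))
# ['leading characters', 'trailing spaces']
print(describe_keycompress(7))
# ['duplicate keys', 'leading characters', 'trailing spaces']
```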

11.5.4 Compression Routines

The routines that the Callable File Handler uses to compress data are stand-alone modules. This means that you can use them in your own applications, or alternatively make the Callable File Handler use your own data compression routine.

There can be up to 127 Micro Focus compression routines, and up to 127 user-supplied compression routines.

11.5.4.1 Micro Focus Compression Routines

Micro Focus routines are stored in modules called CBLDCnnn, where nnn is within the range 001 to 127. To use Micro Focus compression routines, set the byte at offset 78 in the FCD to a value between 001 and 127.

11.5.4.1.1 Micro Focus Compression Routine CBLDC001

The compression routine, CBLDC001, uses a form of run length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.

Note: This routine is not effective for use with files that contain significant occurrences of double-byte characters, including double-byte spaces, as these are not compressed.

CBLDC001 puts special emphasis on runs of spaces, binary zeros and character zeros (that can be reduced to a single character) and printable characters (that are reduced to two characters consisting of a count followed by the repeated character).

In the compressed file, bytes have the following meanings (hex values shown):

20-7F	(most printable characters) normal ASCII meaning.
80-9F	1-32 spaces respectively.
A0-BF	1-32 binary zeros respectively.
C0-DF	1-32 character zeros respectively.
E0-FF	1-32 occurrences of the character following.
00-1F	1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code. This is used when characters in the range 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, one such character is expanded to two bytes; otherwise, no penalty is incurred by the compression.)
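The byte meanings listed above are enough to write a decoder. The following Python sketch expands data encoded with that scheme. It is a reading of the list above, not the CBLDC001 source; the function name is hypothetical. It assumes that the low five bits of each code give the count minus one, and that the byte following a 00-1F or E0-FF code is always copied as-is.

```python
def decode_cbldc001_style(data: bytes) -> bytes:
    """Expand bytes encoded with the code ranges described above."""
    out = bytearray()
    i = 0
    while i < len(data):
        code = data[i]
        i += 1
        count = (code & 0x1F) + 1  # low five bits hold 0-31, meaning 1-32
        if 0x20 <= code <= 0x7F:
            out.append(code)                 # ordinary character
        elif 0x80 <= code <= 0x9F:
            out += b" " * count              # run of spaces
        elif 0xA0 <= code <= 0xBF:
            out += b"\x00" * count           # run of binary zeros
        elif 0xC0 <= code <= 0xDF:
            out += b"0" * count              # run of character zeros
        else:
            # 0x00-0x1F and 0xE0-0xFF: repeat the following byte,
            # which is copied without interpretation.
            if i >= len(data):
                raise ValueError("truncated input: missing repeated byte")
            out += bytes([data[i]]) * count
            i += 1
    return bytes(out)


encoded = bytes([0x41, 0x82, 0xC1, 0xE4, 0x2A, 0x00, 0x90])
print(decode_cbldc001_style(encoded))  # b'A   00*****\x90'
```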

11.5.4.1.2 Micro Focus Compression Routine CBLDC003

Like CBLDC001, this routine uses run length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.

The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:

bit 15	Unset - single character
bit 14	Set - compressed sequence; Unset - uncompressed sequence
bit 0-13	Compressed character(s) or count of uncompressed characters

The length of the character string depends on the header bits:

bit 14 and 15 set	Two repeating characters.
Only bit 14 is set	One repeating character.
Otherwise	Between 1 and 63 uncompressed characters.

11.5.4.2 Calling a Micro Focus Compression Routine

For data file compression the Callable File Handler calls the compression routine that you specify in the DATACOMPRESS Compiler directive.

To call a Micro Focus data compression routine use the syntax:

COBOL:

call "CBLDCnnn" using input-buffer,
                        input-buffer-size,
                        output-buffer,
                        output-buffer-size,
                        compression-type

C:

cbldcnnn(input_buffer, &input_buffer_size,
         output_buffer, &output_buffer_size,
         &compression_type);

where the parameters are:

`nnn`	A data compression routine in the range 001 to 127.
`input_buffer`	A PIC X(size) data item containing the data to compress or decompress; the maximum size is 65535.
`input_buffer_size`	A two-byte (int in C, PIC XX COMP-5 in COBOL) data item containing the length of the data in the input buffer.
`output_buffer`	A PIC X(size) data item that is the buffer to receive the resulting data.
`output_buffer_size`	A two-byte (int in C, PIC XX COMP-5 in COBOL) data item. On entry to the routine, this data item must contain the size of the output buffer available; on exit, it contains the length of the data in the buffer.
`compression_type`	A one-byte (char in C, PIC X COMP-X in COBOL) data item that specifies whether the input data is to be compressed or decompressed: 0 - compress, 1 - decompress.

The RETURN-CODE special register indicates whether the operation succeeded or not. Compression or decompression fails only if the output buffer is too small to accept the results.

0 indicates success
1 indicates failure.

11.5.4.3 User-supplied Compression Routines

User-supplied compression routines must be stored in modules called USRDCnnn, where nnn is within the range 128 to 255.

To call a user-supplied routine, use the same syntax as for calling a Micro Focus routine, but use the filename USRDCnnn instead of CBLDCnnn where nnn must be a value in the range 128 through 255.
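The naming rule for the two families of routines can be stated compactly. The following Python sketch maps a routine number to the module name described in this section; the function name compression_routine_name is hypothetical, and rejecting numbers outside 1 through 255 is an assumption.

```python
def compression_routine_name(number: int) -> str:
    """Map a compression routine number to its module name."""
    if 1 <= number <= 127:
        return f"CBLDC{number:03d}"   # Micro Focus routine
    if 128 <= number <= 255:
        return f"USRDC{number:03d}"   # user-supplied routine
    raise ValueError("routine number must be in the range 1-255")


print(compression_routine_name(1))    # CBLDC001
print(compression_routine_name(128))  # USRDC128
```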

To make your compression routine available to your system, you must create a callable shared object that can be called when needed.

You can map calls to data compression routines in programs from previous UNIX COBOL systems to the new calls using the cob option:

-m CBL_DATA_COMPRESS_nnn=CBLDCnnn

Notes:

Your compression routines must not make any calls to the Callable File Handler, as this would result in a loop. If you need file access, use byte-stream file I/O.
Once you have enabled data compression for a file, you must always subsequently specify the same type of compression for that file. If you do not, you receive a run-time system error when you open the file.
Data compression has no effect on files other than those in indexed or record sequential format and is ignored at compile time for files that do not support it.
If you want to read a fixed-length sequential file that has had its data compressed, you must specify that the file is compressed in your program. You do this by specifying the DATACOMPRESS Compiler directive.
