|File Handling Library Routines||Byte-stream File Handling|
This chapter covers a number of file handling topics. These include:
If your program handles data files, there are several Compiler directives that you should be aware of. Used properly, they can simplify your code, set up default language behavior and give you access to important language extensions, such as ANSI'85.
These directives include:
The ANS85 Compiler directive enables ANSI'85 COBOL language extensions and gives you ANSI'85 file status codes.
If you specify the NOANS85 directive, you do not get ANSI'85 file status codes. Additionally, you cannot use ANSI'85 reserved words.
If you specify ANS85"SYNTAX", you can use ANSI'85 syntax but your program will return ANSI'74 file status codes.
For example, if you compile a program with NOANS85, the file status code for "File not found" is 9/013 (first status byte = ASCII 9, second status byte = binary value 13), which is the Micro Focus run-time system error code for this problem. If you omit the NOANS85 directive, the file status for "File not found" is 3/5 (first status byte = ASCII 3, second status byte = ASCII 5), which is the ANSI'85 file status code for this problem.
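To make the two encodings concrete, the following C sketch formats a two-byte file status the way this section writes it. The function name format_file_status is hypothetical. The sketch also assumes that a first byte of ASCII 9 is always followed by a binary second byte, as in the "File not found" example above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of this COBOL system): formats a
   two-byte file status as shown in this section, e.g. "9/013" or "3/5".
   Assumption: a first byte of ASCII '9' means the second byte is a
   binary run-time error number; otherwise both bytes are ASCII digits.
   Returns the length of the formatted text, as snprintf does. */
int format_file_status(const unsigned char status[2], char *buf, size_t len)
{
    if (status[0] == '9') {
        /* Extended status: print the binary second byte as 3 decimal digits. */
        return snprintf(buf, len, "9/%03u", (unsigned)status[1]);
    }
    /* ANSI status: both bytes are printable characters. */
    return snprintf(buf, len, "%c/%c", status[0], status[1]);
}
```

For example, the bytes {'9', 13} format as 9/013 and the bytes {'3', '5'} format as 3/5.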
The ASSIGN Compiler directive specifies whether the default file assignment is dynamic or external. It can be set to either ASSIGN"DYNAMIC" or ASSIGN"EXTERNAL".
If ASSIGN"EXTERNAL" is specified as a directive, and the SELECT statement is coded like this:
select fd-name assign to var1
then COBOL treats the SELECT as if it were coded like this:
select fd-name assign to external var1
You can use the CALLFH directive to generate direct calls for all file I/O operations through the Callable File Handler interface. The handler can be the Micro Focus File Handler, Fileshare, or a file handler you have created. To use Fileshare, specify CALLFH(FHREDIR).
Use the COBFSTATCONV Compiler directive to specify that any status values returned for a file should be converted by a status conversion routine before being returned to a COBOL program.
The name of the status conversion routine is specified at run time by the CONVERTSTATUS configuration option or by the COBFSTATCONV environment variable.
If you have files with large areas that are unused or reserved for future use, you can save disk space by compressing the files.
To compress data files, two directives are available with COBOL: DATACOMPRESS and KEYCOMPRESS. These directives must be specified when you create the file.
For indexed files and record sequential files, you can suppress repeated characters through use of the DATACOMPRESS directive.
Note: Compressed files use variable-length record format. Specifying data compression on relative files does not save space, because the maximum record length is always written out.
For indexed files, you can compress keys through use of the KEYCOMPRESS directive.
These directives can be specified through a $SET statement, enabling compression on selected files.
For more information on DATACOMPRESS and KEYCOMPRESS, see the section Data and Key Compression.
The IDXFORMAT Compiler directive enables you to specify the type of indexed file you want to create. For example, you can create C-ISAM files using IDXFORMAT"1" and BTRIEVE files using IDXFORMAT"5" or IDXFORMAT"6".
For handling files with many alternate keys, use IDXFORMAT"4" or IDXFORMAT"8". This improves performance for delete and rewrite operations and enables a larger number of duplicate keys.
To create files that might grow beyond 1 gigabyte (2 gigabytes for non-shared files), use IDXFORMAT"8".
The OPTIONAL-FILE Compiler directive is specified as one of:
With the OPTIONAL-FILE directive, SELECT statements for files opened I-O or EXTEND are treated as if OPTIONAL were coded in the SELECT clause. This means that if a non-existent file is opened I-O or EXTEND:
The RECMODE Compiler directive specifies whether the default RECORDING MODE for a file is fixed or variable format as follows:
If RECMODE"V" is specified, you can omit the RECORDING MODE IS VARIABLE clause from your file description (FD) entries. However, you must specify RECORDING MODE IS FIXED if you want fixed-length records.
The RM Compiler directive offers Ryan-McFarland COBOL compatibility in several areas. For example, it enables Ryan-McFarland file status codes to be generated. For more information, see your Compatibility Guide and the chapter Ryan-McFarland COBOL V2.0 Syntax Support in your Language Reference - Additional Topics.
The SEQUENTIAL Compiler directive determines whether, by default, files with an explicit or implicit ORGANIZATION IS SEQUENTIAL clause in the SELECT statement are record sequential or line sequential.
The directive can be set to RECORD, LINE, ANSI or ADVANCING as follows:
|SEQUENTIAL"RECORD" (the default)|
This directive can be useful if you are converting from other COBOL dialects where ORGANIZATION SEQUENTIAL is equivalent to this COBOL system's ORGANIZATION LINE SEQUENTIAL.
If you compile with SEQUENTIAL"LINE", and code your SELECT statement like this:
select fd-name assign to ... organization is sequential
then the file is a line sequential file, rather than a record sequential file, which is otherwise the default.
There are many operating system specific considerations which affect file handling. Some common ones are presented in this section, including some tips on avoiding and resolving memory problems related to file handling.
If a power outage or a system reboot occurs while an application is executing, the integrity of files that were being processed by the application when the failure occurred cannot be guaranteed.
A file handle is an operating system mechanism for controlling the way a file is used by the system. Every file that your application has open requires at least one file handle. On UNIX this is called a file descriptor.
The maximum number of files that can be open at any one time, excluding the standard input, output and error files, is dependent on the configuration of your UNIX system. See your Release Notes for configuration details for your system. The open mode of a file can influence the maximum number of files that can be open at any one time. Files opened for INPUT are limited to the maximum number of files per process which has been configured for your UNIX system. Files with EXCLUSIVE access (for example, files opened for OUTPUT) acquire file locks and are thus limited to the maximum number of file locks per process which has been configured for your UNIX system. These two limits are not necessarily the same.
You can specify the maximum number of open files with the tunable max_file_handles. See your Server Express User's Guide for details.
Some of the factors that determine how many file handles are needed are described below.
For those indexed files that comprise two files, the data file and the index file, two file handles are required for each open indexed file. IDXFORMAT"8" files comprise only one file and therefore require only one handle.
When a COBOL SORT is performed, several work files might be created, each of which requires a file handle.
During animation, a certain number of Animator control files are opened. Each of these requires a file handle. This means that your application can run out of file handles during animation, but not when running standalone.
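The factors above can be tallied in a few lines. The following C sketch is a hypothetical helper, not part of this COBOL system. It assumes that every file other than a two-part indexed file needs exactly one handle. The caller must supply the number of sort work files and Animator control files, because these depend on the operation.

```c
/* Hypothetical counts of the files an application has open at one time. */
struct open_files {
    unsigned two_part_indexed;   /* indexed files stored as data file + index file */
    unsigned idxformat8_indexed; /* IDXFORMAT"8" files: a single file each */
    unsigned other_files;        /* sequential, relative, ... (assumed one handle each) */
    unsigned sort_work_files;    /* work files created during a COBOL SORT */
    unsigned animator_files;     /* Animator control files, only while animating */
};

/* Estimate of the file handles needed, following the rules in this section. */
unsigned handles_needed(const struct open_files *f)
{
    return 2u * f->two_part_indexed   /* two handles per two-part indexed file */
         + f->idxformat8_indexed
         + f->other_files
         + f->sort_work_files
         + f->animator_files;
}
```

For example, three two-part indexed files, one IDXFORMAT"8" file and two other files need 9 handles when running standalone. The same application needs more handles under Animator, which is why it can run out of handles only during animation.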
Some networks have limits built in that may be more restrictive than those imposed by your operating system. If you are running on a network, consult your network documentation to find out the maximum number of files that you can have open at one time.
The maximum size of any file you can create is limited by the ulimit file-size limit that applies to your process. The default limit may be rather small, but a superuser can increase it.
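On UNIX, both per-process limits discussed in this section can be read with the POSIX getrlimit() call. RLIMIT_NOFILE gives the maximum number of open file descriptors, and RLIMIT_FSIZE gives the maximum file size, which is the limit the shell's ulimit command controls. The wrapper name query_file_limits is hypothetical. These operating system limits are separate from the max_file_handles tunable.

```c
#include <sys/resource.h>

/* Hypothetical wrapper: reads the open-file and file-size limits for the
   current process. A value of RLIM_INFINITY means "no limit".
   Returns 0 on success, -1 if either getrlimit() call fails. */
int query_file_limits(struct rlimit *open_files, struct rlimit *file_size)
{
    if (getrlimit(RLIMIT_NOFILE, open_files) != 0)
        return -1;
    if (getrlimit(RLIMIT_FSIZE, file_size) != 0)
        return -1;
    return 0;
}
```

Compare rlim_cur, the soft limit that is actually enforced, with the number of handles your application needs.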
The COBOL OPEN OUTPUT statement requires two calls to the operating system; the first to open the file and the second to obtain an exclusive lock. This means that a small time lapse exists between these two calls and so it is possible for a second user to access the file between the time that the file is created and the time that it is locked exclusively. If this happens, the second user will find the file empty and an attempt to read the file will result in an "at end" error. The user performing the OPEN OUTPUT will receive a "file locked" error because the exclusive lock call fails.
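The race described above comes from splitting one logical operation into two system calls. The following C sketch shows the same two-step pattern with POSIX open() and fcntl() record locking. It illustrates the timing window only and is not the run-time system's actual code. The function name open_output_two_step is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical illustration of the two-step OPEN OUTPUT pattern.
   Returns an open descriptor, or -1 if either step fails. */
int open_output_two_step(const char *path)
{
    /* Call 1: create (or truncate) the file. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Window: another process can open the new, empty file here and
       read it, getting an "at end" condition straight away. */

    /* Call 2: request an exclusive (write) lock on the whole file
       without waiting. If the other process holds a lock, this fails,
       which corresponds to the "file locked" error described above. */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 0 };  /* l_len 0 = whole file */
    if (fcntl(fd, F_SETLK, &fl) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```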
You can specify sequential files as multiple-reel (or multiple-unit) files. This means that a sequential file can be held on more than one:
You must specify the file as a multiple-reel file in the SELECT clause of the Environment Division.
You cannot specify a sequential file as a multiple-reel file if it has variable-length records, since the file header record (see below) stores only one record length.
Whenever you specify a sequential file as a multiple-reel file, you are prompted to load the appropriate reel of the file. This applies also to the first reel of the file, even though it may already be loaded. The prompt is:
Please load volume nnnn of access file filename Enter new device (if required) and <CR> when ready
where the parameters are:
|nnnn||The four-digit reel number of the reel to be loaded, in the range 0001 through 9999.|
|filename||The filename, as specified in the SELECT clause in the source program.|
|access||INPUT, OUTPUT or I/O, as specified in the source program.|
Ensure that the relevant disk or reel is loaded (media for output must already be formatted), and enter the filename.
If you load the wrong volume of a file that is open for input or I-O, or if the header information is in some way corrupt, an error is returned.
The prompt is displayed whenever:
Multiple-reel files have a block of header information that is 256 bytes long. This header occupies the first 256 bytes of each reel and contains information that describes the reel. It also contains 44 bytes which are reserved; you cannot alter this reserved space.
A multiple reel file header has the following structure:
|0-49||Multiple-reel header start identification.|
|50-69||Filename. This is the name of the file as specified in the SELECT clause.|
|70-75||Date of file creation in the form yymmdd (year, month, day in ASCII digits). If your system does not return the date, this part of the header contains ASCII zeros.|
|76-83||Time of file creation in the form hhmmsscc (hours, minutes, seconds, hundredths of a second in ASCII digits). If your system does not return the time, this part of the header contains ASCII zeros.|
|84-127||Reserved area for your own use (see above).|
|128-131||Reel number. This is a four-digit ASCII value showing the reel number in the range 0001 through 9999.|
|132||Continuation flags. A one-byte value that shows how the reel ends. The value is ASCII "Y", "A" or "N".|
|133||Reserved. Currently contains the ASCII value "N".|
|134-145||Reel length. A 12-digit ASCII value which indicates how many bytes of information are on this reel. This part of the header contains zeros if your system cannot determine the reel size.|
|146-151||Record size. This is a six-digit ASCII value that shows the record length of records in this file.|
|152-157||Block size. This is a six-digit ASCII value that has the same value as the record size area of the header.|
|158-239||Reserved area containing ASCII spaces.|
|240-255||Multiple reel header end identification.|
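Because every field sits at a fixed offset, the header can be read with a simple fixed-offset parser. The following C sketch copies the fields from the table above into a structure. The names reel_header and parse_reel_header are hypothetical. The sketch does not check the start and end identification areas (bytes 0-49 and 240-255), because their contents are not documented here.

```c
#include <stddef.h>
#include <string.h>

/* Fields of a 256-byte multiple-reel header, using the offsets in the table above. */
struct reel_header {
    char filename[21];              /* bytes 50-69, NUL-terminated copy */
    char created_date[7];           /* bytes 70-75, yymmdd */
    char created_time[9];           /* bytes 76-83, hhmmsscc */
    unsigned reel_number;           /* bytes 128-131 */
    char continuation;              /* byte 132: 'Y', 'A' or 'N' */
    unsigned long long reel_length; /* bytes 134-145 */
    unsigned long record_size;      /* bytes 146-151 */
    unsigned long block_size;       /* bytes 152-157 */
};

/* Converts a fixed-width run of ASCII digits; returns -1 on a non-digit. */
static int ascii_digits(const unsigned char *p, size_t n, unsigned long long *out)
{
    unsigned long long v = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] < '0' || p[i] > '9')
            return -1;
        v = v * 10 + (unsigned)(p[i] - '0');
    }
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 if a numeric field is not all ASCII digits. */
int parse_reel_header(const unsigned char hdr[256], struct reel_header *h)
{
    unsigned long long v;

    memcpy(h->filename, hdr + 50, 20);    h->filename[20] = '\0';
    memcpy(h->created_date, hdr + 70, 6); h->created_date[6] = '\0';
    memcpy(h->created_time, hdr + 76, 8); h->created_time[8] = '\0';

    if (ascii_digits(hdr + 128, 4, &v) != 0) return -1;
    h->reel_number = (unsigned)v;
    h->continuation = (char)hdr[132];
    if (ascii_digits(hdr + 134, 12, &h->reel_length) != 0) return -1;
    if (ascii_digits(hdr + 146, 6, &v) != 0) return -1;
    h->record_size = (unsigned long)v;
    if (ascii_digits(hdr + 152, 6, &v) != 0) return -1;
    h->block_size = (unsigned long)v;
    return 0;
}
```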
The following sections describe sparse keys and how to handle files with a large number of duplicate keys.
A sparse key is a key for which no index entry is stored for a given key value. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the key part of the record contains only space characters.
Normally, when the Callable File Handler stores an indexed record, the key value is stored in the index file for each key defined. If an alternate key is defined as a sparse key and the key value in the record is the specified sparse value, the key is not stored. However, enough information is stored for the record to be read via the normal primary key path.
Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your disk savings.
Only alternate keys that allow duplicates can be sparse.
Sparse keys are defined using the SUPPRESS phrase of the ALTERNATE RECORD KEY clause in the SELECT statement (see your Language Reference for details).
Sparse keys are not supported for C-ISAM files.
       input-output section.
       file-control.
           select out-file assign to "outfile"
               organization is indexed
               access mode is dynamic
               record key is out-key
               alternate record key is alt-key with duplicates
                   suppress when all "A"
      * Hence, if we write a record whose alternate key
      * has value all "A", the actual key value
      * is not stored in the index file
               file status is file-status.
       data division.
       file section.
       fd out-file.
       01 out-rec.
           03 out-key          pic 9(10).
           03 alt-key          pic x(20).
       working-storage section.
       01 file-status          pic xx.

       procedure division.
           open output out-file
           if file-status not = "00"
      *        abort: report the status and stop
               display "OPEN failed, file status " file-status
               stop run
           end-if
           perform varying out-key from 1 by 1 until out-key > 10
               move all "A" to alt-key
               write out-rec
                   invalid key
      *                abort: report the status and stop
                       display "WRITE failed, file status " file-status
                       stop run
               end-write
           end-perform
           close out-file
           stop run.
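The decision the file handler makes for each alternate key in the program above reduces to a simple test. The following C sketch expresses the rule for a key declared SUPPRESS WHEN ALL with a single character. It is an illustration, not the Callable File Handler's code, and the name needs_index_entry is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns false when every byte of the key equals suppress_char, meaning
   the key is sparse and gets no index entry. The record itself is still
   reachable through the primary key. */
bool needs_index_entry(const char *key, size_t len, char suppress_char)
{
    for (size_t i = 0; i < len; i++)
        if (key[i] != suppress_char)
            return true;   /* at least one byte differs: store the key */
    return false;          /* all suppress_char: do not store the key */
}
```

In the example program, every alt-key value is all "A", so none of the 10 records adds an entry to the alternate key index.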
You can define keys in indexed files to allow duplicate values. However, we do not recommend that you allow duplicates on primary keys. This is because, with duplicates allowed, you cannot uniquely identify records in a file. See the READ, REWRITE and DELETE statements in your Language Reference.
To enable duplicate keys, specify the WITH DUPLICATES phrase in the ALTERNATE RECORD KEY clause of the SELECT statement.
When you use duplicate keys, be aware that a standard file allows a maximum of 65535 duplicates for every individual key. Each time you write a record with a duplicate key value, the key's occurrence number is incremented by 1. The occurrence number ensures that duplicate key records are read in the order in which they were created, and an occurrence number whose record you have deleted cannot be reused. As a result, the maximum can be reached even when far fewer than 65535 duplicates currently exist in the file.
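The effect of never reusing occurrence numbers can be modeled with two counters. The following C sketch is a hypothetical model and does not reflect the file handler's data structures. It assumes a limit of 65535 occurrence numbers per key in a standard file, as stated above.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_OCCURRENCES 65535u   /* limit for a standard file, per the text */

/* Hypothetical model of one key's duplicate bookkeeping. */
struct dup_key_counter {
    uint32_t issued;  /* occurrence numbers handed out so far */
    uint32_t live;    /* duplicate records currently in the file */
};

/* WRITE of a duplicate: takes the next occurrence number.
   Returns false once the occurrence numbers are exhausted. */
bool write_duplicate(struct dup_key_counter *c)
{
    if (c->issued >= MAX_OCCURRENCES)
        return false;
    c->issued++;
    c->live++;
    return true;
}

/* DELETE of a duplicate: the record goes, but its occurrence number is
   not returned, so c->issued is left unchanged. */
void delete_duplicate(struct dup_key_counter *c)
{
    if (c->live > 0)
        c->live--;
}
```

If an application repeatedly writes and deletes the same duplicate, live never exceeds 1. Even so, the next write fails after 65535 cycles.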
To overcome this limit, use a different file format: IDXFORMAT"4". You select this format by specifying the IDXFORMAT"4" Compiler directive before the SELECT statement for individual files, or for all files in the program. IDXFORMAT"4" format files allow a maximum of 4,294,967,297 duplicate keys. Each record in the data file is followed by a system record holding the number of duplicate keys for that record. This makes a REWRITE or DELETE operation on a record with many duplicates much faster, but it also makes the data records of such files larger than those of the default files.
Records and keys in files created using the Callable File Handler can be compressed so they take up less physical disk space. You can enable data compression by using the Callable File Handler to call compression routines from your program.
Data compression enables you to compress the data in a sequential or indexed file. Two compression mechanisms are provided with this COBOL system: run-length encoding (type 1) and extended run-length encoding (type 3).
When a file is defined with run-length encoding, any string of repeating characters is stored as a single character with a repetition count.
You enable data compression using the DATACOMPRESS Compiler directive.
Specifying data compression for a fixed structure sequential file changes it into a variable structure sequential file. See the chapter File Structures for further information.
The compression used by a file is determined by the last DATACOMPRESS directive processed before the SELECT statement for the file. Consequently, you can set the compression type for an individual file by placing a line of the form:
$SET DATACOMPRESS"n"
immediately before its SELECT statement, where n is the compression type. Remember to turn compression off again with $SET NODATACOMPRESS before any other files are processed.
Note: We recommend that you do not use the REWRITE statement on compressed sequential files. This is because a REWRITE operation will only succeed if the length of the compressed new record is the same as the length of the compressed old record.
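Key compression is a technique that can be applied to the keys of an indexed file. The following types of compression are available: compression of trailing nulls, compression of trailing spaces, compression of leading characters, and compression of duplicates.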
Any combination of these can be used on any key, though the compression of duplicates is only appropriate to alternate keys with duplicates enabled.
Key compression is specified using the KEYCOMPRESS Compiler directive.
The compression used by a file is determined by the last KEYCOMPRESS directive processed before the SELECT statement for the file. Consequently, you can set the compression type for an individual file by placing a line of the form:
$SET KEYCOMPRESS"n"
immediately before its SELECT statement, where n specifies the compression types. Remember to turn compression off again with $SET NOKEYCOMPRESS before any other files are processed.
For details on the KEYCOMPRESS Compiler directive, see your Server Express User's Guide.
When a key is defined with compression of trailing nulls, trailing nulls in a key value are not stored in the file.
For example, assume you have a primary or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being nulls. Without compression, all 30 characters of the key are stored requiring 30 bytes. With compression of trailing nulls, only 11 bytes are required (10 bytes for the 10 characters of the key and 1 byte which is used to maintain a count of the trailing nulls).
When a key is defined with compression of trailing spaces, trailing spaces in a key value are not stored in the file. However, information is stored so that the key can be correctly located.
For example, assume you have a primary or alternate key that is 30 characters long, and that you write a record in which only the first 10 characters of the key are used, the rest being spaces. Without compression, all 30 characters of the key are stored. With compression of trailing spaces, the key occupies only 11 bytes in the index file (10 bytes for the characters of the key and 1 byte for a count of the trailing spaces).
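The arithmetic in both worked examples is the same. The stored size is the number of significant bytes plus one count byte. The following C sketch reproduces that calculation. The name trailing_compressed_size is hypothetical. The sketch also assumes that the count byte is always present, even when there is nothing to trim, and that runs never exceed what one byte can count.

```c
#include <stddef.h>

/* Stored size of a key under trailing-space (pad = ' ') or trailing-null
   (pad = 0) compression: significant bytes + 1-byte pad count. */
size_t trailing_compressed_size(const unsigned char *key, size_t len, unsigned char pad)
{
    size_t used = len;
    while (used > 0 && key[used - 1] == pad)
        used--;              /* drop trailing pad bytes */
    return used + 1;         /* significant bytes + count byte */
}
```

For a 30-byte key holding 10 characters followed by 20 spaces, the function returns 11, matching the example above.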
When a key is defined with compression of leading characters, all leading characters that match leading characters in the preceding key are not stored in the index file. However, information is stored to enable the key to be correctly reconstructed.
For example, assume that records are written with the following key values in a key defined with compression of leading characters:
AXYZBBB BBCDEFG BBCXYZA BBCXYEF BEFGHIJ CABCDEF
The keys actually stored in the index file are:
AXYZBBB BBCDEFG XYZA EF EFGHIJ CABCDEF
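Each stored value above is what remains of a key after removing the leading characters it shares with the previous key. The following C sketch shows that calculation and the reverse step. The function names are hypothetical. The shared-prefix count stands in for the "information stored to enable the key to be correctly reconstructed", whose actual on-disk format this chapter does not describe. Both keys are assumed to be len bytes long, as fixed-length index keys are.

```c
#include <stddef.h>
#include <string.h>

/* Number of leading bytes that key shares with the previous key. */
size_t shared_prefix(const char *prev, const char *key, size_t len)
{
    size_t n = 0;
    while (n < len && prev[n] == key[n])
        n++;
    return n;
}

/* Rebuilds a key from the previous key, the shared-prefix count and the
   stored suffix. out needs room for shared + strlen(suffix) + 1 bytes. */
void expand_key(const char *prev, size_t shared, const char *suffix, char *out)
{
    memcpy(out, prev, shared);
    strcpy(out + shared, suffix);
}
```

Applied to the six keys above, the suffixes key + shared_prefix(previous, key, 7) are exactly the stored values listed. For example, BBCXYEF shares BBCXY with BBCXYZA, which leaves EF.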
When an alternate key is defined with compression of duplicates, only the first duplicate key is contained in the file. The rest are not stored, but information is stored to enable correct recreation of the keys.
For example, suppose you write a record with an alternate key value "ABC". If you have enabled compression of duplicate keys, and you write another record with the same key value, the file handler does not physically store the duplicate key value in the index file. However, the record is still available along the alternate key path.
In the following program, data compression is specified for transfile but not for masterfile. For key compression, suppression of trailing spaces and of leading characters that are the same as in the previous key is specified for keys t-rec-key, m-rec-key, m-alt-key-1 and m-alt-key-2. Suppression of repetition of duplicate keys is also turned on for m-alt-key-1 and m-alt-key-2.
      $set callfh"extfh"
      $set datacompress"1"
      $set keycompress"6"
           select transfile assign to ...
               key is t-rec-key.
      $set nokeycompress
      $set nodatacompress

           select masterfile assign to ...
               organization is indexed
      $set keycompress"6"
               record key is m-rec-key
      $set keycompress"7"
               alternate key is m-alt-key-1 with duplicates
               alternate key is m-alt-key-2.
      $set nokeycompress
The routines that the Callable File Handler uses to compress data are stand-alone modules. This means that you can use them in your own applications, or alternatively make the Callable File Handler use your own data compression routine.
There can be up to 127 Micro Focus compression routines, and up to 127 user-supplied compression routines.
Micro Focus routines are stored in modules called CBLDCnnn, where nnn is within the range 001 to 127. To use Micro Focus compression routines, set the byte at offset 78 in the FCD to a value between 001 and 127.
The compression routine, CBLDC001, uses a form of run length encoding. This is a method of compression that detects strings (runs) of the same character and reduces them to an identifier, a count and one occurrence of the character.
Note: This routine is not effective for use with files that contain significant occurrences of double-byte characters, including double-byte spaces, as these are not compressed.
CBLDC001 puts special emphasis on runs of spaces, binary zeros and character zeros (that can be reduced to a single character) and printable characters (that are reduced to two characters consisting of a count followed by the repeated character).
In the compressed file, bytes have the following meanings (hex values shown):
|20-7F||(most printable characters) normal ASCII meaning.|
|80-9F||1-32 spaces respectively.|
|A0-BF||1-32 binary zeros respectively.|
|C0-DF||1-32 character zeros respectively.|
|E0-FF||1-32 occurrences of the character following.|
|00-1F||1-32 occurrences of the character following, which is to be interpreted literally, not as a compression code.|
This code is used when characters in the range 00-1F, 80-9F, A0-BF, C0-DF or E0-FF occur in the original data. (Thus, one such character is expanded to two bytes; otherwise, no penalty is incurred by the compression.)
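The byte codes in the table above fully determine how compressed data is decoded, and they suggest a matching encoder. The following C sketch implements both. It is not Micro Focus's CBLDC001 and may not produce byte-identical output. Two choices are assumptions: an E0-FF code is used only for runs of three or more printable characters, and binary zeros always use the A0-BF code rather than the 00-1F escape. The names rle1_encode and rle1_decode are hypothetical.

```c
#include <stddef.h>

/* Encodes n bytes; out must hold up to 2 * n bytes (worst case).
   Returns the encoded length. */
size_t rle1_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == c && run < 32)
            run++;                                        /* runs are capped at 32 */

        if (c == ' ') {
            out[o++] = (unsigned char)(0x80 + run - 1);   /* 1-32 spaces */
        } else if (c == 0x00) {
            out[o++] = (unsigned char)(0xA0 + run - 1);   /* 1-32 binary zeros */
        } else if (c == '0') {
            out[o++] = (unsigned char)(0xC0 + run - 1);   /* 1-32 character zeros */
        } else if (c >= 0x20 && c <= 0x7F) {
            if (run >= 3) {
                out[o++] = (unsigned char)(0xE0 + run - 1);  /* count, then char */
                out[o++] = c;
            } else {
                for (size_t k = 0; k < run; k++)
                    out[o++] = c;                         /* short run: literal bytes */
            }
        } else {
            out[o++] = (unsigned char)(run - 1);          /* 00-1F: count, then literal */
            out[o++] = c;
        }
        i += run;
    }
    return o;
}

/* Decodes n bytes into out (capacity cap). Returns the decoded length, or
   -1 if the input is truncated or out is too small. */
long rle1_decode(const unsigned char *in, size_t n, unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char b = in[i++];
        unsigned char c;
        size_t count;

        if (b >= 0x20 && b <= 0x7F) {
            c = b;    count = 1;                          /* normal ASCII meaning */
        } else if (b >= 0x80 && b <= 0x9F) {
            c = ' ';  count = (size_t)(b - 0x80) + 1;
        } else if (b >= 0xA0 && b <= 0xBF) {
            c = 0x00; count = (size_t)(b - 0xA0) + 1;
        } else if (b >= 0xC0 && b <= 0xDF) {
            c = '0';  count = (size_t)(b - 0xC0) + 1;
        } else {
            /* E0-FF or 00-1F: a count, then the character to repeat */
            if (i >= n)
                return -1;
            c = in[i++];
            count = (size_t)(b >= 0xE0 ? b - 0xE0 : b) + 1;
        }
        if (count > cap - o)
            return -1;
        for (size_t k = 0; k < count; k++)
            out[o++] = c;
    }
    return (long)o;
}
```

For example, the 14 bytes "AB", five spaces, "000", "xxx" and the control byte 0x01 encode to 8 bytes: 'A', 'B', 0x84, 0xC2, 0xE2, 'x', 0x00, 0x01.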
Like CBLDC001, this routine uses run length encoding, but detects strings (runs) of single- or double-byte characters. This routine is therefore suitable for DBCS characters, but can also be used in place of CBLDC001.
The format of the compression is two header bytes followed by one or more characters. The bits in the header bytes indicate:
|bit 15||Set - double character; Unset - single character|
|bit 14||Set - compressed sequence; Unset - uncompressed sequence|
|bit 0-13||Compressed character(s) or count of uncompressed characters|
The length of the character string depends on the header bits:
|bit 14 and 15 set||Two repeating characters.|
|Only bit 14 is set||One repeating character|
|Otherwise||Between 1 and 63 uncompressed characters.|
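The two tables above can be turned into a small classifier for a header. The following C sketch only decides which of the three cases applies. It leaves the low 14 bits uninterpreted, because the description above does not say exactly how they are laid out. The names are hypothetical, and the sketch assumes that the first header byte holds bits 15-8.

```c
#include <stdint.h>

enum dc_kind { DC_TWO_REPEATING, DC_ONE_REPEATING, DC_UNCOMPRESSED };

/* Assumption: the first of the two header bytes holds bits 15-8. */
uint16_t dc_header(const unsigned char hdr[2])
{
    return (uint16_t)((hdr[0] << 8) | hdr[1]);
}

/* Classifies a header according to bits 14 and 15. */
enum dc_kind dc_classify(uint16_t h)
{
    int bit15 = (h >> 15) & 1;
    int bit14 = (h >> 14) & 1;
    if (bit14 && bit15)
        return DC_TWO_REPEATING;   /* two repeating characters */
    if (bit14)
        return DC_ONE_REPEATING;   /* one repeating character */
    return DC_UNCOMPRESSED;        /* uncompressed characters follow */
}

/* Bits 0-13, returned without further interpretation. */
uint16_t dc_low_bits(uint16_t h)
{
    return (uint16_t)(h & 0x3FFF);
}
```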
For data file compression the Callable File Handler calls the compression routine that you specify in the DATACOMPRESS Compiler directive.
To call a Micro Focus data compression routine use the syntax:
call "CBLDCnnn" using input-buffer, input-buffer-size, output-buffer, output-buffer-size, compression-type
cbldcnnn(input_buffer, &input_buffer_size, output_buffer, &output_buffer_size, &compression_type);
where the parameters are:
|nnn||A data compression routine number in the range 001 to 127.|
|input-buffer||A PIC X(size) data item containing the data to compress or decompress; the maximum size is 65535 bytes.|
|input-buffer-size||A two-byte integer data item (a 2-byte short in C, PIC XX COMP-5 in COBOL) containing the length of the data in input-buffer.|
|output-buffer||A PIC X(size) data item that receives the resulting data.|
|output-buffer-size||A two-byte integer data item (a 2-byte short in C, PIC XX COMP-5 in COBOL). On entry to the routine, it must contain the size of the available output buffer; on exit, it contains the length of the data in the buffer.|
|compression-type||A one-byte data item (char in C, PIC X COMP-X in COBOL) that specifies whether the input data is to be compressed or decompressed. A value of 0 means compress.|
The RETURN-CODE special register indicates whether the operation succeeded or not. Compression or decompression fails only if the output buffer is too small to accept the results.
0 indicates success
1 indicates failure.
User-supplied compression routines must be stored in modules called USRDCnnn, where nnn is within the range 128 to 255.
To call a user-supplied routine, use the same syntax as for a Micro Focus routine, but specify USRDCnnn instead of CBLDCnnn, where nnn is in the range 128 through 255.
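As an illustration of the parameter conventions described above, the following C sketch is a user-supplied routine that accepts the same five parameters. Several details are assumptions: the name usrdc128, the C types chosen for the two-byte and one-byte items, the rule that any non-zero compression type means decompress, and the simple (count, byte) run-length scheme. This is not a Micro Focus routine. As documented for RETURN-CODE, the routine returns 0 on success and 1 when the output buffer is too small.

```c
#include <stddef.h>

/* Hypothetical user-supplied compression routine. */
int usrdc128(const unsigned char *input_buffer,
             unsigned short *input_buffer_size,
             unsigned char *output_buffer,
             unsigned short *output_buffer_size,
             unsigned char *compression_type)
{
    size_t in_len = *input_buffer_size;
    size_t cap = *output_buffer_size;   /* on entry: space available */
    size_t o = 0;

    if (*compression_type == 0) {
        /* Compress: emit (run length 1-255, byte) pairs. */
        size_t i = 0;
        while (i < in_len) {
            unsigned char c = input_buffer[i];
            size_t run = 1;
            while (i + run < in_len && input_buffer[i + run] == c && run < 255)
                run++;
            if (cap - o < 2)
                return 1;               /* output buffer too small */
            output_buffer[o++] = (unsigned char)run;
            output_buffer[o++] = c;
            i += run;
        }
    } else {
        /* Decompress: expand each (count, byte) pair. Rejecting an odd
           length is an extra check added for this sketch. */
        if (in_len % 2 != 0)
            return 1;
        for (size_t i = 0; i < in_len; i += 2) {
            size_t run = input_buffer[i];
            if (cap - o < run)
                return 1;               /* output buffer too small */
            for (size_t k = 0; k < run; k++)
                output_buffer[o++] = input_buffer[i + 1];
        }
    }
    *output_buffer_size = (unsigned short)o;  /* on exit: length of data produced */
    return 0;
}
```

For example, compressing the 8 bytes "aaaabbbc" produces 6 bytes: (4, 'a'), (3, 'b') and (1, 'c').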
To make your compression routine available to your system, you must create a callable shared object that can be called when needed.
You can map calls to data compression routines in programs from previous UNIX COBOL systems to the new calls using the cob option:
Copyright © 2000 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names used herein are protected by international law.