File Organizations

A file is a collection of data, usually stored on disk. As a logical entity, a file enables you to divide your data into meaningful groups. For example, you can use one file to hold all of a company's product information and another to hold all of its personnel information. As a physical entity, a file should be considered in terms of its organization.

2.1 File Organizations

The term "file organization" refers to the way in which data is stored in a file and, consequently, the method(s) by which it can be accessed. This COBOL system supports three file organizations: sequential, relative and indexed.

2.1.1 Sequential Files

A sequential file is one in which the individual records can only be accessed sequentially, that is, in the same order as they were originally written to the file. New records are always added to the end of the file.

2.1.1.1 Record Sequential Files

Record sequential files are nearly always referred to simply as sequential files because when you create a file and specify the organization as sequential, a record sequential file is created by default.

To define a file as record sequential, specify ORGANIZATION IS RECORD SEQUENTIAL in the SELECT statement for the file in your COBOL program.
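As a minimal sketch, a record sequential file might be declared as follows. The internal name recseq and the external name recseq.dat are illustrative assumptions, not names taken from this guide.

```cobol
       file-control.
      * recseq and "recseq.dat" are hypothetical names.
           select recseq assign to "recseq.dat"
               organization is record sequential.
```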

Because record sequential is the default for sequential files, you don't actually need to specify ORGANIZATION IS RECORD SEQUENTIAL; you can simply use ORGANIZATION IS SEQUENTIAL (as long as the Compiler directive, SEQUENTIAL, has not been set).

2.1.1.2 Line Sequential Files

The primary use of line sequential files (which are also known as "text files" or "ASCII files") is for display-only data. Most PC editors, for example Notepad, produce line sequential files.

In a line sequential file, each record in the file is separated from the next by a record delimiter. The record delimiter, which is the line feed (x"0A") character, is inserted after the last non-space character in each record. A WRITE statement removes trailing spaces from the data record and appends the record delimiter. A READ statement removes the record delimiter and, if necessary, pads the data record (with trailing spaces) to the record size defined by the program reading the data.

To define a file as line sequential, specify ORGANIZATION IS LINE SEQUENTIAL in the SELECT statement for the file in your COBOL program.
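The fragment below sketches one possible declaration of a line sequential file. The names lineseq, lineseq.txt and lineseq-record, and the 80-character record size, are assumptions chosen for illustration.

```cobol
       file-control.
      * lineseq and "lineseq.txt" are hypothetical names.
           select lineseq assign to "lineseq.txt"
               organization is line sequential.

       data division.
       file section.
       fd  lineseq.
      * A READ pads shorter lines with trailing spaces to 80 characters.
       01  lineseq-record      pic x(80).
```

With this definition, a line of 12 characters in the file is returned to the program as those 12 characters followed by 68 spaces.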

2.1.1.3 Printer Sequential Files

Printer sequential files are files which are destined for a printer, either directly, or by spooling to a disk file. They consist of a sequence of print records with zero or more vertical positioning characters (such as line-feed) between records. A print record consists of zero or more printable characters and is terminated by a carriage return (x"0D").

With a printer sequential file, the OPEN statement causes an x"0D" to be written to the file to ensure that the printer is located at the first character position before printing the first print record. The WRITE statement causes trailing spaces to be removed from the print record before it is written to the printer with a terminating carriage return (x"0D"). The BEFORE or AFTER clause can be specified in the WRITE statement to cause one or more line-feed characters (x"0A"), a form-feed character (x"0C"), or a vertical tab character (x"0B") to be sent to the printer before or after writing the print record.

You can define a file as printer sequential by specifying ASSIGN TO LINE ADVANCING FILE or ASSIGN TO PRINTER in the SELECT statement.
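The following sketch uses the ASSIGN TO PRINTER form and shows how the AFTER clause of the WRITE statement controls vertical positioning. The names prnfile and prn-record, the 132-character record size and the report text are hypothetical.

```cobol
       file-control.
      * prnfile is a hypothetical name.
           select prnfile assign to printer.

       data division.
       file section.
       fd  prnfile.
       01  prn-record          pic x(132).

       procedure division.
           open output prnfile
           move "Monthly report" to prn-record
      * Send a form feed, then print the record.
           write prn-record after advancing page
           move "First detail line" to prn-record
      * Send one line feed, then print the record.
           write prn-record after advancing 1 line
           close prnfile
           stop run.
```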

2.1.2 Relative Files

A relative file is a file in which each record is identified by its ordinal position within the file (record 1, record 2 and so on). This means that records can be accessed randomly as well as sequentially. For sequential access, you simply execute a READ or WRITE statement to access the next record in the file. For random access, you must define a data-item as the relative key and then specify, in the data-item, the ordinal number of the record that you want to READ or WRITE.

Access to relative files is fast, because the physical location of a record in the file is directly calculated from its key.

Although variable-length records can be declared for a relative file, this can be wasteful of disk space because the system allocates the maximum record length for all records in the file, and pads unused character positions. This is done to maintain the fixed relationship between the key and the location of the record.

Because every record in a relative file occupies a fixed length on disk, no space is saved by specifying data compression. In fact, if data compression is specified for a relative file, it is ignored by the Micro Focus File Handler.

Each record in a relative file is followed by a two-byte record marker which indicates the current status of the record, such as whether the record has been deleted.

When you delete a record from a relative file, the record's record marker is updated to show that it has been deleted but the contents of a deleted record physically remain in the file until a new record is written. If, for security reasons, you want to ensure that the actual data does not exist in the file, you must overwrite the record (for example with space characters) using REWRITE before you delete it.
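The overwrite-then-delete sequence described above can be sketched as follows. This fragment assumes a relative file named relfil, opened I-O with random or dynamic access, whose relative key is relfil-key and whose record area is relfil-record. The record area name and the record number 5 are hypothetical.

```cobol
      * Assumes relfil is open i-o with random or dynamic access.
           move 5 to relfil-key
      * Overwrite the stored data with spaces before deleting it.
           move spaces to relfil-record
           rewrite relfil-record
               invalid key display "record 5 does not exist"
           end-rewrite
      * Mark the (now blank) record as deleted.
           delete relfil record
               invalid key display "record 5 does not exist"
           end-delete
```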

To define a relative file, specify ORGANIZATION IS RELATIVE in the SELECT statement for the file in your COBOL program.
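A minimal sketch of such a definition is shown here. The external file name relfil.dat and the picture of the relative key are assumptions. The relative key must be an unsigned integer data item that is not part of the file's record.

```cobol
       file-control.
      * "relfil.dat" is a hypothetical file name.
           select relfil assign to "relfil.dat"
               organization is relative
               access mode is random
               relative key is relfil-key.

       working-storage section.
      * Holds the ordinal number of the record to READ or WRITE.
       01  relfil-key          pic 9(8).
```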

The example code above defines a relative file. The access mode is random and so a relative key is defined, relfil-key. For random access, you must always supply a record number in the relative key before attempting to read a record from the file.

If you specify ACCESS MODE IS DYNAMIC, you can access the file both sequentially and randomly.

2.1.3 Indexed Files

An indexed file is a file in which each record includes a primary key. To distinguish one record from another, the value of the primary key must be unique for each record. Records can then be accessed randomly by specifying the value of the record's primary key. Indexed file records can also be accessed sequentially. As well as a primary key, indexed files can contain one or more additional keys known as alternate keys. The value of a record's alternate key(s) does not have to be unique.

To define a file as indexed, specify ORGANIZATION IS INDEXED in the SELECT statement for the file in your COBOL program. You must also specify a primary key using the RECORD KEY clause.
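As an illustration, an indexed file might be declared like this. The names idxfile, idx.dat and idxfile-record-key are hypothetical.

```cobol
       file-control.
      * idxfile, "idx.dat" and the key name are hypothetical.
           select idxfile assign to "idx.dat"
               organization is indexed
               record key is idxfile-record-key.
```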

Most types of indexed file actually comprise two separate files: the data file (containing the record data) and the index file (containing the index structure). Where this is the case, the name that you specify in your COBOL program is given to the data file and the name of the associated index file is produced by adding an .idx extension to the data file name. You should avoid using the .idx extension in other contexts.

The index is built up as an inverted tree structure that grows as records are added.

With indexed files, the number of disk accesses required to locate a randomly selected record depends primarily on the number of records in the file and the length of the record key. File I/O is faster when reading the file sequentially.

We strongly recommend that you take regular backups of all file types. With indexed files, however, some failures (for example, media corruption) can leave only one of the two files unusable. If the index file is lost in this way, you can use the Rebuild utility to recover the index from the data file and so reduce the time lost due to the failure.

2.1.3.1 Primary Keys

The primary key of an indexed file is defined using the RECORD KEY IS clause in the SELECT statement.
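The sketch below pairs a RECORD KEY IS clause with a record description to show that the primary key is a field inside the record. All names and field sizes are assumptions made for illustration.

```cobol
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               record key is idxfile-record-key.

       data division.
       file section.
       fd  idxfile.
       01  idxfile-record.
      * The primary key is a field within the record itself.
           03  idxfile-record-key   pic x(8).
           03  idxfile-data         pic x(72).
```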

2.1.3.2 Alternate Keys

As well as the primary key, each record can have any number of additional keys, known as alternate keys. Alternate keys are defined using the ALTERNATE RECORD KEY IS clause in the SELECT statement.
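For instance, a single alternate key could be added to an indexed file definition as sketched here. The name idxfile-alt-key, like the other names, is hypothetical and would need to be a field in the file's record description.

```cobol
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               record key is idxfile-record-key
      * One ALTERNATE RECORD KEY clause per alternate key.
               alternate record key is idxfile-alt-key.
```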

2.1.3.3 Duplicate Keys

You can define keys which allow duplicate values. However, you should not allow duplicates on primary keys as the value of a record's primary key must be unique.

When you use duplicate keys, be aware that there is a limit on the number of times the same value can be specified for an individual key. Each time you specify the same value for a duplicate key, the key's occurrence number is incremented by one. The maximum number of duplicate values permitted for an individual key varies according to the type of indexed file (see the section Indexed Files, in the chapter File Structures, for a full list of indexed file types and their characteristics). The occurrence number is used to ensure that duplicate key records are read in the order in which they were created, so the occurrence number of a deleted record cannot be reused. This means that it is possible to reach the maximum number of duplicate values even if some of the records holding those values have already been deleted.

Some types of indexed file contain a duplicate occurrence record in the data file (see the section Indexed Files, in the chapter File Structures, for a full list of indexed file types and their characteristics). In these files, each record in the data file is followed by a system record holding, for each duplicate key in that record, the occurrence number of the key. This number is just a counter of the number of times that key value has been used during the history of the file. The presence of the duplicate occurrence record makes REWRITE and DELETE operations on a record with many duplicates much faster but causes the data records of such files to be larger than those of a standard file.

To enable duplicate values to be specified for alternate keys, use WITH DUPLICATES in the ALTERNATE RECORD KEY clause in the SELECT statement.
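A definition that permits duplicate values on an alternate key might look like the following sketch. The file and key names are hypothetical.

```cobol
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               record key is idxfile-record-key
      * Several records may share one idxfile-alt-key value.
               alternate record key is idxfile-alt-key
                   with duplicates.
```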

2.1.3.4 Sparse Keys

A sparse key is a key for which no index entry is stored for a given value of that key. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the part of the record it occupies contains only space characters.

Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your saving of disk space.

To enable sparse keys, use SUPPRESS WHEN ALL in the ALTERNATE RECORD KEY clause in the SELECT statement.
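The fragment below is a sketch of a sparse alternate key that is suppressed when it consists entirely of the character "A". The file and key names are hypothetical, and the WITH DUPLICATES phrase is included here as an assumption about typical usage rather than a stated requirement.

```cobol
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               record key is idxfile-record-key
               alternate record key is idxfile-alt-key
                   with duplicates
      * No index entry is stored when the key is all "A".
                   suppress when all "A".
```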

In this example, if a record is written for which the value of the alternate key is all A's, the actual key value is not stored in the index file.

2.1.3.5 Indexed File Access

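Both the primary and alternate keys can be used to read records from an indexed file, either directly (random access) or in key sequence (sequential access). The access mode can be sequential, in which records are accessed in ascending key order; random, in which the value of the key identifies the record to be accessed; or dynamic, in which your program can switch between sequential and random access.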

The method of accessing an indexed file is defined using the ACCESS MODE IS clause in the SELECT statement.
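As a sketch, the fragment below declares an indexed file with dynamic access and then reads one record randomly and the following record sequentially. The file name, key name and key value "A1000" are hypothetical, and the data division is omitted for brevity.

```cobol
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key.

      * (Data division omitted; idxfile-record-key is assumed to
      * be an alphanumeric field in the record of idxfile.)
       procedure division.
           open input idxfile
      * Random access: read the record whose primary key is "A1000".
           move "A1000" to idxfile-record-key
           read idxfile
               invalid key display "no such record"
           end-read
      * Sequential access: read the next record in key sequence.
           read idxfile next record
               at end display "end of file"
           end-read
           close idxfile
           stop run.
```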

2.2 Fixed and Variable Length Records

A file can contain fixed length records (all the records are exactly the same length) or variable length records (the length of each record varies). Using variable length records may enable you to save disk space. For example, if your application generates many short records with occasional long ones and you use fixed length records, you need to make the fixed record length equal to the length of the longest record. This wastes a lot of disk space, so using variable length records would be a great advantage.
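One standard way to declare variable length records is the RECORD IS VARYING clause of the file description, sketched below. The names varfile, var-record and var-rec-length, and the size range of 20 to 200 characters, are assumptions chosen for illustration.

```cobol
       data division.
       file section.
       fd  varfile
           record is varying in size from 20 to 200 characters
               depending on var-rec-length.
       01  var-record          pic x(200).

       working-storage section.
      * Set before each WRITE; updated by each successful READ.
       01  var-rec-length      pic 9(4) comp.
```

With a definition like this, a program that moves 20 to var-rec-length before a WRITE stores a 20-character record rather than the full 200 characters.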

2.3 File Headers

A file header is a block of 128 bytes at the start of the file. Indexed files, record sequential files with variable length records and relative files with variable length records all contain file headers. In addition, each record in these files is preceded by a 2-byte or 4-byte record header.

Further detail on file and record headers and the structure of files with headers is available in the chapter File Structures.