Chapter 2: File Organizations

A file is a collection of data that is, typically, stored on disk. As a logical entity, a file enables you to divide your data into meaningful groups. For example, you can use one file to hold all of a company's product information and another file to hold all personnel information.

The term "file organization" refers to the way in which data is stored physically in a file, which determines the way that you access the data subsequently.

This COBOL system supports three file organizations: sequential, relative and indexed. Depending upon the file organization, you have up to three ways of accessing the data:

File Organization	Sequential Access	Random Access	Dynamic Access
Sequential	Yes	No	No
Relative	Yes	Yes	Yes
Indexed	Yes	Yes	Yes

2.1 Sequential Files

A sequential file is one in which the individual records can only be accessed sequentially, that is, in the same order as they were originally written to the file. New records are always added to the end of the file.

Three types of sequential file are supported by this COBOL system:

Record sequential
Line sequential
Printer sequential

2.1.1 Record Sequential Files

Record sequential files are nearly always referred to simply as sequential files because when you create a file and specify the organization as sequential, a record sequential file is created by default.

To define a file as record sequential, specify ORGANIZATION IS RECORD SEQUENTIAL in the SELECT statement for the file in your COBOL program, for example:

select recseq assign to "recseq.dat"
   organization is record sequential.

Because record sequential is the default for sequential files, you do not need to specify ORGANIZATION IS RECORD SEQUENTIAL explicitly. Provided that you have not set the SEQUENTIAL Compiler directive, you can simply use ORGANIZATION IS SEQUENTIAL.
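As a minimal sketch of the sequential behavior described above, the following program writes two records and reads them back in the order written. It is written in fixed source format (code starts in column 8). The program name, record layout, file status item and end-of-file flag are assumptions made for illustration, not part of the original example.

```cobol
       identification division.
       program-id. recseqdemo.
       environment division.
       input-output section.
       file-control.
           select recseq assign to "recseq.dat"
               organization is sequential
               file status is recseq-status.
       data division.
       file section.
       fd  recseq.
       01  recseq-record       pic x(20).
       working-storage section.
      * Illustrative names, not taken from the original text.
       01  recseq-status       pic xx.
       01  eof-flag            pic x value "n".
       procedure division.
      * Create the file; each WRITE adds a record at the end.
           open output recseq
           move "first record" to recseq-record
           write recseq-record
           move "second record" to recseq-record
           write recseq-record
           close recseq
      * Read the records back in the order they were written.
           open input recseq
           perform until eof-flag = "y"
               read recseq
                   at end move "y" to eof-flag
                   not at end display recseq-record
               end-read
           end-perform
           close recseq
           stop run.
```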

2.1.2 Line Sequential Files

The primary use of line sequential files (which are also known as "text files" or "ASCII files") is for display-only data. Most PC editors, for example Notepad, produce line sequential files.

In a line sequential file, each record in the file is separated from the next by a record delimiter. The record delimiter, which comprises the carriage return (x"0D") and the line feed (x"0A") characters, is inserted after the last non-space character in each record. A WRITE statement removes trailing spaces from the data record and appends the record delimiter. A READ statement removes the record delimiter and, if necessary, pads the data record (with trailing spaces) to the record size defined by the program reading the data.

To define a file as line sequential, specify ORGANIZATION IS LINE SEQUENTIAL in the SELECT statement for the file in your COBOL program, for example:

select lineseq assign to "lineseq.dat"
   organization is line sequential.
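The WRITE and READ behavior described above can be sketched as follows. The 40-character record and the program name are assumptions chosen for illustration. The brackets in the DISPLAY make the space padding applied by READ visible.

```cobol
       identification division.
       program-id. lineseqdemo.
       environment division.
       input-output section.
       file-control.
           select lineseq assign to "lineseq.dat"
               organization is line sequential.
       data division.
       file section.
       fd  lineseq.
      * Record size is an assumption for this sketch.
       01  lineseq-record      pic x(40).
       procedure division.
      * WRITE drops the trailing spaces and appends the delimiter.
           open output lineseq
           move "short line" to lineseq-record
           write lineseq-record
           close lineseq
      * READ strips the delimiter and pads the record with spaces.
           open input lineseq
           read lineseq
               at end display "file is empty"
               not at end display "[" lineseq-record "]"
           end-read
           close lineseq
           stop run.
```

Opening lineseq.dat in a text editor afterwards should show the single line without trailing spaces.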

2.1.3 Printer Sequential Files

Printer sequential files are files which are destined for a printer, either directly or by spooling to a disk file. They consist of a sequence of print records with zero or more vertical positioning characters (such as line-feed) between records. A print record consists of zero or more printable characters and is terminated by a carriage return (x"0D").

With a printer sequential file, the OPEN statement causes an x"0D" to be written to the file to ensure that the printer is located at the first character position before printing the first print record. The WRITE statement causes trailing spaces to be removed from the print record before it is written to the printer with a terminating carriage return (x"0D"). The BEFORE or AFTER clause can be specified in the WRITE statement to cause one or more line-feed characters (x"0A"), a form-feed character (x"0C") or a vertical tab character (x"0B") to be sent to the printer before or after writing the print record.

Printer sequential files should not be opened for INPUT or I/O.

You can define a file as printer sequential by specifying ASSIGN TO LINE ADVANCING FILE or ASSIGN TO PRINTER in the SELECT statement, for example:

select printseq
   assign to line advancing file "printseq.dat".
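The following sketch shows how the AFTER clause of the WRITE statement might be used with the file defined above. The record size, the report text and the program name are assumptions for illustration only.

```cobol
       identification division.
       program-id. printseqdemo.
       environment division.
       input-output section.
       file-control.
           select printseq
               assign to line advancing file "printseq.dat".
       data division.
       file section.
       fd  printseq.
      * An 80-column print line is assumed here.
       01  print-record        pic x(80).
       procedure division.
      * The file is opened for OUTPUT only, as advised above.
           open output printseq
      * Start a new page before the heading.
           move "Report heading" to print-record
           write print-record after advancing page
      * Leave one blank line between heading and detail.
           move "First detail line" to print-record
           write print-record after advancing 2 lines
           move "Second detail line" to print-record
           write print-record after advancing 1 line
           close printseq
           stop run.
```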

2.2 Relative Files

A relative file is a file in which each record is identified by its ordinal position in the file (record 1, record 2 and so on). This means that records can be accessed randomly as well as sequentially:

Sequential access:
Simply execute a READ or WRITE statement to access the next record in the file
Random access:
Define a data-item as the relative key. Then specify, in the data-item, the ordinal number of the record that you need to READ or WRITE.

Because records can be accessed randomly, access to relative files is fast. If you need to save disk space, however, you should avoid relative files. Although you can declare variable length records for a relative file, the system assumes the maximum record length for all WRITE statements to the file and pads the unused character positions. This is done so that the COBOL file handling routines can quickly calculate the physical location of any record, given the record's record number in the file.

As relative files always contain fixed length records, no space is saved by specifying data compression. In fact, if data compression is specified for a relative file, it is ignored by the Micro Focus File Handler.

Each record in a relative file is followed by a two-byte record marker which indicates the current status of the record. The status of a record can be:

x"0D0A" - record present

x"0D00" - record deleted or never written

When you delete a record from a relative file, the record's contents are not removed immediately. The record marker is updated to show that the record has been deleted, but the contents remain physically in the file until a new record is written. If you need to remove the data from the file for security reasons, follow the procedure below:

1. Use REWRITE to overwrite the record, for example with space characters
2. Delete the record
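The two steps of this procedure might be coded as in the sketch below. It assumes that relfil.dat already exists with 32-character records and that record number 3 is the one to erase. The record number, record size, file status item and program name are all illustrative.

```cobol
       identification division.
       program-id. relerase.
       environment division.
       input-output section.
       file-control.
           select relfil assign to "relfil.dat"
               organization is relative
               access mode is random
               relative key is relfil-key
               file status is relfil-status.
       data division.
       file section.
       fd  relfil.
       01  relfil-record       pic x(32).
       working-storage section.
       01  relfil-key          pic 9(8) comp-x.
       01  relfil-status       pic xx.
       procedure division.
           open i-o relfil
           if relfil-status not = "00"
               display "open failed: " relfil-status
               stop run
           end-if
      * Hypothetical choice: erase record number 3.
           move 3 to relfil-key
      * Step 1: overwrite the stored contents with spaces.
           move spaces to relfil-record
           rewrite relfil-record
               invalid key
                   display "rewrite failed: " relfil-status
           end-rewrite
      * Step 2: mark the record as deleted.
           delete relfil record
               invalid key
                   display "delete failed: " relfil-status
           end-delete
           close relfil
           stop run.
```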

To define a relative file, specify ORGANIZATION IS RELATIVE in the SELECT statement for the file in your COBOL program.

To access records randomly, you must also:

Specify ACCESS MODE IS RANDOM or ACCESS MODE IS DYNAMIC in the SELECT statement for the file
Define a relative key in the working-storage section of your program

For example:

select relfil assign to "relfil.dat"
   organization is relative
   access mode is random
   relative key is relfil-key.
...
working-storage section.
01 relfil-key pic 9(8) comp-x.

The example code above defines a relative file. Because the access mode is random, a relative key, relfil-key, is defined. For random access, you must always place a record number in the relative key before trying to read a record from the file.

If you specify ACCESS MODE IS DYNAMIC, you can access the file both sequentially and randomly.
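As a minimal sketch of random access, the program below extends the SELECT statement shown earlier, writes record number 5 and then reads it back. The record size, the file status item and the program name are assumptions added for illustration.

```cobol
       identification division.
       program-id. reldemo.
       environment division.
       input-output section.
       file-control.
           select relfil assign to "relfil.dat"
               organization is relative
               access mode is random
               relative key is relfil-key
               file status is relfil-status.
       data division.
       file section.
       fd  relfil.
       01  relfil-record       pic x(32).
       working-storage section.
       01  relfil-key          pic 9(8) comp-x.
       01  relfil-status       pic xx.
       procedure division.
      * Write record number 5; records 1 to 4 are never written.
           open output relfil
           move 5 to relfil-key
           move "fifth record" to relfil-record
           write relfil-record
               invalid key
                   display "write failed: " relfil-status
           end-write
           close relfil
      * Set the key first, then read that record number.
           open input relfil
           move 5 to relfil-key
           read relfil
               invalid key
                   display "no record 5: " relfil-status
               not invalid key
                   display relfil-record
           end-read
           close relfil
           stop run.
```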

2.3 Indexed Files

An indexed file is a file in which each record includes a primary key. To distinguish one record from another, the value of the primary key must be unique for each record. Records can then be accessed randomly by specifying the value of the record's primary key. Indexed file records can also be accessed sequentially.

As well as a primary key, indexed files can contain one or more additional keys known as alternate keys. The value of a record's alternate key(s) does not have to be unique.

To define a file as indexed, specify ORGANIZATION IS INDEXED in the SELECT statement for the file in your COBOL program. You must also specify a primary key using the RECORD KEY clause:

select idxfile assign to "idx.dat"
   organization is indexed
   record key is idxfile-record-key.

Most types of indexed file actually comprise two separate files: the data file (containing the record data) and the index file (containing the index structure). Where this is the case, the name that you specify in your COBOL program is given to the data file and the name of the associated index file is produced by adding an .idx extension to the data file name. You should avoid using the .idx extension in other contexts.

The index is built up as an inverted tree structure that grows as records are added.

With indexed files, the number of disk accesses required to locate a randomly selected record depends primarily on the number of records in the file and the length of the record key. File I/O is faster when reading the file sequentially.

We strongly recommend that you take regular backups of all file types. Even so, events such as media corruption can leave only one of the two files unusable. If you lose an index file, use the Rebuild utility to recover the index from the data file, which reduces the time lost due to the failure. For more information, see the chapter Rebuild.

2.3.1 Primary Keys

To define the primary key of an indexed file use the RECORD KEY IS clause in the SELECT statement:

select idxfile assign to "idx.dat"
   organization is indexed
   record key is idxfile-record-key.
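The sketch below shows one way the primary key might be used to read a record directly. The key is a field inside the record description. The field sizes, key values, file status item and program name are assumptions for illustration.

```cobol
       identification division.
       program-id. idxdemo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is random
               record key is idxfile-record-key
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
      * Illustrative layout: a 6-character key plus 30 data bytes.
       01  idxfile-record.
           05  idxfile-record-key  pic x(6).
           05  idxfile-data        pic x(30).
       working-storage section.
       01  idxfile-status          pic xx.
       procedure division.
           open output idxfile
           move "P00002" to idxfile-record-key
           move "second product" to idxfile-data
           write idxfile-record
               invalid key
                   display "write failed: " idxfile-status
           end-write
           move "P00001" to idxfile-record-key
           move "first product" to idxfile-data
           write idxfile-record
               invalid key
                   display "write failed: " idxfile-status
           end-write
           close idxfile
      * Random read: put the wanted value in the key, then READ.
           open input idxfile
           move "P00001" to idxfile-record-key
           read idxfile
               invalid key
                   display "not found: " idxfile-status
               not invalid key
                   display idxfile-data
           end-read
           close idxfile
           stop run.
```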

2.3.2 Alternate Keys

As well as the primary key, each record can have any number of additional keys, known as alternate keys. Alternate keys are defined using the ALTERNATE RECORD KEY IS clause in the SELECT statement:

select idxfile assign to "idx.dat"
   organization is indexed
   record key is idxfile-record-key
   alternate record key is idxfile-alt-key.

2.3.3 Duplicate Keys

You can define keys which allow duplicate values. However, do not allow duplicates on primary keys as the value of a record's primary key must be unique.

When you use duplicate keys, be aware that there is a limit on the number of times you can specify the same value for an individual key. Each time you specify the same value for a duplicate key, an increment of one is added to the key's occurrence number. The maximum number of duplicate values permitted for an individual key varies according to the type of indexed file. For a full list of indexed file types and their characteristics, see the Net Express online help. (Click Help Topics on the Help menu. Then, on the Index tab, double-click Indexed file,Types.)

Net Express uses the occurrence number to ensure that duplicate key records are read in the order in which they were created. To achieve this, an occurrence number is never reused, even after the record it belonged to has been deleted. You can therefore reach the maximum number of duplicate values even if some of the records holding that key value have already been deleted.

Some types of indexed file contain a duplicate occurrence record in the data file. For more information, see the Net Express online help. (Click Help Topics on the Help menu. Then, on the Index tab, double-click Indexed file,Types.)

Where an indexed file contains a duplicate occurrence record, each record in the data file is followed by a system record. This system record holds, for each duplicate key in that record, the occurrence number of the key. This number is just a counter of the number of times that key value has been used during the history of the file. The presence of the duplicate occurrence record makes REWRITE and DELETE operations on a record with many duplicates much faster but causes the data records of such files to be larger than those of a standard file.

To enable duplicate values to be specified for alternate keys, use WITH DUPLICATES in the ALTERNATE RECORD KEY clause in the SELECT statement:

file-control.
select idxfile assign to "idx.dat"
   organization is indexed
   record key is idxfile-record-key
   alternate record key is idxfile-alt-key with duplicates.
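As a sketch of how duplicate alternate key values might be retrieved, the program below writes three records, two of which share the alternate key value "SALES", and then reads back every record with that value. The record layout, key values, flag and program name are assumptions. Based on the ordering rule described above, the two matching records would be expected in the order in which they were written.

```cobol
       identification division.
       program-id. dupdemo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key
               alternate record key is idxfile-alt-key
                   with duplicates
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
       01  idxfile-record.
           05  idxfile-record-key  pic x(6).
           05  idxfile-alt-key     pic x(10).
       working-storage section.
       01  idxfile-status          pic xx.
       01  done-flag               pic x value "n".
       procedure division.
           open output idxfile
           move "000001" to idxfile-record-key
           move "SALES" to idxfile-alt-key
           write idxfile-record
           move "000002" to idxfile-record-key
           move "ADMIN" to idxfile-alt-key
           write idxfile-record
           move "000003" to idxfile-record-key
           move "SALES" to idxfile-alt-key
           write idxfile-record
           close idxfile
      * Position on the first record whose alternate key is SALES.
           open input idxfile
           move "SALES" to idxfile-alt-key
           start idxfile key is equal to idxfile-alt-key
               invalid key move "y" to done-flag
           end-start
      * Read on in alternate key order until the value changes.
           perform until done-flag = "y"
               read idxfile next record
                   at end move "y" to done-flag
                   not at end
                       if idxfile-alt-key = "SALES"
                           display idxfile-record-key
                       else
                           move "y" to done-flag
                       end-if
               end-read
           end-perform
           close idxfile
           stop run.
```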

2.3.4 Sparse Keys

A sparse key is a key for which no index entry is stored for a given value of that key. For example, if a key is defined as sparse when it contains all spaces, index entries for the key are not included when the part of the record it occupies contains only space characters.

Only alternate keys can be sparse.

Using this feature results in smaller index files. The larger your key(s) and the more records you have for which the alternate key has the given value, the larger your saving of disk space.

To enable sparse keys, use SUPPRESS WHEN ALL in the ALTERNATE RECORD KEY clause in the SELECT statement:

file-control.
select idxfile assign to "idx.dat"
   organization is indexed
   record key is idxfile-record-key
   alternate record key is idxfile-alt-key with duplicates suppress when all "A".

In this example, if a record is written for which the value of the alternate key is all A's, the actual key value is not stored in the index file.

2.3.5 Indexed File Access

You can use both the primary and alternate keys to read records from an indexed file, either directly (random access) or in key sequence (sequential access). The access mode can be:

SEQUENTIAL
Access records in order of ascending or descending record key value (default)
RANDOM
Access records according to the value of the record key
DYNAMIC
Use the appropriate forms of I/O statement to switch between sequential and random access

The method of accessing an indexed file is defined using the ACCESS MODE IS clause in the SELECT statement, for example:

file-control.
select idxfile assign to "idx.dat"
   organization is indexed
   access mode is dynamic
   record key is idxfile-record-key
   alternate record key is idxfile-alt-key.
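With dynamic access, the form of the READ statement selects the kind of access. The sketch below reads one record randomly by primary key and then continues sequentially with READ NEXT. The record layout, key values and program name are assumptions made for illustration.

```cobol
       identification division.
       program-id. dyndemo.
       environment division.
       input-output section.
       file-control.
           select idxfile assign to "idx.dat"
               organization is indexed
               access mode is dynamic
               record key is idxfile-record-key
               alternate record key is idxfile-alt-key
               file status is idxfile-status.
       data division.
       file section.
       fd  idxfile.
       01  idxfile-record.
           05  idxfile-record-key  pic x(6).
           05  idxfile-alt-key     pic x(3).
       working-storage section.
       01  idxfile-status          pic xx.
       procedure division.
      * Build a small file; alternate key values are unique here.
           open output idxfile
           move "000001CCC" to idxfile-record
           write idxfile-record
           move "000002BBB" to idxfile-record
           write idxfile-record
           move "000003AAA" to idxfile-record
           write idxfile-record
           close idxfile
           open input idxfile
      * Random access: READ with the key value set beforehand.
           move "000002" to idxfile-record-key
           read idxfile
               invalid key display "000002 not found"
               not invalid key display "random: " idxfile-record
           end-read
      * Sequential access: READ NEXT continues from that position.
           read idxfile next record
               at end display "no further records"
               not at end display "next:   " idxfile-record
           end-read
           close idxfile
           stop run.
```

To read by the alternate key instead, a READ statement with a KEY IS idxfile-alt-key phrase could be used in the same way.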

2.4 Fixed and Variable Length Records

A file can contain:

Fixed length records - all the records are exactly the same length
Variable length records - the length of each record varies

Using variable length records may enable you to save disk space. For example, if your application generates many short records with occasional long ones and you use fixed length records, you need to make the fixed record length equal to the length of the longest record. Every short record then occupies as much space as the longest one, so variable length records can save a considerable amount of disk space.

The type of record is determined by the first of the following rules that applies:

To use:	Specify the clause:
variable length records	RECORDING MODE IS V
fixed length records	RECORDING MODE IS F

Otherwise:

To use:	Specify the clause:
variable length records	RECORD IS VARYING
fixed length records	RECORD CONTAINS n CHARACTERS

Otherwise:

To use:	Specify the Compiler directive:
variable length records	RECMODE"V"
fixed length records	RECMODE"F"

Otherwise, to use variable length records, specify the RECMODE"OSVS" Compiler directive plus one of the following:

RECORD CONTAINS n TO m CHARACTERS
More than one record area of different lengths
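As an illustration of the RECORD IS VARYING clause, the sketch below writes two records of different lengths and reads them back. The DEPENDING ON data item, the record sizes, the file name and the program name are assumptions for illustration.

```cobol
       identification division.
       program-id. vardemo.
       environment division.
       input-output section.
       file-control.
           select varfile assign to "varfile.dat"
               organization is sequential.
       data division.
       file section.
       fd  varfile
           record is varying in size from 1 to 80 characters
               depending on var-length.
       01  var-record          pic x(80).
       working-storage section.
      * Holds the length to write, or the length just read.
       01  var-length          pic 9(4) comp.
       01  eof-flag            pic x value "n".
       procedure division.
           open output varfile
           move "short" to var-record
           move 5 to var-length
           write var-record
           move "a considerably longer record" to var-record
           move 28 to var-length
           write var-record
           close varfile
      * After each READ, var-length holds that record's length.
           open input varfile
           perform until eof-flag = "y"
               read varfile
                   at end move "y" to eof-flag
                   not at end
                       display var-length ": "
                           var-record(1:var-length)
               end-read
           end-perform
           close varfile
           stop run.
```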

2.5 File Headers

A file header is a block of 128 bytes at the start of the file. Files with the following file organizations contain file headers:

Indexed files
Record sequential files with variable length records
Relative files with variable length records

In addition, each record in these files is preceded by a 2-byte or 4-byte record header.

Further detail on file and record headers and the structure of files with headers is available in the Net Express online help file. (Click Help Topics on the Help menu. Then, on the Index tab, double-click Structure, files with headers.)
