
Chapter 5: Sorting Files

You can use several methods to sort your files. This chapter describes the command line interface, Mfsort.

Mfsort is provided as mfsort.exe on DOS, Windows and OS/2, and as mfsort on UNIX. It is always invoked from the operating system command line.

5.1 Invoking Mfsort

Mfsort is invoked from the command line in one of the following ways:

mfsort [instructions]
mfsort take filename

where the parameters are:

instructions

Instruction statements as defined in the section Instruction Statements below.

When specifying instructions on the command line, remember to:

  • Observe the maximum command line length imposed by the operating system.

  • Allow the shell to pass through special characters such as parentheses or comments in one of the following ways:

    • Use quotes "" to enclose parentheses.

    • For parentheses, place an escape character \ directly in front of each parenthesis. For a comment, place an escape character \ directly in front of *.

take filename

Mfsort reads instruction statements from the text file filename. Use this method if you need to specify many parameters.

There are examples of invoking Mfsort later in this chapter.
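The effect of the two quoting methods can be checked without running Mfsort. The following Python sketch is illustrative only: it uses the standard shlex module, which follows POSIX shell splitting rules, so it models a UNIX shell and not the DOS, Windows or OS/2 command processors. The command strings are sample values built from the FIELDS syntax described later in this chapter.

```python
import shlex

# Two ways of protecting the parentheses from a POSIX-style shell.
quoted = 'mfsort sort fields "(2,6,ch,a)"'
escaped = r'mfsort sort fields \(2,6,ch,a\)'

# shlex.split applies POSIX word splitting and quote removal, so the
# result is the argument list the program would receive.
argv_quoted = shlex.split(quoted)
argv_escaped = shlex.split(escaped)

print(argv_quoted)
print(argv_escaped)
```

Both forms should produce the same argument list, with the parentheses passed through as part of a single argument.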

5.2 Sort Workfile

When a file is sorted, all the records are copied to a logical sort workfile from each of the input files, truncated or padded as appropriate. The logical sort workfile is then sorted according to its key description.

After being sorted in the logical sort workfile, the records are copied to each of the output files, truncated or padded as appropriate. If you do not want any of the records to be truncated, you must ensure that the record length of the logical sort workfile is sufficient for the longest record to be sorted.
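The truncation and padding described above can be pictured with a short Python sketch. It is not Mfsort's implementation: the function name fit_record is hypothetical, and padding with spaces is an assumption, because this chapter does not state which pad character is used.

```python
def fit_record(record: bytes, length: int, pad: bytes = b" ") -> bytes:
    """Truncate or pad a record to exactly `length` bytes.

    Assumption: short records are padded on the right with spaces.
    """
    if len(record) >= length:
        return record[:length]          # too long: data beyond `length` is lost
    return record + pad * (length - len(record))


workfile_length = 6
for rec in (b"ABCDEFGHIJ", b"ABC"):
    print(fit_record(rec, workfile_length))
```

A record longer than the workfile record length loses its trailing bytes, which is why the workfile record length must cover the longest record you want to keep intact.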

DOS, Windows and OS/2:
On DOS, Windows and OS/2, the sort workfile is paged to disk in the current directory or the directory specified by the TMP environment variable, if it is set.

UNIX:
On UNIX, the sort workfile is paged to disk in the directory /usr/tmp or the directory specified by the TMPDIR environment variable, if it is set.
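The directory rules for both platform groups can be summarized as a small lookup. This Python sketch is an illustration only; the function name workfile_dir is hypothetical, and treating an empty variable as unset is an assumption.

```python
def workfile_dir(is_unix: bool, environ: dict, cwd: str = ".") -> str:
    """Return the directory the sort workfile would be paged to.

    Assumption: an environment variable set to an empty string
    counts as not set.
    """
    if is_unix:
        # UNIX: TMPDIR if set, otherwise /usr/tmp
        return environ.get("TMPDIR") or "/usr/tmp"
    # DOS, Windows and OS/2: TMP if set, otherwise the current directory
    return environ.get("TMP") or cwd


print(workfile_dir(True, {}))
print(workfile_dir(True, {"TMPDIR": "/scratch"}))
print(workfile_dir(False, {"TMP": "C:\\TEMP"}))
```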

If your input files are variable-length and the sort workfile is fixed-length, the variable-length information is lost.

If your input file is fixed-length and the sort workfile is variable-length, the record length of the record in the sort workfile is either the fixed length of the input file or the maximum record length of the sort workfile, whichever is the smaller.

5.3 Instruction Statements

Instruction statements specify how Mfsort sorts or merges a file.

The following list gives all valid instruction statement syntax.

Instruction
Meaning
* The rest of the line is treated as a comment.
TAKE filename Reads instructions from filename. The name can be fully qualified.
CHAR-EBCDIC Specifies that the sort or merge should assume EBCDIC data. By default, Mfsort assumes ASCII data. CHAR-EBCDIC must precede all USE, GIVE, SORT, or MERGE instructions.
SIGN-EBCDIC Specifies that numeric DISPLAY items with included signs are interpreted according to the EBCDIC convention. SIGN-EBCDIC cannot be used with CHAR-EBCDIC; it is intended for data that is otherwise in ASCII. It is usually needed when the program that created the data was compiled with the SIGN"EBCDIC" directive. SIGN-EBCDIC must precede all USE, GIVE, SORT, or MERGE instructions.
SORT Specifies a sort operation and must be followed by a FIELDS statement, and optionally a RECORD statement. See the sections Sort Fields and RECORD Statement.
MERGE Specifies a merge operation and must be followed by a FIELDS statement, and optionally a RECORD statement. See the sections Sort Fields and RECORD Statement.

Note: SORT and MERGE are mutually exclusive.


FIELDS The fields on which the file is to be sorted. See the section Sort Fields.
RECORD The sort workfile record size and format. See the section RECORD Statement.
USE filename An input file. You must specify all USE statements before any GIVE statements.
GIVE filename An output file.

The following statements are used to define each file, and should be placed immediately after the associated USE or GIVE statement:

RECORD The record size and format. See the section RECORD Statement.
ORG type The file type; one of:

IX indexed sequential
RL relative
SQ sequential
LS line sequential

If you do not specify ORG, the default file type is SQ.

KEY Specifies the key or keys to be used. See the section KEY Statement.

If you omit any of the above statements, the most recently specified values are used. Therefore, if the input and output files are all of the same type and format, you need define only the first file.
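The carry-forward rule can be sketched as follows. This Python fragment is a model of the rule as stated above, not of Mfsort's parser; the function name resolve_files and the dictionary layout are hypothetical.

```python
# Default before anything is specified: ORG is SQ.
DEFAULTS = {"org": "sq", "record": None, "key": None}


def resolve_files(files):
    """Fill in omitted RECORD, ORG and KEY values for each file.

    `files` is a list of (name, attrs) pairs in the order the USE and
    GIVE statements appear; `attrs` holds only what was specified.
    """
    current = dict(DEFAULTS)
    resolved = []
    for name, attrs in files:
        current.update(attrs)            # newly specified values replace old ones
        resolved.append((name, dict(current)))
    return resolved


files = [
    ("file1", {"record": "v,4,8"}),      # only the first file is described
    ("file2", {}),
    ("file3", {}),
    ("file5", {}),
]
for name, attrs in resolve_files(files):
    print(name, attrs)
```

Every later file inherits the record format given for file1, and all of them use the default organization SQ.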

If the instruction syntax you specify is invalid, Mfsort sets the return code to non-zero, shows the invalid syntax, and stops.
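A calling script can rely on the non-zero return code to detect a failed run. The following Python sketch shows one way to do this with the standard subprocess module; the function name run_sort is hypothetical, and the sketch assumes only that the tool exits with zero on success and non-zero on failure.

```python
import subprocess
import sys


def run_sort(command):
    """Run a sort command given as an argument list.

    Returns True if the exit status is zero. On failure, the tool's own
    output (which includes the invalid syntax) is passed on to stderr.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        sys.stderr.write(result.stdout + result.stderr)
        return False
    return True
```

For example, run_sort(["mfsort", "take", "sample.srt"]) would run the instructions held in sample.srt, assuming mfsort is on the search path.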

5.3.1 Sort Fields

Sort fields are entered using the FIELDS statement:

fields ({start, length, type, order} ...)

where the parameters are:

start Starting position of the sort field, in bytes.
length Number of bytes in the sort field.
type Type of data in the sort field (see the section Field Types).
order Ordering of output, which can be either of:

A ascending
D descending

Up to 16 fields can be specified by repeating the parameter set start, length, type, and order. The sets are separated by commas.

Field Types

Valid field types are listed below.

Type
Definition
CH PIC X DISPLAY
NU PIC 9 DISPLAY
PD PIC S9 COMP-3
FI PIC s9.99 DISPLAY
BI COMP
C5 COMP-5
C6 COMP-6
C5 S9 COMP-5
CX COMP-X
LS PIC S9 LEADING SEPARATE
TS PIC S9 TRAILING SEPARATE
LI PIC S9 LEADING INCLUDED
TI PIC S9 TRAILING INCLUDED (compiled SIGN"EBCDIC")

Consider the following example:

mfsort sort fields (2, 6, ch, a, 10, 4, bi, d)

This sorts the file in ascending order on a 6-byte alphanumeric field starting at the second byte of the record, with a secondary ordering, in descending order, on a 4-byte COMP field (PIC 9(8)) starting at the tenth byte of the record.
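To make the ordering concrete, the following Python sketch applies the same two fields to a few in-memory records. It is a model of the ordering only, not of Mfsort itself. The 13-byte record layout and the helper names are hypothetical, and the sketch assumes the COMP field is an unsigned big-endian binary value.

```python
def field_key(record, start, length, ftype):
    """Extract one sort field; start is 1-based, as in the FIELDS statement."""
    raw = record[start - 1:start - 1 + length]
    if ftype == "bi":
        # Assumption: treat the COMP field as an unsigned big-endian integer.
        return int.from_bytes(raw, "big")
    return raw  # "ch": compare the raw bytes


def sort_records(records, fields):
    """Sort by several (start, length, type, order) fields."""
    result = list(records)
    # Python's sort is stable, so sorting on the least significant field
    # first and the most significant last gives the combined ordering.
    for start, length, ftype, order in reversed(fields):
        result.sort(key=lambda r: field_key(r, start, length, ftype),
                    reverse=(order == "d"))
    return result


def make_record(name, number):
    """Hypothetical layout: 1 filler byte, 6-byte name, 2 filler bytes, 4-byte binary."""
    return b"." + name.ljust(6).encode("ascii") + b".." + number.to_bytes(4, "big")


FIELDS = [(2, 6, "ch", "a"), (10, 4, "bi", "d")]
records = [make_record("SMITH", 1), make_record("JONES", 5),
           make_record("SMITH", 9), make_record("JONES", 2)]
ordered = sort_records(records, FIELDS)
for rec in ordered:
    print(rec[1:7].decode("ascii"), int.from_bytes(rec[9:13], "big"))
```

Records with the same name stay together, and within each name the binary field runs from highest to lowest.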

5.3.2 RECORD Statement

The RECORD statement is optional; it specifies the format of records in the sort workfile:

record format,rec-len,max-len

where format can be:

F Fixed length record of length rec-len
V Variable length record with minimum length rec-len and maximum length max-len

If you omit this statement, the sort workfile defaults to fixed format records, with the record size equal to the largest record specified in the USE or GIVE instructions.

You can specify data compression for the sort workfile as follows:

FCnnn Fixed length record of length rec-len, with compression, using compression routine nnn.
VCnnn Variable length record with minimum length rec-len and maximum length max-len, with compression, using compression routine nnn.

5.3.3 KEY Statement

The format of the KEY statement is:

key start,length,ixkey,...

The KEY statement specifies the key structure for the file and is only required for indexed files; start and length are as defined for the FIELDS statement, while ixkey is one of:

P Primary key (must always be defined first)
A Alternate key
AD Alternate key with duplicates
C Component of the last-specified primary or alternate key.

Repeat the start,length,ixkey set as required to describe the entire key structure. Separate the sets with commas and terminate the list with a right parenthesis.

You must also define the keys in order of importance: the primary key first, followed by all its components if it is split, then the first alternate key and all its components, and so on.

Example

The following example defines three keys:

key (4,5,p,10,5,c,20,2,ad,40,2,a,46,10,c)

The first key is the primary key. It is split, with its first component starting at character position 4, length 5, and its second component starting at position 10, length 5. The second key is an alternate key that allows duplicates; it starts at position 20, length 2. The third key is a split alternate key, with the first component starting at position 40, length 2, and the second component starting at position 46, length 10.
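The grouping of components under their keys can be checked with a short Python sketch. This is an illustration of the rules in this section, not Mfsort's parser; the function name parse_keys and the returned dictionary layout are hypothetical.

```python
def parse_keys(spec):
    """Group a KEY list into keys, each with one or more (start, length) parts."""
    items = [item.strip() for item in spec.strip().strip("()").split(",")]
    if len(items) % 3 != 0:
        raise ValueError("expected sets of start,length,ixkey")
    keys = []
    for i in range(0, len(items), 3):
        start, length = int(items[i]), int(items[i + 1])
        kind = items[i + 2].lower()
        if kind == "c":
            # A component extends the most recently specified key.
            if not keys:
                raise ValueError("component given before any key")
            keys[-1]["parts"].append((start, length))
        elif kind in ("p", "a", "ad"):
            if not keys and kind != "p":
                raise ValueError("primary key must be defined first")
            keys.append({"kind": kind, "parts": [(start, length)]})
        else:
            raise ValueError("unknown key type: " + kind)
    return keys


for key in parse_keys("(4,5,p,10,5,c,20,2,ad,40,2,a,46,10,c)"):
    print(key)
```

Applied to the example above, the sketch yields three keys: a split primary key, an alternate key with duplicates, and a split alternate key.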

5.4 Examples

To sort four files (file1 through file4), invoke Mfsort with the following command line:

mfsort use file1 record "(v,4,8)" use file2 use file3 
    use file4 give file5
        sort fields "(1, 1, c5, a, 2, 1, c5, d)"

To merge the four files as in the following COBOL program:

 01 merge-file-rec1.
     03 merge-key1      pic s99 comp-5.
     03 second-key1     pic 99 comp-5.
     03 merge-data      pic 99.
   . . .   
     merge merge-file1
         on ascending key merge-key1
         on descending key second-key1
         using file1, file2, file3, file4
         giving file5

invoke Mfsort as follows:

mfsort take sample.srt

where sample.srt contains the following statements:

*SAMPLE.SRT 
use  file1  record (v,4,8)
use  file2
use  file3
use  file4
give  file5
merge  fields (1,1,c5,a,2,1,c5,d)

Note that ORG is not needed here as SQ is the default, and that file2 through file5 and the merge workfile use the record format specified for file1.
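The ordering that the merge produces can be modelled in a few lines of Python using the standard heapq.merge function. This is a sketch of the ordering only, not of Mfsort. It assumes each input is already in key order, as a merge requires, and that both one-byte key fields hold non-negative values; the function names and sample records are hypothetical.

```python
import heapq


def merge_key(record):
    """Ascending on byte 1, descending on byte 2 (fields 1,1 and 2,1)."""
    # Negating the second value turns an ascending comparison into a
    # descending one. Assumption: both bytes are non-negative values.
    return (record[0], -record[1])


def merge_files(*inputs):
    """Merge several already-ordered record lists into one ordered list."""
    return list(heapq.merge(*inputs, key=merge_key))


file1 = [bytes([1, 9]), bytes([1, 3]), bytes([2, 5])]
file2 = [bytes([1, 7]), bytes([3, 0])]
for rec in merge_files(file1, file2):
    print(rec[0], rec[1])
```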

5.5 Mfsort Error Messages

Instructions are needed
SORT or MERGE already specified
Failed to open filename
Failed to read filename
Invalid or illegal syntax
Unsupported syntax
Unable to get enough memory, aborting
Too many input files
Too many output files specified
All input files must be specified before specifying output files
Record format not specified
SORT failure, file status code: status
Please specify EBCDIC before other arguments
SIGN-EBCDIC incompatible with CHAR-EBCDIC
Prime key not specified first
Key description missing for ISAM file


Copyright © 1999 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names used herein are protected by international law.
