Btrieve

Overview of Working with Data Files

Chapter 9: Mfsort Utility

This COBOL system provides the following methods of sorting and merging files:

Sorting Method	Description
The run-time system COBOL sort module	The default module that executes a SORT statement in your COBOL program. It can also be called directly using the CALL statement. For details, see the section The Callable Sort Module in the chapter *File Handler and Callable Sort APIs*.
mfsort utility	A utility, which you can invoke from the command line, that enables you to sort and merge data files.

This chapter describes how to use the mfsort facility.

9.1 Emulation of Dfsort

Mfsort enables you to sort and merge data files. It almost completely emulates IBM's Dfsort product, Release 14, and includes support for:

INCLUDE/OMIT - Inclusion or omission of selected records
SUM - Producing sums on selected fields for records with duplicate key values
OUTREC - Output record editing
OUTFIL - Complex editing and reporting with multiple outputs
Y2K ordering with various kinds of Y2K date field
Expansion of two digit year fields to 4 digits (using OUTFIL)

Details of these functions can be found on IBM's Dfsort websites which can be reached from the DFSORT home page: IBM DFSORT/MVS Overview.

9.2 Invoking Mfsort

The mfsort utility is provided as the file mfsort.exe.

You can invoke mfsort from the Net Express Command Prompt in one of the following ways:

mfsort instructions

mfsort take filename

where the parameters are:

Parameter	Description
instructions	Mfsort instructions. See the section Instructions. When specifying instructions on the command line, remember to observe the maximum command line length imposed by the operating system.
filename	A text file containing mfsort instructions. See the section Instructions. Use this method if you need to specify a lot of instructions.

9.2.1 Instructions

The following is a list of valid mfsort instructions:

Instruction	Meaning
*	The rest of the line is treated as a comment. This is useful if you are supplying instructions via a text file as you can add comments to the file which explain the purpose of each instruction.
CHAR-EBCDIC	EBCDIC data. CHAR-EBCDIC must precede all SORT, MERGE, USE or GIVE instructions.
SIGN-EBCDIC	Numeric DISPLAY items with included signs are interpreted according to the EBCDIC convention. SIGN-EBCDIC is not required when CHAR-EBCDIC is specified. Use it for data that is otherwise ASCII, for example when the program that created the data was compiled with the SIGN"EBCDIC" Compiler directive. SIGN-EBCDIC must precede all SORT, MERGE, USE or GIVE instructions.
SORT/MERGE	These instructions specify either a sort or a merge option and must be followed by a FIELDS instruction specifying the field(s) to be used. The FIELDS instruction may optionally be followed by a RECORD instruction specifying the record size and format of the workfile. SORT and MERGE are mutually exclusive.
FIELDS (instructions)	The fields on which the file is to be sorted or merged. See the section Fields Instruction.
RECORD definition	Record size and format. A RECORD instruction can be used to specify these details for the workfile, input file(s) and output file(s). See the section RECORD Instruction.
USE input-file	Each USE instruction specifies an input file. You must specify all USE instructions before any GIVE instructions. See the section Defining Input and Output Files.
GIVE output-file	Each GIVE instruction specifies an output file. See the section Defining Input and Output Files.
INCLUDE/OMIT	Specifies conditions in which records will be included or omitted from the sort process. For details, see the IBM documentation to be found at Using DFSORT Program Control Statements. INCLUDE and OMIT are mutually exclusive.
INREC	Reformats records before the SORT/MERGE process.
OUTREC	Reformats records following the SORT/MERGE process.
MODS	Specifies external procedures (user exits) that are executed each time a record is released to or returned from the SORT/MERGE process. This implementation supports the E15 and E35 user exits.
SUM	Specifies that records with the same key value are returned as a single record. Optionally, a field may be specified to accumulate totals for all records with equal keys.
OUTFIL	This is used to specify complex editing and reporting to one or more output files. Each output file should be specified using a GIVE command. Otherwise, OUTFIL works as described in the IBM documentation to be found at Using DFSORT Program Control Statements.
OPTION	This can be used to specify various options. One of these options is COPY which results in records being copied, rather than sorted, to the output file.

9.2.1.1 FIELDS Instruction

A SORT or MERGE instruction must be followed by a FIELDS instruction which specifies the fields on which the input file is to be sorted or merged.

A fields instruction takes the following form:

fields({start,length,type,order},...)

where the parameters are:

Parameter	Description
start	The starting position of the field in the record, counting in bytes from 1
length	The length of the field (bytes)
type	The type of data in the field. See the section Field Types.
order	The ordering of output: A (ascending) or D (descending)

You can specify up to 16 fields by repeating the parameter set (start, length, type and order). Use commas to separate the parameters and the parameter sets.
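
The way repeated parameter sets combine can be sketched in Python. This is only an illustration of the ordering rule, not of how mfsort is implemented: it assumes that the first parameter set is the most significant, it models character (ch) fields only, and the helper names (make_record, sort_records) and the 28-byte record layout are hypothetical.

```python
from functools import cmp_to_key


def make_record(number, lname, fname, handicap):
    """Build a hypothetical 28-byte record: 6-digit number, two 10-byte names, 2-digit handicap."""
    return f"{number:06d}{lname:<10}{fname:<10}{handicap:02d}".encode("ascii")


def sort_records(records, fields):
    """Sort records on (start, length, type, order) sets; the first set is most significant."""
    def compare(a, b):
        for start, length, ftype, order in fields:
            if ftype != "ch":
                raise ValueError("this sketch only models character (ch) fields")
            ka = a[start - 1:start - 1 + length]  # start counts in bytes from 1
            kb = b[start - 1:start - 1 + length]
            if ka != kb:
                result = -1 if ka < kb else 1
                return result if order == "a" else -result
        return 0  # equal on every field
    return sorted(records, key=cmp_to_key(compare))


members = [
    make_record(3, "Smith", "Ann", 12),
    make_record(1, "Jones", "Bob", 7),
    make_record(2, "Smith", "Al", 20),
]

# Equivalent in spirit to fields(7,10,ch,a,1,6,ch,d):
# surname ascending, then membership number descending.
ordered = sort_records(members, [(7, 10, "ch", "a"), (1, 6, "ch", "d")])
numbers = [int(r[0:6]) for r in ordered]
print(numbers)
```

The two records with the surname Smith tie on the first field, so the second field decides their relative order.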

9.2.1.1.1 Field Types

The following is a list of some of the available field types:

Field Type	Definition
AQ	Character with alternate collating sequence.
BI	COMP
C5	COMP-5
C6	COMP-6
CH	PIC X DISPLAY
CX	COMP-X
FL	Floating point, signed.
FS/CSF	Signed numeric, with optional leading floating sign.
LI/OL/CLO	PIC S9 LEADING INCLUDED
LS/CSL	PIC S9 LEADING SEPARATE
NU	PIC 9 DISPLAY
PD	PIC S9 COMP-3
PD0	Packed decimal with first semi-byte and sign semi-byte ignored.
SB/FI	PIC S9 COMP
S5	S9 COMP-5
SS	Substring. Used in conditions only.
TS/CST	PIC S9 TRAILING SEPARATE
TI/ZD/OT/CTO	PIC S9 TRAILING INCLUDED
Y2B	Two-digit, one-byte binary year data.
Y2C/Y2Z	Two-digit, two-byte year data, with optional trailing included sign. PIC 99 or PIC S99.
Y2D	Two-digit, one-byte packed decimal year data. PIC 99 COMP-6.
Y2P	Two-digit, two-byte packed decimal year data. PIC 99 COMP-3.
Y2S	Two-digit, two-byte character year data with special indicators. Binary zeros, blanks and binary ones are treated as special cases.
Y2T	Full date format, yyx...
Y2U	Full date format, yyx..., COMP-3.
Y2V	Full date format, yyx..., COMP-3. Ignores first semi-byte.
Y2W	Full date format, x...yy.
Y2X	Full date format, x...yy, COMP-3.
Y2Y	Full date format, x...yy, COMP-3. Ignores first semi-byte.

You can find other field types defined in the IBM documentation at SORT Control Statement.

Suppose that golf.dat is a relative file defined in a COBOL program as follows:

file-control.
select members-file
   assign to "d:\netexpress\base\workarea\golf.dat"
   organization is relative
   access mode is random
   relative key is relative-key.
data division.
file section.
fd members-file
   record contains 28 characters.
01 members-record.
   03 members-number pic 9(6).
   03 members-lname pic x(10).
   03 members-fname pic x(10).
   03 members-handicap pic 9(2).

You can then use the following mfsort command to sort the file golf.dat on the field containing the membership number in ascending order:

mfsort sort fields(1,6,nu,a)
   use golf.dat record f,28 org rl
   give members.dat

The sorted version of the file is written to the file members.dat.

9.2.1.2 Defining Input and Output Files

You need to give instructions to define both the input and output files:

File	Instructions
Input	USE input-file [record definition] [org organization] [key structure]
Output	GIVE output-file [record definition] [org organization] [key structure]

Notes:

For input files that are either variable length or indexed files, you do not need to specify RECORD, ORG or KEY commands as the file characteristics can be deduced from the file itself.
If you omit any of the above instructions, the last specified values are used. Therefore, you need only specify values for the first file if the input and output files are all of the same type and format.
If the first file is variable length or indexed, you do not need to specify any of the values.
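
The "last specified values are used" rule can be pictured with a short Python sketch. It is a simplified model under stated assumptions: the function name resolve_definitions is hypothetical, only RECORD and ORG are tracked, the starting organization is taken to be SQ (the documented default for ORG), and the deduction of characteristics from variable-length or indexed input files is ignored.

```python
def resolve_definitions(files):
    """Carry the last specified RECORD and ORG values forward to later files.

    files is a list of (name, record, org) tuples in the order the USE and
    GIVE instructions appear; record or org is None where it was omitted.
    """
    last_record, last_org = None, "sq"  # assumption: ORG starts at its default, SQ
    resolved = []
    for name, record, org in files:
        if record is not None:
            last_record = record
        if org is not None:
            last_org = org
        resolved.append((name, last_record, last_org))
    return resolved


# use golf.dat record f,28 org rl
# give members.dat
resolved = resolve_definitions([
    ("golf.dat", "f,28", "rl"),
    ("members.dat", None, None),
])
print(resolved)
```

Under this model the output file members.dat takes the same record definition and organization as golf.dat because nothing was specified after its GIVE instruction.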

9.2.1.2.1 RECORD Instruction

You use the RECORD instruction to specify the format and length of records in the:

Sort work file
Input and output files when the instruction follows the associated USE or GIVE instruction

The RECORD instruction takes the following form:

RECORD format,rec-len,max-len

where the parameters are:

Parameter	Description
format	The record format, one of: F (fixed-length records of length rec-len) or V (variable-length records with a minimum length of rec-len and a maximum length of max-len)
rec-len	The record length if format is F; the minimum record length if format is V
max-len	The maximum record length if format is V

If you do not specify a RECORD instruction for the sort workfile, the format defaults to fixed record format, with the record size equal to the largest record specified in the USE or GIVE instructions.

You do not need to specify a RECORD instruction for input files that are either variable length or indexed files as the file characteristics can be deduced from the file itself.

9.2.1.2.2 ORG Instruction

The ORG instruction specifies the file organization, and can be one of:

ORG Instruction	File Organization
IX	indexed
RL	relative
SQ	sequential (default value)
LS	line sequential

You do not need to specify an ORG instruction for input files that are either variable length or indexed files as the file characteristics can be deduced from the file itself.

9.2.1.2.3 KEY Instruction

The KEY instruction specifies the key structure for an indexed file. It is used when an output file is indexed and its key structure is not the same as that of the indexed input file.

The format of the KEY instruction is:

KEY ({start,length,ixkey},...)

where the parameters are:

Parameter	Description
start	The starting position of the key in a record, counting in bytes from 1
length	The number of bytes in the key
ixkey	One of: P (primary key; this must always be defined first), A (alternate key), AD (alternate key with duplicates), C (component of the last-specified primary or alternate key)

You can repeat the KEY instruction as often as required to describe the entire key structure. Use commas to separate the parameters and parameter sets (start, length, ixkey).

You must define the keys in order of importance with the primary key first, followed by all its components if it is split, then the first alternate key and all of its components and so on.

The following example defines three keys:

KEY (4,5,p,10,5,c,20,2,ad,40,2,a,46,10,c)

where:

4,5,p,10,5,c represents the primary key, which is split. Its first component starts at character position 4 with a length of 5 bytes and its second component starts at character position 10 with a length of 5 bytes.
20,2,ad represents the second (alternate) key which can have duplicates and starts at character position 20 with a length of 2 bytes
40,2,a,46,10,c represents the third key. This is a split alternate key, with the first component starting at character position 40 with a length of 2 bytes and the second component starting at character position 46 with a length of 10 bytes.
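
To make the grouping rule concrete, the following Python sketch reads the flat parameter list and attaches each C entry to the key defined before it. It illustrates how to read a KEY instruction and is not part of mfsort; the function name parse_key and the dictionary layout are hypothetical.

```python
def parse_key(spec):
    """Group a flat 'start,length,ixkey,...' list into keys and their components."""
    items = [item.strip() for item in spec.split(",")]
    if len(items) % 3 != 0:
        raise ValueError("expected sets of start,length,ixkey")
    keys = []
    for i in range(0, len(items), 3):
        start, length = int(items[i]), int(items[i + 1])
        kind = items[i + 2].lower()
        if kind == "c":
            if not keys:
                raise ValueError("a component must follow a primary or alternate key")
            keys[-1]["parts"].append((start, length))
        elif kind in ("p", "a", "ad"):
            if (kind == "p") != (not keys):
                raise ValueError("the primary key must be defined first, and only once")
            keys.append({"kind": kind, "parts": [(start, length)]})
        else:
            raise ValueError(f"unknown ixkey value: {kind}")
    return keys


keys = parse_key("4,5,p,10,5,c,20,2,ad,40,2,a,46,10,c")
for key in keys:
    print(key["kind"], key["parts"])
```

For the example above this yields three keys: a split primary key, an alternate key with duplicates, and a split alternate key.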

9.3 Example Commands

This section gives some examples of mfsort commands and jobstreams.

You can find other examples at the IBM document page, Examples of DFSORT Job Streams.

9.3.1 Sorting Using More Than One File

Imagine four indexed files (north.dat, south.dat, east.dat and west.dat), one for each of the north, south, east and west of the country. Each file contains the scores achieved in a national competition by the members of a national organisation who live in that region. The COBOL syntax used to define north.dat is shown below:

file-control.
   select idxfile assign to "north.dat"
      organization is indexed
      record key is member-id.
data division.
file section.
fd idxfile
record contains 39 characters.
01 idxfile-record.
   03 member-id  pic 9(6).
   03 surname    pic x(15).
   03 first-name pic x(15).
   03 score      pic 9(3).

Each of the other files has been created in the same way and the results of the competition have been entered in the files. The following examples use these files.

9.3.1.1 Character Sort in Ascending Order

The following mfsort command takes all of the records from each of the four files, sorts them on the member's surname in ascending order and outputs the result to the relative file members.dat:

mfsort sort fields(7,15,ch,a)
   use north.dat 
   use south.dat
   use east.dat
   use west.dat
   give members.dat org rl

9.3.1.2 Numeric Sort in Descending Order

The following mfsort command takes each of the four files, sorts them on the member's score (highest score first) and outputs the result to the relative file scores.dat:

mfsort sort fields(37,3,nu,d)
   use north.dat 
   use south.dat
   use east.dat
   use west.dat
   give scores.dat org rl

9.3.1.3 Omitting Records

The following mfsort command takes each of the four files, sorts them on the membership number (which is the primary key) and outputs the result to the indexed file national.dat. All records for which the score field is less than 20 are omitted:

mfsort sort fields(1,6,nu,a)
   use north.dat 
   use south.dat
   use east.dat
   use west.dat
   give national.dat
   omit cond (37,3,nu,lt,20)
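
The effect of the OMIT condition combined with the sort can be sketched in Python. This only models the selection and ordering logic on in-memory strings laid out like idxfile-record; it does not read indexed files, and the helper names and sample members are invented for illustration.

```python
def make_record(member_id, surname, first_name, score):
    """Build a 39-character record: 6-digit id, two 15-character names, 3-digit score."""
    return f"{member_id:06d}{surname:<15}{first_name:<15}{score:03d}"


def omit_low_scores_and_sort(records, threshold=20):
    """Model of: sort fields(1,6,nu,a) with omit cond (37,3,nu,lt,20)."""
    # Positions 37-39 hold the score; records with score < threshold are omitted.
    kept = [r for r in records if not int(r[36:39]) < threshold]
    # Positions 1-6 hold the membership number; sort ascending.
    return sorted(kept, key=lambda r: int(r[0:6]))


records = [
    make_record(300, "Brown", "Carol", 19),
    make_record(200, "Adams", "Dave", 20),
    make_record(100, "Clark", "Eve", 87),
]
result = [int(r[0:6]) for r in omit_low_scores_and_sort(records)]
print(result)
```

A score of exactly 20 is kept because the condition is "less than 20".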

9.3.2 Single File Sort Using INCLUDE and a Sub-string Comparison

The following mfsort command takes a line sequential file, sortin.dat, and sorts its records on a character field starting at position 11 with a length of 4 bytes. The results are output to the file sortout.dat, which includes only records for which the sub-string, starting at position 15 with a length of 3 bytes, is equal to any three consecutive characters in the string 'J69,L92,J82'.

mfsort sort fields=(11,4,ch,a)
   use sortin.dat org ls record (f 80)
   give sortout.dat
   include cond=(15,3,ss,eq,c'J69,L92,J82')
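
The sub-string test described above can be modelled with Python's in operator. This is a sketch of the comparison only, under the reading given in the text (the field matches if it equals any three consecutive characters of the constant); the function name and the sample records are hypothetical.

```python
def include_ss(record, start, length, constant):
    """Model of cond=(start,length,ss,eq,c'constant'): the field must occur in the constant."""
    field = record[start - 1:start - 1 + length]  # start counts from 1
    return len(field) == length and field in constant


def sample(code):
    """Hypothetical record with a 3-character code at position 15."""
    return "x" * 14 + code + "y" * 10


constant = "J69,L92,J82"
matches = {code: include_ss(sample(code), 15, 3, constant)
           for code in ("J69", "L92", "J70", "9,L")}
print(matches)
```

Because any three consecutive characters qualify, a field such as '9,L' that straddles a comma also satisfies the condition. The commas separate the intended values only if the data can never contain a comma in that field.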

9.3.3 Transforming Records Using OUTREC

The following mfsort command transforms records containing a field of format cyymmdd to the format yyyymmdd.

      * Sort C'cyymmdd'
      SORT FIELDS=(1,7,BI,A)          * sort C'cyymmdd'
      use mfs110a.in org ls record (f 40)
      * Transform C'cyymmdd' to C'yyyymmdd'
      OUTFIL OUTREC=(1,1,CHANGE=(2,   * change C'c' as follows:
                         C'0',C'19',  *   C'0' to C'19'
                         C'1',C'20',  *   C'1' to C'20'
                         C'2',C'21'), *   C'2' to C'21'
                         NOMATCH=(C'99'), * anything else to C'99'
                    2,6)              * copy C'yymmdd'
      give sortout.dat
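
The CHANGE lookup in this example amounts to a table lookup on the first character followed by a copy of the next six. A minimal Python sketch of that transformation, with a hypothetical function name and invented sample dates, is:

```python
# Lookup table taken from the CHANGE parameter: century flag -> two-digit century.
CENTURY = {"0": "19", "1": "20", "2": "21"}


def expand_date(field):
    """Turn 'cyymmdd' into 'yyyymmdd'; unknown flags give '99' as NOMATCH specifies."""
    century = CENTURY.get(field[0], "99")  # 1,1,CHANGE=(2,...),NOMATCH=(C'99')
    return century + field[1:7]            # 2,6 copies 'yymmdd' unchanged


expanded = [expand_date(d) for d in ("0991231", "1050607", "7010101")]
print(expanded)
```

The output field is one character longer than the input field because a one-character flag is replaced by a two-character century.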

9.3.4 Sort Using OUTFIL for Complex Reporting

The following is an example of how to use the OUTFIL command to produce a complex report, in this case a Profit and Loss report for one of four divisions. The input file, mfs121a.dat is sorted on the first two fields and only records from the western region are output. The SECTIONS instruction produces a page throw when the field starting in position 3, length 10 bytes changes. The following shows:

The input data
The mfsort command
The resultant output

9.3.4.1 Input data

  Chips        San Martin     0088902203 West  
  Chips        Oakland        0023412432 West  
  Chips        San Jose       0123213335 West  
  Ice Cream    Marin          0054234123 West  
  Chips        Gilroy         0055484342 West  
  Ice Cream    Napa           0085734283 West  
  Pretzels     San Jose       0123488534 West  
  Ice Cream    San Francisco  0092231245 West  
  Chips        San Francisco  000324343q West  
  Chips        San Jose       0123213335 South 
  Ice Cream    San Martin     0100346730 West  
  Pretzels     Marin          0534332344 West  
  Chips        Gilroy         0055484342 South 
  Chips        Morgan Hill    0098732232 West  
  Pretzels     Morgan Hill    0084384340 West  
  Ice Cream    San Jose       000002345u West  
  Pretzels     Napa           0531234856 West  
  Chips        Oakland        0023412432 South 
  Pretzels     San Martin     000023438r West  
  Chips        Los Angeles    000223401t West  
  Ice Cream    Marin          0054234123 South 
  Pretzels     San Francisco  0541230005 West  
  Ice Cream    Napa           0085734283 South 
  Pretzels     San Jose       0123488534 South 
  Ice Cream    San Francisco  0092231245 South 
  Chips        San Francisco  000324343q South 
  Ice Cream    San Martin     0100346730 South 
  Pretzels     Marin          0534332344 South 
  Chips        Morgan Hill    0098732232 South 
  Pretzels     Morgan Hill    0084384340 South 
  Ice Cream    San Jose       000002345u South 
  Pretzels     Napa           0531234856 South 
  Pretzels     San Martin     000023438r South 
  Chips        Los Angeles    0002234014 South 
  Pretzels     San Francisco  0541230005 South

9.3.4.2 Mfsort Command

SORT FIELDS=(3,10,A,16,13,A),FORMAT=CH

     use mfs121a.dat  org ls record (f 80)
    OUTFIL
      INCLUDE=(42,6,CH,EQ,C'West'),
      HEADER1=(5/,18:'    Western Region',3/,
                  18:'Profit and Loss Report',3/,
                  18:'     for  ',&DATE,3/,
                  18:'      Page',&PAGE),
      OUTREC=(6:16,13,24:31,10,ZD,M5,LENGTH=20,75:X),
      SECTIONS=(3,10,SKIP=P,
        HEADER3=(2:'Division:  ',3,10,5X,'Page:',&PAGE,2/,
                 6:'Branch Office',24:'       Profit/(Loss)',/,
                 6:'-------------',24:'--------------------'),
       TRAILER3=(6:'=============',24:'====================',/,
                 6:'Total',24:TOTAL=(31,10,ZD,M5,LENGTH=20),/,
                 6:'Lowest',24:MIN=(31,10,ZD,M5,LENGTH=20),/,
                 6:'Highest',24:MAX=(31,10,ZD,M5,LENGTH=20),/,
                 6:'Average',24:AVG=(31,10,ZD,M5,LENGTH=20),/,
                 3/,2:'Average for all Branch Offices so far:',
                    X,SUBAVG=(31,10,ZD,M5))),
        TRAILER1=(8:'Page ',&PAGE,5X,'Date:  ',&DATE,5/,
                  8:'Total Number of Branch Offices Reporting:  ',
                    COUNT,2/,
                  8:'Summary of Profit/(Loss) for all',
                    ' Western Division Branch Offices',2/,
                  12:'Total:',
                      22:TOTAL=(31,10,ZD,M5,LENGTH=20),/,
                  12:'Lowest:',
                      22:MIN=(31,10,ZD,M5,LENGTH=20),/,
                  12:'Highest:',
                      22:MAX=(31,10,ZD,M5,LENGTH=20),/,
                  12:'Average:',
                      22:AVG=(31,10,ZD,M5,LENGTH=20))
        give outfil1.dat

9.3.4.3 Output





                     Western Region


                 Profit and Loss Report


                      for  11/05/95


                       Page     1
**************************************************************************
  Division:  Chips          Page:     2

     Branch Office            Profit/(Loss)
     -------------     --------------------
     Gilroy                     554,843.42
     Los Angeles                (22,340.14)
     Morgan Hill                987,322.32
     Oakland                    234,124.32
     San Francisco              (32,434.31)
     San Jose                 1,232,133.35
     San Martin                 889,022.03
     =============     ====================
     Total                    3,842,670.99
     Lowest                     (32,434.31)
     Highest                  1,232,133.35
     Average                    548,952.99



 Average for all Branch Offices so far:     548,952.99
**************************************************************************
  Division:  Ice Cream      Page:     3

     Branch Office            Profit/(Loss)
     -------------     --------------------
     Marin                      542,341.23
     Napa                       857,342.83
     San Francisco              922,312.45
     San Jose                      (234.55)
     San Martin               1,003,467.30
     =============     ====================
     Total                    3,325,229.26
     Lowest                        (234.55)
     Highest                  1,003,467.30
     Average                    665,045.85



 Average for all Branch Offices so far:     597,325.02
**************************************************************************
  Division:  Pretzels       Page:     4

     Branch Office            Profit/(Loss)
     -------------     --------------------
     Marin                    5,343,323.44
     Morgan Hill                843,843.40
     Napa                     5,312,348.56
     San Francisco            5,412,300.05
     San Jose                 1,234,885.34
     San Martin                  (2,343.82)
     =============     ====================
     Total                   18,144,356.97
     Lowest                      (2,343.82)
     Highest                  5,412,300.05
     Average                  3,024,059.49



 Average for all Branch Offices so far:   1,406,236.51
**************************************************************************
        Page      5     Date:  11/05/95




       Total Number of Branch Offices Reporting:        18

       Summary of Profit/(Loss) for all Western Division Branch Offices

           Total:          25,312,257.22
           Lowest:            (32,434.31)
           Highest:         5,412,300.05
           Average:         1,406,236.51
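
Some amounts in the input data end in a letter rather than a digit (for example 000324343q), and these are the values shown in parentheses in the report. The Python sketch below checks the Chips figures by hand. It rests on assumptions inferred from the sample rather than taken from a specification: a trailing letter in the range p to y is treated as a negative sign combined with the digits 0 to 9 (the sample confirms only q, r, t and u), amounts carry two implied decimal places, and averages are truncated rather than rounded. The function names are hypothetical.

```python
def decode_amount(text):
    """Decode a 10-character amount into an integer number of cents."""
    last = text[-1]
    if last.isdigit():
        return int(text)
    if "p" <= last <= "y":  # assumption: p..y encode a negative sign plus digit 0..9
        return -int(text[:-1] + str(ord(last) - ord("p")))
    raise ValueError(f"unexpected sign character: {last!r}")


def format_amount(cents):
    """Format cents the way amounts appear in the report, with negatives in parentheses."""
    whole, fraction = divmod(abs(cents), 100)
    body = f"{whole:,}.{fraction:02d}"
    return f"({body})" if cents < 0 else body


# Amounts for the Chips division, West region, from the input data.
chips_west = ["0055484342", "000223401t", "0098732232", "0023412432",
              "000324343q", "0123213335", "0088902203"]
values = [decode_amount(v) for v in chips_west]

total = format_amount(sum(values))
lowest = format_amount(min(values))
highest = format_amount(max(values))
average = format_amount(sum(values) // len(values))  # integer division truncates
print(total, lowest, highest, average)
```

These four strings agree with the Total, Lowest, Highest and Average lines printed for the Chips division in the report.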

9.4 Workfile

During a sort or merge operation, mfsort uses a temporary workfile. This workfile is paged to disk in the current directory or, if it is set, in the directory specified by the TMP environment variable.

Mfsort copies all the records from each of the input files to the temporary workfile, truncated or padded as appropriate. The workfile is then sorted or merged according to its key description. After being sorted or merged in the workfile, the records are copied to each of the output files and truncated or padded as appropriate.

During this operation:

If you do not need any of the records to be truncated, you must ensure that the record length of the workfile is sufficient for the longest record to be sorted
If your input files are variable-length and the sort workfile is not, the records are handled as fixed-length records in the workfile and their original lengths are lost
If your input file is fixed-length and the workfile is variable-length, the record length of the record in the workfile is either the fixed length of the input file or the maximum record length of the workfile, whichever is the smaller
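
The truncation and padding described in this section can be pictured with a few lines of Python. This is a conceptual sketch only: the function names are hypothetical, and the use of a space as the padding character is an assumption, since the pad value is not stated here.

```python
def default_workfile_length(record_lengths):
    """Default workfile record size: the largest record in the USE or GIVE instructions."""
    return max(record_lengths)


def to_workfile(record, work_len, pad=b" "):
    """Truncate or pad a record to the workfile record length (pad byte is an assumption)."""
    return record[:work_len].ljust(work_len, pad)


work_len = default_workfile_length([28, 39])
short = to_workfile(b"000001Jones", work_len)
long = to_workfile(b"x" * 50, work_len)
print(len(short), len(long))
```

A record longer than the workfile record length loses its trailing bytes, which is why the workfile record must be at least as long as the longest record you want to keep intact.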

9.5 Error Messages

A full list of mfsort error messages is given in the Net Express online help. (Click Help Topics on the Help menu. Then, on the Index tab, double-click Mfsort, error messages.)
