
Chapter 9: Mfsort Utility

This COBOL system provides the following methods of sorting and merging files:

Sorting Method
Description
The run-time system COBOL sort module The default module that executes a SORT statement in your COBOL program. It can also be called directly using the CALL statement. For details, see the section The Callable Sort Module in the chapter File Handler and Callable Sort APIs.
mfsort utility A utility, which you can invoke from the command line, that enables you to sort and merge data files.

This chapter describes how to use the mfsort utility.

9.1 Emulation of Dfsort

Mfsort enables you to sort and merge data files. It almost completely emulates IBM's Dfsort product, Release 14 and includes support for:

Details of these functions can be found on IBM's Dfsort websites which can be reached from the DFSORT home page: IBM DFSORT/MVS Overview.

9.2 Invoking Mfsort

The mfsort utility is provided as the file mfsort.exe.

You can invoke mfsort from the Net Express Command Prompt in one of the following ways:

mfsort instructions
mfsort take filename

where the parameters are:

Parameter
Description
instructions Mfsort instructions. See the section Instructions. When specifying instructions on the command line, remember to observe the maximum command line length imposed by the operating system.
filename A text file containing mfsort instructions. See the section Instructions. Use this method if you need to specify a lot of instructions.
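
As a rough illustration of the second form, the following Python sketch writes a set of instructions to a text file and builds the matching mfsort take filename command line. The helper name write_take_file and the file name golfsort.txt are hypothetical; the instructions are those of the golf.dat example given later in this chapter. The sketch only builds the argument list and does not run mfsort.

```python
import os
import tempfile

def write_take_file(path, instructions):
    """Write one mfsort instruction per line and return the command line."""
    with open(path, "w", encoding="ascii") as f:
        for line in instructions:
            f.write(line + "\n")
    # Form documented above: mfsort take filename
    return ["mfsort", "take", path]

instructions = [
    "* sort golf.dat on membership number",   # '*' starts a comment line
    "sort fields(1,6,nu,a)",
    "use golf.dat record f,28 org rl",
    "give members.dat",
]

# golfsort.txt is a hypothetical file name chosen for this sketch
take_path = os.path.join(tempfile.gettempdir(), "golfsort.txt")
command = write_take_file(take_path, instructions)
print(" ".join(command))
```

On a machine where mfsort.exe is on the path, the returned list could be passed to subprocess.run(command, check=True).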

9.2.1 Instructions

The following is a list of valid mfsort instructions:

Instruction
Meaning
* The rest of the line is treated as a comment. This is useful if you are supplying instructions via a text file as you can add comments to the file which explain the purpose of each instruction.
CHAR-EBCDIC EBCDIC data. CHAR-EBCDIC must precede all SORT, MERGE, USE or GIVE instructions.
SIGN-EBCDIC Numeric DISPLAY items with included signs are interpreted according to the EBCDIC convention. SIGN-EBCDIC is not required when CHAR-EBCDIC is specified; it is intended for data that is otherwise ASCII, for example when the program that created the data was compiled with the SIGN"EBCDIC" Compiler directive. SIGN-EBCDIC must precede all SORT, MERGE, USE or GIVE instructions.
SORT/MERGE These instructions specify either a sort or a merge option and must be followed by a FIELDS instruction specifying the field(s) to be used. The FIELDS instruction may optionally be followed by a RECORD instruction specifying the record size and format of the workfile. SORT and MERGE are mutually exclusive.
FIELDS (instructions) The fields on which the file is to be sorted or merged. See the section Fields Instruction.
RECORD definition Record size and format. A RECORD instruction can be used to specify these details for the workfile, input file(s) and output file(s). See the section RECORD Instruction.
USE input-file Each USE instruction specifies an input file. You must specify all USE instructions before any GIVE instructions. See the section Defining Input and Output Files.
GIVE output-file Each GIVE instruction specifies an output file. See the section Defining Input and Output Files.
INCLUDE/OMIT Specifies the conditions under which records are included in, or omitted from, the sort process. For details, see the IBM documentation to be found at Using DFSORT Program Control Statements. INCLUDE and OMIT are mutually exclusive.
INREC Reformats records before the SORT/MERGE process.
OUTREC Reformats records following the SORT/MERGE process.
MODS Specifies external procedures (user exits) that are executed each time a record is released to or returned from the SORT/MERGE process. This implementation supports the E15 and E35 user exits.
SUM Specifies that records with the same key value are returned as a single record. Optionally, a field may be specified to accumulate totals for all records with equal keys.
OUTFIL This is used to specify complex editing and reporting to one or more output files. Each output file should be specified using a GIVE command. Otherwise, OUTFIL works as described in the IBM documentation to be found at Using DFSORT Program Control Statements.
OPTION This can be used to specify various options. One of these options is COPY which results in records being copied, rather than sorted, to the output file.

9.2.1.1 FIELDS Instruction

A SORT or MERGE instruction must be followed by a FIELDS instruction which specifies the fields on which the input file is to be sorted or merged.

A FIELDS instruction takes the following form:

fields({start,length,type,order},...)

where the parameters are:

Parameter
Description
start The starting position of the field in the record, counting in bytes from 1
length The length of the field (bytes)
type The type of data in the field. See the section Field Types.
order The ordering of output, which can be either of:
A - ascending
D - descending

You can specify up to 16 fields by repeating the parameter set (start, length, type and order). Use commas to separate the parameters and the parameter sets.
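
The effect of several parameter sets can be pictured with the following Python sketch. It is a simplified model, not mfsort itself: it handles only character (CH-style) comparison and assumes, as in DFSORT, that the first field listed is the major key. The function name sort_records and the sample records are invented for illustration.

```python
def sort_records(records, fields):
    """Sort fixed-length records on (start, length, order) character fields.

    start counts bytes from 1, as in the FIELDS instruction; order is
    'a' (ascending) or 'd' (descending). Only CH-style comparison is modelled.
    """
    result = list(records)
    # Apply the least significant field first. Python's sort is stable,
    # so fields listed earlier end up taking precedence.
    for start, length, order in reversed(fields):
        result.sort(key=lambda r: r[start - 1:start - 1 + length],
                    reverse=(order == "d"))
    return result

# Invented 28-byte records: number(6), last name(10), first name(10), handicap(2)
records = [
    b"000003" + b"SMITH".ljust(10) + b"ANN".ljust(10) + b"12",
    b"000001" + b"JONES".ljust(10) + b"BOB".ljust(10) + b"07",
    b"000002" + b"SMITH".ljust(10) + b"AL".ljust(10) + b"20",
]

# Similar in spirit to: fields(7,10,ch,a,1,6,ch,d)
ordered = sort_records(records, [(7, 10, "a"), (1, 6, "d")])
for rec in ordered:
    print(rec.decode("ascii"))
```

Here the records are ordered by last name ascending and, within equal last names, by membership number descending.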

9.2.1.1.1 Field Types

The following is a list of some of the available field types:

Field Type
Definition
AQ Character with alternate collating sequence.
BI COMP
C5 COMP-5
C6 COMP-6
CH PIC X DISPLAY
CX COMP-X
FL Floating point, signed.
FS/CSF Signed numeric, with optional leading floating sign.
LI/OL/CLO PIC S9 LEADING INCLUDED
LS/CSL PIC S9 LEADING SEPARATE
NU PIC 9 DISPLAY
PD PIC S9 COMP-3
PD0 Packed decimal with first semi-byte and sign semi-byte ignored.
SB/FI PIC S9 COMP
S5 PIC S9 COMP-5
SS Substring. Used in conditions only.
TS/CST PIC S9 TRAILING SEPARATE
TI/ZD/OT/CTO PIC S9 TRAILING INCLUDED
Y2B Two-digit, one-byte binary year data.
Y2C/Y2Z Two-digit, two-byte year data, with optional trailing included sign. PIC 99 or PIC S99.
Y2D Two-digit, one-byte packed decimal year data. PIC 99 COMP-6.
Y2P Two-digit, two-byte packed decimal year data. PIC 99 COMP-3.
Y2S Two-digit, two-byte character year data with special indicators. Binary zeros, blanks and binary ones are treated as special cases.
Y2T Full date format, yyx...
Y2U Full date format, yyx..., COMP-3.
Y2V Full date format, yyx..., COMP-3. Ignores first semi-byte.
Y2W Full date format, x...yy.
Y2X Full date format, x...yy, COMP-3.
Y2Y Full date format, x...yy, COMP-3. Ignores first semi-byte.

You can find other field types defined in the IBM documentation at SORT Control Statement.

Suppose that golf.dat is a relative file defined in a COBOL program as follows:

file-control.
select members-file
   assign to "d:\netexpress\base\workarea\golf.dat"
   organization is relative
   access mode is random
   relative key is relative-key.
data division.
file section.
fd members-file
   record contains 28 characters.
01 members-record.
   03 members-number pic 9(6).
   03 members-lname pic x(10).
   03 members-fname pic x(10).
   03 members-handicap pic 9(2).

You can then use the following mfsort command to sort the file golf.dat on the field containing the membership number in ascending order:

mfsort sort fields(1,6,nu,a)
   use golf.dat record f,28 org rl
   give members.dat

The sorted version of the file is written to the file members.dat.

9.2.1.2 Defining Input and Output Files

You need to give instructions to define both the input and output files:

File
Instructions
Input
USE input-file [record definition]
   [org organization] [key structure]
Output
GIVE output-file [record definition] 
   [org organization] [key structure]

Notes:

9.2.1.2.1 RECORD Instruction

You use the RECORD instruction to specify the format and length of records in the workfile, the input file(s) and the output file(s).

The RECORD instruction takes the following form:

RECORD format,rec-len,max-len

where the parameters are:

Parameter
Description
format The record format, one of:
F - fixed length records of length rec-len
V - variable length records with a minimum length of rec-len and a maximum length of max-len
rec-len If format is set to F, the record length. If format is set to V, the minimum record length.
max-len If format is set to V, the maximum record length

If you do not specify a RECORD instruction for the sort workfile, the format defaults to fixed record format, with the record size equal to the largest record specified in the USE or GIVE instructions.

You do not need to specify a RECORD instruction for input files that are either variable length or indexed files as the file characteristics can be deduced from the file itself.

9.2.1.2.2 ORG Instruction

The ORG instruction specifies the file organization, and can be one of:

ORG Instruction
File Organization
IX indexed
RL relative
SQ sequential (default value)
LS line sequential

You do not need to specify an ORG instruction for input files that are either variable length or indexed files as the file characteristics can be deduced from the file itself.

9.2.1.2.3 KEY Instruction

The KEY instruction specifies the key structure for an indexed file. It is used when an output file is indexed and its key structure is not the same as that of the indexed input file.

The format of the KEY instruction is:

KEY ({start,length,ixkey},...)

where the parameters are:

Parameter
Description
start The starting position of the key in a record, counting in bytes from 1
length The number of bytes in the key
ixkey One of:
P - Primary key (this must always be defined first)
A - Alternate key
AD - Alternate key with duplicates
C - Component of the last-specified primary or alternate key

You can repeat the KEY instruction as often as required to describe the entire key structure. Use commas to separate the parameters and parameter sets (start, length, ixkey).

You must define the keys in order of importance with the primary key first, followed by all its components if it is split, then the first alternate key and all of its components and so on.

The following example defines three keys:

KEY (4,5,p,10,5,c,20,2,ad,40,2,a,46,10,c)

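where:

The primary key is a split key made up of bytes 4 to 8 and bytes 10 to 14.
The first alternate key, which allows duplicates, is made up of bytes 20 and 21.
The second alternate key is a split key made up of bytes 40 and 41 and bytes 46 to 55.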
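
The grouping of parameter sets into keys can be sketched in Python as follows. This is an illustrative parser only; the function name parse_key is hypothetical, and the check that exactly one primary key appears is an assumption based on the rule that the primary key must be defined first.

```python
def parse_key(spec):
    """Group a flat KEY list into keys, each with one or more (start, length) parts."""
    items = [s.strip() for s in spec.split(",")]
    if len(items) % 3 != 0:
        raise ValueError("expected start,length,ixkey triples")
    keys = []
    for i in range(0, len(items), 3):
        start, length = int(items[i]), int(items[i + 1])
        kind = items[i + 2].lower()
        if kind == "c":
            if not keys:
                raise ValueError("a component must follow a primary or alternate key")
            # C adds a component to the last-specified key
            keys[-1]["parts"].append((start, length))
        elif kind in ("p", "a", "ad"):
            # Assumption: exactly one primary key, and it comes first
            if (kind == "p") != (not keys):
                raise ValueError("the primary key must be defined first")
            keys.append({"type": kind, "parts": [(start, length)]})
        else:
            raise ValueError(f"unknown ixkey value: {kind}")
    return keys

parsed = parse_key("4,5,p,10,5,c,20,2,ad,40,2,a,46,10,c")
for key in parsed:
    print(key["type"], key["parts"])
```

For the example above, the sketch reports three keys: a primary key with two parts, an alternate key with duplicates with one part, and an alternate key with two parts.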

9.3 Example Commands

This section gives some examples of mfsort commands and jobstreams.

You can find other examples at the IBM document page, Examples of DFSORT Job Streams.

9.3.1 Sorting Using More Than One File

Imagine four indexed files (north.dat, south.dat, east.dat and west.dat), one each for the north, south, east and west of the country, containing the scores achieved by members of a national organisation in a national competition. The COBOL syntax used to define north.dat is shown below:

file-control.
   select idxfile assign to "north.dat"
      organization is indexed
      record key is member-id.
data division.
file section.
fd idxfile
   record contains 39 characters.
01 idxfile-record.
   03 member-id  pic 9(6).
   03 surname    pic x(15).
   03 first-name pic x(15).
   03 score      pic 9(3).

Each of the other files has been created in the same way and the results of the competition have been entered in the files. The following examples use these files.

9.3.1.1 Character Sort in Ascending Order

The following mfsort command takes all of the records from each of the four files, sorts them on the member's surname in ascending order and outputs the result to the relative file members.dat:

mfsort sort fields(7,15,ch,a)
   use north.dat 
   use south.dat
   use east.dat
   use west.dat
   give members.dat org rl

9.3.1.2 Numeric Sort in Descending Order

The following mfsort command takes each of the four files, sorts them on the member's score (highest score first) and outputs the result to the relative file scores.dat:

mfsort sort fields(37,3,nu,d)
   use north.dat 
   use south.dat
   use east.dat
   use west.dat
   give scores.dat org rl

9.3.1.3 Omitting Records

The following mfsort command takes each of the four files, sorts them on the membership number (which is the primary key) and outputs the result to the indexed file national.dat. All records for which the score field is less than 20 are omitted:

mfsort sort fields(1,6,nu,a)
   use north.dat 
   use south.dat
   use east.dat
   use west.dat
   give national.dat
   omit cond (37,3,nu,lt,20)
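
The combined effect of the sort and the OMIT condition can be modelled in Python as below. This is a sketch of the documented behaviour rather than of mfsort's implementation; the helper names and the three sample members are invented, and the record layout follows the 39-byte definition of north.dat shown earlier.

```python
def sort_with_omit(records):
    """Model of: sort fields(1,6,nu,a) with omit cond (37,3,nu,lt,20)."""
    # score occupies bytes 37-39; records with score < 20 are dropped
    kept = [r for r in records if not int(r[36:39]) < 20]
    # member-id occupies bytes 1-6; sort ascending on its numeric value
    return sorted(kept, key=lambda r: int(r[0:6]))

def make_record(member_id, surname, first_name, score):
    # Layout from the FD: 9(6), x(15), x(15), 9(3) = 39 bytes
    return f"{member_id:06d}{surname:<15}{first_name:<15}{score:03d}"

records = [
    make_record(300, "Baker", "Cy", 45),
    make_record(100, "Adams", "Di", 19),   # omitted: score is less than 20
    make_record(200, "Clark", "Ed", 20),   # kept: 20 is not less than 20
]

result = sort_with_omit(records)
for rec in result:
    print(rec)
```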

9.3.2 Single File Sort Using INCLUDE and a Sub-string Comparison

The following mfsort command takes a line sequential file, sortin.dat, and sorts its records on a character field starting at position 11 with a length of 4 bytes. The results are output to the file sortout.dat, which will include only records for which the sub-string, starting at position 15 with a length of 3 bytes, is equal to any three consecutive characters in the string 'J69,L92,J82'.

mfsort sort fields=(11,4,ch,a)
   use sortin.dat org ls record (f 80)
   give sortout.dat
   include cond=(15,3,ss,eq,c'J69,L92,J82')
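
The sub-string test described above can be approximated in Python as follows. This is a literal reading of the description ("any three consecutive characters"), so a field such as '9,L' also matches; the function name include_ss and the sample records are invented for illustration.

```python
def include_ss(record, start, length, constant):
    """True if the field (start counted from 1) occurs anywhere in constant."""
    field = record[start - 1:start - 1 + length]
    # A short record cannot supply a full-length field, so it is rejected
    return len(field) == length and field in constant

CONSTANT = "J69,L92,J82"
samples = [
    "CCCCCCCCCC00039,L",   # '9,L' is also three consecutive characters
    "BBBBBBBBBB0001X11",   # 'X11' does not occur in the constant
    "AAAAAAAAAA0002L92",   # positions 15-17 hold 'L92'
]

kept = [s for s in samples if include_ss(s, 15, 3, CONSTANT)]
kept.sort(key=lambda s: s[10:14])   # models fields=(11,4,ch,a)
print(kept)
```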

9.3.3 Transforming Records Using OUTREC

The following mfsort command transforms records containing a field of format cyymmdd to the format yyyymmdd.

     * Sort C'cyymmdd'
      SORT FIELDS=(1,7,BI,A)          * sort C'cyymmdd' 
      use mfs110a.in org ls record (f 40)
     * Transform C'cyymmdd' to C'yyyymmdd' 
      OUTFIL OUTREC=(1,1,CHANGE=(2,   * change C'c' as follows: 
                         C'0',C'19',  *   C'0' to C'19' 
                         C'1',C'20',  *   C'1' to C'20' 
                         C'2',C'21'), *   C'2' to C'21' 
                         NOMATCH=(C'99'),
                    2,6)              * copy C'yymmdd' 


      give sortout.dat
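
The lookup performed by CHANGE and NOMATCH can be pictured with this Python sketch. It models only the first seven bytes of each 40-byte record, and the sample dates are invented; it is an approximation of the documented behaviour, not of mfsort's implementation.

```python
CENTURY = {"0": "19", "1": "20", "2": "21"}   # CHANGE=(2,C'0',C'19',...)
NOMATCH = "99"                                # NOMATCH=(C'99')

def outrec(record):
    """Build the output field: changed byte 1, then bytes 2-7 copied."""
    century = CENTURY.get(record[0:1], NOMATCH)   # 1,1,CHANGE=...
    return century + record[1:7]                  # 2,6

dates = ["1010203", "0991231", "9500101"]   # invented cyymmdd values

# SORT FIELDS=(1,7,BI,A) orders the records before they are reformatted
converted = [outrec(r) for r in sorted(dates, key=lambda r: r[0:7].encode("ascii"))]
print(converted)
```

The third value has a century indicator that is not in the table, so it receives the NOMATCH value of 99.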

9.3.4 Sort Using OUTFIL for Complex Reporting

The following is an example of how to use the OUTFIL command to produce a complex report, in this case a Profit and Loss report for one of four divisions. The input file, mfs121a.dat, is sorted on the first two fields, and only records from the western region are output. The SECTIONS instruction produces a page throw whenever the field starting in position 3, with a length of 10 bytes, changes. The following subsections show the input data, the mfsort command and the resulting output.

9.3.4.1 Input data

  Chips        San Martin     0088902203 West  
  Chips        Oakland        0023412432 West  
  Chips        San Jose       0123213335 West  
  Ice Cream    Marin          0054234123 West  
  Chips        Gilroy         0055484342 West  
  Ice Cream    Napa           0085734283 West  
  Pretzels     San Jose       0123488534 West  
  Ice Cream    San Francisco  0092231245 West  
  Chips        San Francisco  000324343q West  
  Chips        San Jose       0123213335 South 
  Ice Cream    San Martin     0100346730 West  
  Pretzels     Marin          0534332344 West  
  Chips        Gilroy         0055484342 South 
  Chips        Morgan Hill    0098732232 West  
  Pretzels     Morgan Hill    0084384340 West  
  Ice Cream    San Jose       000002345u West  
  Pretzels     Napa           0531234856 West  
  Chips        Oakland        0023412432 South 
  Pretzels     San Martin     000023438r West  
  Chips        Los Angeles    000223401t West  
  Ice Cream    Marin          0054234123 South 
  Pretzels     San Francisco  0541230005 West  
  Ice Cream    Napa           0085734283 South 
  Pretzels     San Jose       0123488534 South 
  Ice Cream    San Francisco  0092231245 South 
  Chips        San Francisco  000324343q South 
  Ice Cream    San Martin     0100346730 South 
  Pretzels     Marin          0534332344 South 
  Chips        Morgan Hill    0098732232 South 
  Pretzels     Morgan Hill    0084384340 South 
  Ice Cream    San Jose       000002345u South 
  Pretzels     Napa           0531234856 South 
  Pretzels     San Martin     000023438r South 
  Chips        Los Angeles    0002234014 South 
  Pretzels     San Francisco  0541230005 South 
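
Some of the profit fields above end in a letter rather than a digit. The following Python sketch decodes them on the assumption that a final character in the range p to y stands for a negative digit 0 to 9; this mapping is inferred from comparing the sample data with the report output and is not stated in this chapter. The helper names are hypothetical, and the records are rebuilt from the Chips rows for the West region.

```python
from decimal import Decimal

def decode_zd(field):
    """Decode a trailing-included-sign numeric field into an integer.

    Assumption: a last character in 'p'..'y' stands for a negative
    digit 0..9; this is inferred from the sample data and report.
    """
    last = field[-1]
    if "p" <= last <= "y":
        return -int(field[:-1] + str(ord(last) - ord("p")))
    return int(field)

def profit(record):
    """Profit/(Loss) from bytes 31-40, with two implied decimal places."""
    return Decimal(decode_zd(record[30:40])) / 100

def make_record(product, branch, amount, region):
    # Columns: product at 3, branch at 16, amount at 31, region at 42
    return f"  {product:<13}{branch:<15}{amount} {region:<6}"

chips_west = [
    make_record("Chips", "San Martin", "0088902203", "West"),
    make_record("Chips", "Oakland", "0023412432", "West"),
    make_record("Chips", "San Jose", "0123213335", "West"),
    make_record("Chips", "Gilroy", "0055484342", "West"),
    make_record("Chips", "San Francisco", "000324343q", "West"),
    make_record("Chips", "Morgan Hill", "0098732232", "West"),
    make_record("Chips", "Los Angeles", "000223401t", "West"),
]

total = sum(profit(r) for r in chips_west)
print(total)   # 3842670.99
```

The total agrees with the figure reported for the Chips division in the output below, which supports the assumed mapping for this data.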

9.3.4.2 Mfsort Command

SORT FIELDS=(3,10,A,16,13,A),FORMAT=CH

     use mfs121a.dat  org ls record (f 80)
    OUTFIL
      INCLUDE=(42,6,CH,EQ,C'West'),
      HEADER1=(5/,18:'    Western Region',3/,
                  18:'Profit and Loss Report',3/,
                  18:'     for  ',&DATE,3/,
                  18:'      Page',&PAGE),
      OUTREC=(6:16,13,24:31,10,ZD,M5,LENGTH=20,75:X),
      SECTIONS=(3,10,SKIP=P,
        HEADER3=(2:'Division:  ',3,10,5X,'Page:',&PAGE,2/,
                 6:'Branch Office',24:'       Profit/(Loss)',/,
                 6:'-------------',24:'--------------------'),
       TRAILER3=(6:'=============',24:'====================',/,
                 6:'Total',24:TOTAL=(31,10,ZD,M5,LENGTH=20),/,
                 6:'Lowest',24:MIN=(31,10,ZD,M5,LENGTH=20),/,
                 6:'Highest',24:MAX=(31,10,ZD,M5,LENGTH=20),/,
                 6:'Average',24:AVG=(31,10,ZD,M5,LENGTH=20),/,
                 3/,2:'Average for all Branch Offices so far:',
                    X,SUBAVG=(31,10,ZD,M5))),
        TRAILER1=(8:'Page ',&PAGE,5X,'Date:  ',&DATE,5/,
                  8:'Total Number of Branch Offices Reporting:  ',
                    COUNT,2/,
                  8:'Summary of Profit/(Loss) for all',
                    ' Western Division Branch Offices',2/,
                  12:'Total:',
                      22:TOTAL=(31,10,ZD,M5,LENGTH=20),/,
                  12:'Lowest:',
                      22:MIN=(31,10,ZD,M5,LENGTH=20),/,
                  12:'Highest:',
                      22:MAX=(31,10,ZD,M5,LENGTH=20),/,
                  12:'Average:',
                      22:AVG=(31,10,ZD,M5,LENGTH=20))
        give outfil1.dat

9.3.4.3 Output





                     Western Region


                 Profit and Loss Report


                      for  11/05/95


                       Page     1
**************************************************************************
  Division:  Chips          Page:     2

     Branch Office            Profit/(Loss)
     -------------     --------------------
     Gilroy                     554,843.42
     Los Angeles                (22,340.14)
     Morgan Hill                987,322.32
     Oakland                    234,124.32
     San Francisco              (32,434.31)
     San Jose                 1,232,133.35
     San Martin                 889,022.03
     =============     ====================
     Total                    3,842,670.99
     Lowest                     (32,434.31)
     Highest                  1,232,133.35
     Average                    548,952.99



 Average for all Branch Offices so far:     548,952.99
**************************************************************************
  Division:  Ice Cream      Page:     3

     Branch Office            Profit/(Loss)
     -------------     --------------------
     Marin                      542,341.23
     Napa                       857,342.83
     San Francisco              922,312.45
     San Jose                      (234.55)
     San Martin               1,003,467.30
     =============     ====================
     Total                    3,325,229.26
     Lowest                        (234.55)
     Highest                  1,003,467.30
     Average                    665,045.85



 Average for all Branch Offices so far:     597,325.02
**************************************************************************
  Division:  Pretzels       Page:     4

     Branch Office            Profit/(Loss)
     -------------     --------------------
     Marin                    5,343,323.44
     Morgan Hill                843,843.40
     Napa                     5,312,348.56
     San Francisco            5,412,300.05
     San Jose                 1,234,885.34
     San Martin                  (2,343.82)
     =============     ====================
     Total                   18,144,356.97
     Lowest                      (2,343.82)
     Highest                  5,412,300.05
     Average                  3,024,059.49



 Average for all Branch Offices so far:   1,406,236.51
**************************************************************************
        Page      5     Date:  11/05/95




       Total Number of Branch Offices Reporting:        18

       Summary of Profit/(Loss) for all Western Division Branch Offices

           Total:          25,312,257.22
           Lowest:            (32,434.31)
           Highest:         5,412,300.05
           Average:         1,406,236.51

9.4 Workfile

During a sort or merge operation, mfsort uses a temporary workfile. This workfile is paged to disk in the current directory or, if it is set, in the directory specified by the TMP environment variable.
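
The choice of directory described above can be expressed as a short Python sketch. The function name workfile_directory is hypothetical, and treating an empty TMP value as "not set" is an assumption.

```python
import os

def workfile_directory(environ=None):
    """Return the directory the text above describes for the workfile."""
    env = os.environ if environ is None else environ
    tmp = env.get("TMP")
    # "if it is set": an empty value is treated here as not set (an assumption)
    return tmp if tmp else os.getcwd()

print(workfile_directory())
```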

Mfsort copies all the records from each of the input files to the temporary workfile, truncated or padded as appropriate. The workfile is then sorted or merged according to its key description. After being sorted or merged in the workfile, the records are copied to each of the output files and truncated or padded as appropriate.

During this operation:

9.5 Error Messages

A full list of mfsort error messages is given in the Net Express online help. (Click Help Topics on the Help menu. Then, on the Index tab, double-click Mfsort, error messages.)


Copyright © 2000 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names used herein are protected by international law.
