
Chapter 16: Integrated Preprocessor Interface

This chapter describes the Integrated Preprocessor Interface, which is an extension to the COBOL Compiler. You should read it if you are creating a preprocessor for the first time or if you are migrating a preprocessor from another environment.

16.1 Overview

Language preprocessors (also known as precompilers) exist in order to convert non-standard COBOL, or non-COBOL code embedded in COBOL, into a form that the Compiler will process.

Non-integrated preprocessors will take as input a source-file, read and parse it, and produce a modified source file which is then passed as input to the COBOL compiler. This has the following disadvantages:

These factors can significantly lengthen the development cycle.

The Integrated Preprocessor Interface overcomes these problems by enabling the preprocessor to mark the relationship between the original source code and the modified form. Although the Compiler will actually process the modified COBOL, it will only ever show the original code. The Integrated Preprocessor Interface enables a preprocessor to tightly integrate with the Compiler so that you are virtually unaware it is present.

16.1.1 Considerations

The preprocessor model remains unchanged when using the Integrated Preprocessor Interface - the preprocessor reads the source file(s) and passes modified source lines to the Compiler. This makes the interface completely general purpose. The disadvantage of this approach is that the preprocessor has to handle COBOL constructs such as continuation, copyfile expansion and the effects of REPLACE and REPLACING itself. These latter two issues are resolved by using the CP Preprocessor (see later in the chapter) although this does require that COPY statements conform to normal COBOL syntax rules. If, by comparison, the Compiler read source files and passed single tokens to the preprocessor, restrictions such as the format of a token and the syntax of a COPY (or equivalent) statement would be imposed.

The underlying code that is actually compiled may be significantly different from the original. The following effects may therefore be noticed:

16.2 Invoking a Preprocessor

The preprocessor is invoked by the Compiler, which is directed to do so by using the PREPROCESS directive.

The command for invoking a preprocessor is:

cob filename -C "[directives] 
    preprocess(preproc) [preproc-params]"

Where the parameters are:

filename The name of the source file
directives Any additional Compiler directives you want to use
preproc The name of the preprocessor
preproc-params One or more of the optional preprocessor parameters described in the section Preprocessor Parameters

All directives on the command line following the PREPROCESS directive and up to the end of the line, or the ENDP directive, are passed to the preprocessor without examination.

In a similar manner to other Compiler directives, the PREPROCESS directive can be either placed in a directives file or included in a $SET statement within the source code. It should not be specified in more than one place. If it is in a $SET statement, it must be the first line in the source file.
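As an illustration of the $SET form, the sketch below places the directive on the first line of a source file. The preprocessor name myprep and the program itself are hypothetical; ENDP marks the end of the text handed to the preprocessor, as described above.

```cobol
      $set preprocess(myprep) endp
      * "myprep" is a made-up preprocessor name for illustration.
       identification division.
       program-id. demo.
       procedure division.
           display "compiled through myprep"
           stop run.
```

Any parameters intended for the preprocessor would go between preprocess(myprep) and endp.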

16.2.1 Invoking Multiple Preprocessors

Multiple preprocessors are invoked by passing a PREPROCESS directive to the preprocessor so that it calls the next preprocessor. It is the responsibility of the first preprocessor to call the second; this might then call a third, and so on. Preprocessors that are not written to allow this stack method can only be used as the last preprocessor in the stack. See the section Multiple Preprocessors later in this chapter for details.

The command line for invoking several preprocessors is:

cob filename -C "[directives] 
    preprocess(preproc1) [preproc1-params]
    [preprocess(preproc2) [preproc2-params]] ..."

where the parameters are:

filename The name of the source file
directives Any additional Compiler directives you want to use
preproc1 The name of the preprocessor invoked by the Compiler
preproc1-params One or more of the optional preprocessor parameters for preproc1 described in the section Preprocessor Parameters
preproc2 The name of a preprocessor invoked by preproc1
preproc2-params One or more of the optional preprocessor parameters for preproc2 described in the section Preprocessor Parameters

16.2.2 Testing a Preprocessor

You can only debug a preprocessor using the CBL_DEBUGBREAK routine. Insert a call to this routine in your preprocessor source code and then compile it:

cob preproc 

When your program calls CBL_DEBUGBREAK, Animator starts debugging your preprocessor from that point in your code.
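A minimal sketch of where such a call might go is shown here. It assumes that CBL_DEBUGBREAK takes no parameters (check the Debugging Handbook before relying on this) and uses the parameter layout given later in the section Definition of the Interface, with an illustrative 80-byte buffer.

```cobol
       identification division.
       program-id. preproc.
       data division.
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  response-code-2        pic 9(4) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 0
      * Hand control to Animator on the Compiler's first call.
      * Assumption: the routine is called without parameters.
               call "CBL_DEBUGBREAK"
               move 0 to response-status
           end-if
           goback.
```

Placing the call in the initial-call branch means every later call from the Compiler can be stepped through.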

To animate your preprocessor, enter on the command line:

cob filename.cbl -C "[directives] preprocess(preproc) [preproc-params]"

and then enter:

anim filename

where the parameters are:

filename The name of the source code
directives One or more Compiler directives
preproc The name of the preprocessor
preproc-params One or more of the optional preprocessor parameters described in the section Preprocessor Parameters

For more information on the CBL_DEBUGBREAK routine, see the chapter Starting Animator in your Debugging Handbook.

Note: Invoking the compiler in this way does not allow the use of Compiler directives for the generate phase.

16.3 Writing a Preprocessor

This section explains how to write an integrated preprocessor and describes the interface used to pass information between a preprocessor and the Compiler.

Although a preprocessor could be written in a language other than COBOL, the following description assumes that it is written in COBOL.

The Integrated Preprocessor Interface works on the simple concept that preprocessing is a form of editing. The preprocessor marks each line of the source code as unchanged, inserted (that is, new lines) or modified (that is, old lines that are not to be compiled). When compiling a COBOL program, the Compiler calls the preprocessor instead of directly reading the source file and receives the code line by line from the preprocessor.

The operation of Animator depends upon a mapping of each line of object code on to each line of source code. The marking of source lines described above allows this mapping to be valid even though the object code does not match the source code.

16.3.1 Definition of the Interface

This section describes the interface between the Compiler and a preprocessor.

Preprocessor Parameters

Three parameters are passed across the interface:


mode-flag is used to pass control information, buffer is used for text information (source lines and filenames) and response is used to indicate the type of source line in the buffer.

Use the following data structure to pass these parameters:

 01  mode-flag                      pic 9(2) comp-x.
 01  buffer                         pic x(n).
 01  response.
     03  response-status            pic 9(2) comp-x.
     03  response-code-1            pic 9(4) comp-x.
     03  filler redefines response-code-1.
         05  filler                 pic x.
         05  resp-main              pic 9(2) comp-x.
     03 response-code-2             pic 9(4) comp-x.
     03 filler redefines response-code-2.
         05 filler                  pic x.
         05 resp-more               pic 9(2) comp-x.

Note: See your Language Reference for details on the data type COMP-X.

The Initial Call

The initial call is made to the preprocessor at the point where the Compiler would normally open the source file. The mode-flag parameter is set to 0 and the name of the source file is placed in buffer. If the Compiler was able to locate the source file, the name contains the full path and file extension. If it was not able to do so, the name is as specified on the command line.

In addition, this call initiates a handshaking process so that the Compiler and preprocessor can determine their respective levels of support. This enables new functionality to be built into the preprocessor interface while ensuring that it is not used unless the Compiler and preprocessor both support it. The Compiler will not pass any information to the preprocessor unless it has been told that the preprocessor can process it; similarly, the preprocessor should not make any requests of the Compiler unless it has been told that it can do so. The process has been designed so that older Compilers and preprocessors continue to work even though they do not take account of the handshaking process.

When the Compiler calls the preprocessor, response-code-2 is used to inform the preprocessor of its support level. When the preprocessor returns to the Compiler, it uses the same parameter for its support level. As features are added the support levels are incremented; each level includes the support in the levels before it. To ensure that old Compilers are supported, the value 8224 is a special case - it indicates that the parameter was not initialized and represents the "base level". Similarly, an old preprocessor that does not set a support level will not change the parameter value, so all values of 32767 or lower are treated as the "base level".

Features not in the base levels, which are explained in the following sections, are headed "Not base level". In summary, the Compiler support levels are:

response-code-2 Feature
8224 base level
0 response-code-1 contains length of buffer
1 resp-main of 14 supported
2 Compiler understands preprocessor support level; will tell preprocessor to abort

while the preprocessor support levels are:

response-code-2 Feature
32767 or less base level
32768 Compiler may tell the preprocessor to abort

Not base level:
The Compiler puts the length of buffer in response-code-1. The preprocessor may return source lines up to this length. If response-code-2 is 8224 then the preprocessor should assume the length is 80 bytes.
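The handshake on the initial call could be handled along the following lines. This is a sketch under stated assumptions: the program name hshake and the ws- items are illustrative, and returning 32768 only when the Compiler reports level 2 or higher is one cautious reading of the tables above, not a documented requirement. Opening the source file is left out.

```cobol
       identification division.
       program-id. hshake.
      * Sketch only: "hshake" and the ws- names are illustrative.
       data division.
       working-storage section.
       01  ws-buffer-length           pic 9(4) comp-x value 80.
       01  ws-compiler-level          pic 9(4) comp-x value 0.
       01  ws-level-flag              pic x value "B".
           88  compiler-is-base       value "B".
           88  compiler-has-level     value "L".
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  response-code-2        pic 9(4) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 0
               perform note-compiler-level
               move 0 to response-status
           end-if
           goback.

       note-compiler-level.
      * 8224 means the Compiler never initialized the field.
           if response-code-2 = 8224
               set compiler-is-base to true
               move 80 to ws-buffer-length
           else
               set compiler-has-level to true
               move response-code-2 to ws-compiler-level
               move response-code-1 to ws-buffer-length
           end-if
      * Report our own level only to a Compiler that reads it.
           if compiler-has-level and ws-compiler-level >= 2
               move 32768 to response-code-2
           end-if.
```

When response-code-2 is left untouched, its value is 32767 or lower, so the Compiler treats the preprocessor as base level.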

The preprocessor should open the file and return zero in response-status to indicate success, or 255 to indicate failure.

The operating system command line contains any directives to the preprocessor, terminated by spaces. These directives are specified in the Compiler command line, directives files or $SET statements and follow the PREPROCESS directive itself. See the section Invoking a Preprocessor for further information. For details on how to read the operating system command line, see COMMAND-LINE in your Language Reference. The directives are used to pass information from the command line to the integrated preprocessor and are defined by the designer of the preprocessor. The preprocessor should not expect the directives it receives to be separated by only one space character. The command line format matches that of the PREPROCESS directive and each preprocessor directive (including the first) can be preceded by one or more spaces.

Subsequent Calls

Subsequent calls request a line of source code until the preprocessor indicates that the last line has been reached.

In these calls, the Compiler sets mode-flag to 1 and response-status to 0, except as documented in the section Handling COPY Statements later in this chapter. The preprocessor returns information in buffer, resp-main and resp-more as defined in the next section. If there is an error, response-status should be set to a nonzero value (the remaining fields can be left undefined).

The first byte of response-code-1 and response-code-2 are reserved for future use and must always be set to zero on return. The simplest way to achieve this is to set response-code-1 and response-code-2 to zero before setting resp-main and resp-more.

If you wish to modify the source code, you should note that the original source code lines should always be passed back before their replacement line(s). In addition, lines that are continued over several lines of source code must be treated as one block; it is not possible to modify part of the logical line.
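The ordering rule can be sketched with the EXEC example used later under Inserting Source Lines, with the resp-main values 2 (old line) and 1 (new line) from the section Preprocessor Response Codes. The program name, the ws- items and the END-EXEC terminator on the third old line are assumptions made for illustration; a real preprocessor would take these lines from the source file.

```cobol
       identification division.
       program-id. swapexec.
      * Sketch only: program name and ws- items are illustrative.
       data division.
       working-storage section.
       01  ws-step                    pic 9(2) comp-x value 0.
       01  ws-old-1                   pic x(80)
               value "       exec abc".
       01  ws-old-2                   pic x(80)
               value "           do something useful".
       01  ws-old-3                   pic x(80)
               value "       end-exec".
       01  ws-new-1                   pic x(80)
               value "           call abc_something_useful".
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  filler redefines response-code-1.
               05  filler             pic x.
               05  resp-main          pic 9(2) comp-x.
           03  response-code-2        pic 9(4) comp-x.
           03  filler redefines response-code-2.
               05  filler             pic x.
               05  resp-more          pic 9(2) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 1
               perform next-line
           end-if
           goback.

       next-line.
           add 1 to ws-step
      * Zero both codes so the reserved first bytes are clear.
           move 0 to response-code-1 response-code-2
           evaluate ws-step
      * Original lines go back first, marked as old (2).
               when 1
                   move ws-old-1 to buffer
                   move 2 to resp-main
               when 2
                   move ws-old-2 to buffer
                   move 2 to resp-main
               when 3
                   move ws-old-3 to buffer
                   move 2 to resp-main
      * Then the replacement, marked as new (1). resp-more is
      * the column of EXEC on the first old line.
               when 4
                   move ws-new-1 to buffer
                   move 1 to resp-main
                   move 8 to resp-more
               when other
                   move 0 to resp-main
           end-evaluate.
```

All lines of the EXEC block are returned as old before the single replacement line, so the block is treated as one unit.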

Not base level:
If the Compiler sets mode-flag to 2, it is about to terminate. On this call, the preprocessor should perform any clean-up operations, such as deleting temporary work files. Also, if the preprocessor is stacked, it should invoke the next preprocessor in the same way before cancelling this preprocessor. See the section Multiple Preprocessors.
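Putting the three mode-flag values together, a pass-through preprocessor that returns every line unchanged might look like the sketch below. The names passthru, src-file and the ws- items are illustrative, the 80-byte buffer is the base-level length, and the sketch assumes the file name in buffer is padded with spaces. It uses resp-main values 32 (unchanged line) and 0 (end of input) from the section Preprocessor Response Codes and ignores the handshake.

```cobol
       identification division.
       program-id. passthru.
      * Sketch only: all names here are illustrative.
       environment division.
       input-output section.
       file-control.
           select src-file assign to dynamic ws-src-name
               organization is line sequential
               file status is ws-src-status.
       data division.
       file section.
       fd  src-file.
       01  src-record                 pic x(80).
       working-storage section.
       01  ws-src-name                pic x(256).
       01  ws-src-status              pic xx.
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  filler redefines response-code-1.
               05  filler             pic x.
               05  resp-main          pic 9(2) comp-x.
           03  response-code-2        pic 9(4) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           evaluate mode-flag
               when 0
                   perform open-source
               when 1
                   perform next-line
               when other
      * mode-flag 2: nothing left to clean up in this sketch.
                   continue
           end-evaluate
           goback.

       open-source.
      * On the initial call buffer holds the source file name.
           move buffer to ws-src-name
           open input src-file
           if ws-src-status = "00"
               move 0 to response-status
           else
               move 255 to response-status
           end-if.

       next-line.
           move 0 to response-code-1 response-code-2
           read src-file into buffer
               at end
                   close src-file
                   move 0 to resp-main
               not at end
      * 32 marks a line the preprocessor has not modified.
                   move 32 to resp-main
           end-read.
```

The file is closed at end of input rather than on mode-flag 2, because a base-level Compiler never makes the terminating call.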

16.3.2 Preprocessor Response Codes

After the initial call to the preprocessor, subsequent calls return with either source lines, which can be marked as unchanged, old (to be treated as commented out), new or COPY statements; or with other requests such as increment an error count or return a directive setting.

The value in resp-main is used to signify what is being returned; additional information may be placed in resp-more and/or buffer.

The values of resp-main are:

0 The source file has been completely processed and there is no further input. buffer and resp-more are ignored.
1 buffer contains a new line added by the preprocessor which was not in the original source code. resp-more optionally contains the position of the verb in the original source line that is being replaced. The line added by the preprocessor must not be a comment.
2 buffer contains a line in the original source code which is to be ignored by the Compiler. resp-more is ignored.
3 buffer contains a line in the original source code which contains the start of a COPY statement that is about to be expanded by the preprocessor. resp-more contains the position of the statement on the line.
4 buffer contains a line in the original source code which contains the continuation of a COPY statement. resp-more is ignored.
5 buffer contains a warning message inserted by the preprocessor. This must have the format of a comment line (that is, the value "*" in the indicator area of the source line). resp-more is ignored.
6 An unrecoverable error has occurred; this forces the Compiler to abort and enter the COBOL Editor. In such a case, a message of up to 70 characters might be written to buffer and this is displayed on the bottom line of the Editor. resp-more is ignored.
7 An error has occurred; this forces the Compiler to increment its internal error count. All error classes can be specified by using resp-more (see the section Generating Error Messages below). The contents of buffer are ignored.
8 This value is generated by the CP preprocessor. See the section The CP Preprocessor for details.
9 This value is used when the CP preprocessor is in use. See the section The CP Preprocessor for details.
10 Identical to 11, below.
11 buffer contains a new line added by the preprocessor which contains the start of a COPY statement that is about to be expanded by the preprocessor. It is used when the COPY statement is not unique on a line or the original text was not a COBOL COPY statement. resp-more contains the position of the COPY statement on the line.
12 buffer contains a new line added by the preprocessor which contains the continuation of a COPY statement. resp-more is ignored.
13 Causes the Compiler to return information about its directive settings. The required directive might optionally be placed in buffer; resp-more is ignored.
14 Not base level: This value is similar to 10 and 11 above, except that it indicates that the original source contained -INC or ++INCLUDE.
32 buffer contains a line from the original source code which has not been modified by the preprocessor.
33 - 64 These values are generated by the CP preprocessor. See the section The CP Preprocessor for details.
128 The end of a copyfile has been reached. buffer must be empty.

Inserting Source Lines

When resp-main contains the value 1, resp-more is used to indicate the position in the original source of the replaced non-COBOL verb as follows:

0 No verb replacement is taking place.
nn The number of the column containing the first character of the non-COBOL verb being replaced by the current line. The line(s) containing the non-COBOL verb would have previously been marked by returning the value of 2 in resp-main; if there were more than one line, the verb is assumed to be on the first of them.

For example, if the original source contains:

exec abc
do something useful
end-exec

and these three lines are replaced by:

call abc_something_useful

then the value of nn gives the position of the EXEC statement.

Generating Error Messages

If the preprocessor encounters an error when processing the source code, it can communicate this to the Compiler so that the error is treated as a syntax error. There are two ways to do this:

The value in resp-more specifies the column number in which the error was found. It is used when positioning the cursor on return to the Editor.

It is also possible to force the Compiler to increment its internal error counts in conjunction with one of the two operations described above. This is done by setting resp-main to the value 7 and specifying which error count is to be increased in resp-more. Possible values for resp-more are:

1 Unrecoverable error
2 Severe error
3 Error
4 Warning
5 Informational
6 Flag count

Increasing the unrecoverable error count causes the Compiler to abort immediately. The contents of buffer are ignored.
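A sketch of incrementing the Error count follows. The program name errdemo, the message text and the ws- flag are invented for illustration; the preprocessor displays its own message first because the Compiler only adjusts its count.

```cobol
       identification division.
       program-id. errdemo.
      * Sketch only: names and message text are illustrative.
       data division.
       working-storage section.
       01  ws-error-flag              pic x value "N".
           88  error-reported         value "Y".
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  filler redefines response-code-1.
               05  filler             pic x.
               05  resp-main          pic 9(2) comp-x.
           03  response-code-2        pic 9(4) comp-x.
           03  filler redefines response-code-2.
               05  filler             pic x.
               05  resp-more          pic 9(2) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 1
               move 0 to response-code-1 response-code-2
               if error-reported
      * Nothing more to send in this sketch: end of input.
                   move 0 to resp-main
               else
                   perform count-one-error
               end-if
           end-if
           goback.

       count-one-error.
      * Tell the user first; the Compiler only counts the error.
           display "ERRDEMO: unsupported option in EXEC block"
      * 7 = increment an error count; 3 = the "Error" class.
           move 7 to resp-main
           move 3 to resp-more
           set error-reported to true.
```

Changing the value moved to resp-more selects a different count, for example 4 for a warning.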

It is the responsibility of the preprocessor to output error messages to the user before forcing the Compiler to either abort or increment the error counts. This should not be confused with the message that can be inserted in the list file, which is for informational purposes only.

Querying Directive Settings

When resp-main contains the value 13, buffer must contain either spaces or the name of a specific directive setting that is required. When the preprocessor is next called, buffer contains the value of a directive setting. When the value 13 is first returned, the Compiler builds up a list of all of its directive settings and returns one value from it. Subsequent returns of this value return further values from this list. If a value other than 13 is returned at any point, further returns of value 13 cause the list to be generated again. The list of directives is generated in alphabetical order; once all directive settings have been returned, further calls yield spaces. If buffer contains spaces the first, or next, directive setting is returned; if buffer contains the name of a directive, the Compiler searches through its list to the specified directive. Only Checker directives are returned in this way.

Handling COPY Statements

If the preprocessor is not interested in the contents of a copyfile, it can pass the COPY statement unmodified through to the Compiler, where it is expanded.

Only valid COBOL constructs can be passed in this way. Any non-COBOL constructs can be commented out and replaced by a valid COPY statement which the Compiler then expands.

In both of these cases, the preprocessor has no opportunity to read and process the copyfile itself.

The Compiler expands all forms of COPY statements that it would expand with no preprocessor present, whether passed back as unchanged or modified lines. The following are supported:

If the preprocessor does want to examine the copyfile contents, it must either expand the copyfile itself, or use the CP preprocessor documented later in this chapter. The contents of the copyfile itself are returned to the Compiler in the same manner as the lines in the main source file; however, the COPY statement itself receives special handling.

In the simplest case, when the COPY statement is the only statement on one or more lines, and specifies the full filename (including the extension and path if necessary), the first of these lines is passed to the Compiler marked with resp-main set to 3 and resp-more set to the column number of the start of the COPY, and all subsequent lines are passed with resp-main set to 4.

If the location of the file specified in the COPY statement is resolved by adding a filename extension or path, or if the COPY statement is not unique on the line(s), or is not a regular COBOL COPY statement, it is necessary to comment out (mark with resp-main set to 2) all lines containing the COPY statement and then pass through all other lines as inserted. Inserted COPY statements that the preprocessor is expanding must conform to the normal syntax rules for COBOL, and be marked with resp-main set to 11 or 14, resp-more set to nn (where nn is the position of the start of the COPY statement) for the first line and resp-main set to 12 for all subsequent lines.

For example, if the source contains:


This is first returned with resp-main set to 2 to indicate that this is a line that is about to be replaced. On the next call, the preprocessor returns:

01 ITEM-A.

with resp-main set to 1 to indicate that this is a replacement line. On the next call, the preprocessor returns:


This time, resp-main is set to 11 to indicate this is a replacement line containing the COPY statement alone. resp-more is set to 20, the position of the word COPY on the original source line.

Note that when a COPY statement is returned to the Compiler, whether to be expanded by the preprocessor or Compiler, the Compiler parses it and checks for REPLACING. This REPLACING affects all lines in this and nested copyfiles. The name specified in this COPY statement is read and stored for use by Animator. We recommend that the name be enclosed in quotes so that no unwanted case folding occurs when reading the name as, on platforms where case is significant, Animator might not be able to locate the file.

Not base level:
resp-main=14 is used when the original word was not a COBOL COPY statement, for example ++INCLUDE or -INC. Some Compiler directives may be specified in $SET statements before any COBOL source but not after. Such statements may follow ++INCLUDE but not COPY. Use of resp-main=14 allows a ++INCLUDE to be expanded while at the same time allowing such directives to be set subsequently.

Value 128 should only be used for resp-main when the preprocessor has finished expanding a copyfile. If it is not, the Compiler aborts the compilation. The value 0 should be used at the end of the main source file.

COPY statements in Identification Division comment entries are not always expanded. If a preprocessor signals that it is about to expand a copyfile (resp-main set to 3 or 11) at a point where this is not valid, the Compiler sets response-status to non-zero on the next call to the preprocessor. The preprocessor should abandon the copyfile immediately and send the end-of-copyfile marker (resp-main set to 128) as though the copyfile were empty.
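The sequence of responses for a COPY statement that shares a line with other text could be sketched as follows. Everything concrete here is an assumption made for illustration: the program name, the copyfile name itema.cpy, its single line of content, and the original source line, which is laid out so that COPY starts in column 20.

```cobol
       identification division.
       program-id. copydemo.
      * Sketch only: names and the copyfile "itema.cpy" are made up.
       data division.
       working-storage section.
       01  ws-step                    pic 9(2) comp-x value 0.
       01  ws-old-line                pic x(80) value
               '       01  ITEM-A. COPY "itema.cpy".'.
       01  ws-new-line                pic x(80) value
               "       01  ITEM-A.".
       01  ws-copy-line               pic x(80) value
               '           COPY "itema.cpy".'.
       01  ws-member-line             pic x(80) value
               "           03  ITEM-A-1       PIC X(10).".
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  filler redefines response-code-1.
               05  filler             pic x.
               05  resp-main          pic 9(2) comp-x.
           03  response-code-2        pic 9(4) comp-x.
           03  filler redefines response-code-2.
               05  filler             pic x.
               05  resp-more          pic 9(2) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 1
               perform next-line
           end-if
           goback.

       next-line.
           add 1 to ws-step
      * The Compiler rejects the expansion by passing a nonzero
      * response-status on the call after the COPY line.
           if ws-step = 4 and response-status not = 0
               move 5 to ws-step
           end-if
           move 0 to response-code-1 response-code-2
           move spaces to buffer
           evaluate ws-step
               when 1
                   move ws-old-line to buffer
                   move 2 to resp-main
               when 2
                   move ws-new-line to buffer
                   move 1 to resp-main
               when 3
      * 20 is the column of COPY on the original line.
                   move ws-copy-line to buffer
                   move 11 to resp-main
                   move 20 to resp-more
               when 4
                   move ws-member-line to buffer
                   move 32 to resp-main
               when 5
      * End of copyfile: buffer must be empty.
                   move 128 to resp-main
               when other
                   move 0 to resp-main
           end-evaluate.
```

If the Compiler refuses the expansion, step 4 is skipped and the end-of-copyfile marker is sent at once, as though the copyfile were empty.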

The CP preprocessor (documented later in this chapter) expands copyfiles so that other preprocessors do not have to do so. It generates lines as documented above.

16.3.3 Source Modification

A number of COBOL commands exist to modify a source file. The following statements are not supported by the Compiler when a preprocessor is active and must be handled by the preprocessor:

The CP preprocessor returns information to other preprocessors about the effect of REPLACE.

16.3.4 Multiple Preprocessors

Several preprocessors can be active simultaneously on the same source program. They are arranged in a stack so that the Compiler calls the top preprocessor in the stack, this preprocessor calls the next preprocessor and so on to the preprocessor at the bottom of the stack which actually reads the source code. Each line of source is then passed through every preprocessor in turn until it reaches the top of the stack and is passed to the Compiler. In order for this to work, the preprocessor must obey some additional rules.

As described in the section Invoking a Preprocessor, the Compiler writes directives to the command line when it makes the initial call to a preprocessor. The preprocessor reads this command line and, if it finds a PREPROCESS directive, invokes the preprocessor named in it. It also passes any parameters following the directive to the invoked preprocessor by writing these in turn to the command line and continues the handshaking process described in this section.

As preprocessor directives to the Compiler are terminated by the ENDP directive or the end of the line, multiple PREPROCESS directives must all appear on the same line in a directives file.

The interface between two preprocessors is identical to that between the Compiler and preprocessor specified above, thus it is possible to use a stackable preprocessor in both stacked and unstacked situations. A preprocessor which has not been designed to be stackable must be stacked at the end of the stack where it is directly reading the source code.
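A stackable preprocessor might locate and call the next preprocessor roughly as sketched here. The program name stackprep and the ws- items are illustrative, and the sketch makes simplifying assumptions: the directive is spelled PREPROCESS( in full, the name inside the parentheses is unquoted, and the command line fits in 256 bytes. The handshake and any editing of the returned line are omitted.

```cobol
       identification division.
       program-id. stackprep.
      * Sketch only: program and ws- names are illustrative.
       data division.
       working-storage section.
       01  ws-cmd-line                pic x(256).
       01  ws-upper                   pic x(256).
       01  ws-pos                     pic 9(4) comp-x.
       01  ws-next-prep               pic x(32) value spaces.
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  response-code-2        pic 9(4) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 0
               perform find-next-prep
           end-if
           if ws-next-prep not = spaces
      * Same three parameters as the Compiler passed to us.
               call ws-next-prep using mode-flag buffer response
           end-if
      * A real preprocessor would now inspect the response and
      * edit the line in buffer before returning it.
           goback.

       find-next-prep.
           accept ws-cmd-line from command-line
           move function upper-case(ws-cmd-line) to ws-upper
           move 0 to ws-pos
      * Count the characters in front of the directive, if any.
           inspect ws-upper tallying ws-pos
               for characters before initial "PREPROCESS("
           if ws-pos < 256
      * Skip the 11 characters of "PREPROCESS(" itself.
               add 12 to ws-pos
               unstring ws-cmd-line delimited by ")"
                   into ws-next-prep
                   with pointer ws-pos
               end-unstring
               if ws-pos <= 256
      * Hand the remaining parameters to the next preprocessor.
                   display ws-cmd-line(ws-pos:)
                       upon command-line
               end-if
           end-if.
```

Because every call is forwarded with the same parameters, the terminating call with mode-flag 2 reaches the next preprocessor as well.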

In most cases, preprocessors act on discrete sets of syntax and all produce valid COBOL syntax, so it is unlikely that more than one preprocessor will want to process a particular source line. However, it is possible that preprocessors in a stack represent language levels so that a source line is edited several times by the different preprocessors on its route through the stack to the Compiler. In this case, care must be taken to read the information in the response field so that the relationship between the source code and the code that finally reaches the Compiler is maintained. The order in which the stacking takes place should also be chosen carefully if code altered by one preprocessor is also to be successfully modified by a second preprocessor.

16.3.5 Considerations When Writing a Preprocessor

Any change made to source code might conflict with either the user selected COBOL dialect or data names. With care, any potential problems can easily be avoided.

Conflicts with User Selected COBOL Dialect

If the user code conforms to a dialect that does not support features required by the preprocessor there are two issues to be overcome. Firstly, non-conforming code would be flagged; secondly the required reserved words might not be in the dictionary.

To avoid the problem of flagging, no lines that are inserted by a preprocessor are ever flagged by the Compiler.

To make all required reserved words available, a preprocessor could generate a line that sets, for example, the ANS85 directive. However, this has some unwelcome side-effects: the behavior of some statements is changed, and valid data-names are rejected as they are now reserved words instead.

A better technique is to determine which words are actually needed and selectively add only these using the ADDRSV directive. For example, if you only need the reserved word FUNCTION, you can add it using the ADDRSV(FUNCTION) directive. This, however, can still cause problems if the user code declares FUNCTION as a data name.

The best solution of all is to use the ADDSYN directive, and choose an alternative word to FUNCTION that is unlikely to be used by the user. For example, use the directive ADDSYN "FUNCTION" = "PREPGEN--FUNCTION" and thereafter use PREPGEN--FUNCTION instead of FUNCTION when generating code.
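One way a preprocessor might issue this directive is to return a $SET line as an inserted line, sketched below. The program name synprep and the ws- items are illustrative, and the sketch assumes the Compiler accepts an ADDSYN setting in a $SET line at the point where it is returned; this has not been verified for every position in a source file.

```cobol
       identification division.
       program-id. synprep.
      * Sketch only: names are illustrative.
       data division.
       working-storage section.
       01  ws-set-flag                pic x value "N".
           88  set-line-sent          value "Y".
       01  ws-set-line                pic x(80) value
               '      $set addsyn "FUNCTION" = "PREPGEN--FUNCTION"'.
       linkage section.
       01  mode-flag                  pic 9(2) comp-x.
       01  buffer                     pic x(80).
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  filler redefines response-code-1.
               05  filler             pic x.
               05  resp-main          pic 9(2) comp-x.
           03  response-code-2        pic 9(4) comp-x.
       procedure division using mode-flag buffer response.
       main-para.
           if mode-flag = 1
               move 0 to response-code-1 response-code-2
               if set-line-sent
      * A real preprocessor would carry on with source lines.
                   move 0 to resp-main
               else
      * 1 marks a new line that was not in the user's source.
                   move ws-set-line to buffer
                   move 1 to resp-main
                   set set-line-sent to true
               end-if
           end-if
           goback.
```

Generated code returned after this line would then refer to PREPGEN--FUNCTION rather than FUNCTION.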

Note: Using the ADDSYN technique might confuse any preprocessors that subsequently process the source.

Conflicts with Data Names

If a preprocessor needs to create data for its own use, it should choose names that are unlikely to conflict with user-chosen names. As in the section above, selecting a name like PREPGEN--USERID is unlikely to conflict with a user-selected name, but USERID alone might well cause problems.

16.4 The CP Preprocessor

The CP preprocessor is designed to be used as a stacked preprocessor. It reads and expands source files (including copyfiles), plus it returns additional information to the other preprocessors in the stack about the effect of REPLACE and REPLACING. It thus removes these quite complex functions from other preprocessors.

16.4.1 Limitations

The CP preprocessor has the following restrictions:

16.4.2 Copyfile Expansion

When the CP preprocessor encounters a COPY statement it marks the source lines as documented earlier in this chapter, and reads through the copyfile. It locates the copyfiles using the same pathnames and file extensions as the Compiler would.

If a copyfile cannot be located, the CP preprocessor comments out the COPY statement and generates appropriate error messages. If there is an error in parsing the statement, the preprocessor ignores it and carries on parsing tokens from the point of the error. If other preprocessors receive an unmodified COPY statement they can assume that the CP preprocessor was unable to handle it; they should leave it and let the Compiler generate an error message.

16.4.3 REPLACE Notification

To enable other preprocessors to parse the source files as the Compiler would after the effect of REPLACE and REPLACING, the preprocessor returns additional lines as documented here. However, the Compiler still performs this REPLACING itself except as noted below. The additional lines are passed for information only.

If the preprocessor ever detects that a line is to be modified, it adds 32 to the value of resp-main that it returns. The line or lines thus marked are followed by one or more lines marked with the value of 8 in resp-main, containing what would be in the line after it was modified. As these changes are for information only, other preprocessors receiving the lines should pass these values back out again for any other preprocessors in the stack; when the Compiler receives lines marked with values in the range 33 to 64 it subtracts 32, and it ignores lines marked with an 8.

If a preprocessor needs to modify a line affected in this way, it should do so as normal; however, it should mark the new line with a 9 rather than a 1 to inform the Compiler that this new, inserted, line does not need to be tested for the effect of REPLACE or REPLACING.
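A preprocessor stacked above the CP preprocessor could classify each returned line with a small helper such as the sketch below. The subprogram cpclass and its two extra parameters are hypothetical. It leaves response untouched, so the caller can pass the original values on up the stack.

```cobol
       identification division.
       program-id. cpclass.
      * Sketch only: "cpclass" and its parameters are illustrative.
       data division.
       linkage section.
       01  response.
           03  response-status        pic 9(2) comp-x.
           03  response-code-1        pic 9(4) comp-x.
           03  filler redefines response-code-1.
               05  filler             pic x.
               05  resp-main          pic 9(2) comp-x.
           03  response-code-2        pic 9(4) comp-x.
       01  line-class                 pic x.
           88  replace-info-line      value "I".
           88  replace-affected-line  value "A".
           88  ordinary-line          value "O".
       01  base-resp-main             pic 9(2) comp-x.
       procedure division using response line-class base-resp-main.
       main-para.
           evaluate true
      * 8: text as it would read after REPLACE, for information.
               when resp-main = 8
                   set replace-info-line to true
                   move resp-main to base-resp-main
      * 33 to 64: a normal value with 32 added by CP.
               when resp-main >= 33 and resp-main <= 64
                   set replace-affected-line to true
                   subtract 32 from resp-main giving base-resp-main
               when other
                   set ordinary-line to true
                   move resp-main to base-resp-main
           end-evaluate
           goback.
```

For a line marked 64, for example, base-resp-main comes back as 32, the value for an unchanged source line.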

16.4.4 Directives

The preprocessor accepts a number of directives. In order to keep the command line short, these exist in an abbreviated form in addition to the full name. After looking at the command line in the usual way, it also looks at the environment variable CPDIR for directives.


Specifies whether to create a trace file, and optionally specifies the name of the file to be used. If the filename is omitted, it is created as progname.cpt, where progname is the basename of the main source file.




Specifies whether directives are to be shown on the screen if accepted.




Stacks another preprocessor (that is, specifies that the source input to this preprocessor is to come from another preprocessor rather than the source file itself).




Specifies whether warnings about directive settings are to be shown on screen.




Specifies whether EXEC SQL INCLUDE is to be treated as a COPY statement.



16.5 CP Errors

All messages have the format:

*CP   nnn-X 
**          description

where the variables are:

nnn The error number
X The severity level. This can be:
W Warning - processing continues
S Severe - during initialization causes processing to stop, otherwise non-fatal.
U Unrecoverable - causes processing to stop.

nnn values and the associated descriptions are shown below, along with causes of the error and action you can take.

16.5.1 Initialization Errors

The following initialization error messages might occur.

Illegal command line

Compiler level set to 1; some features disabled

Open fail: filename

Open fail: filename

Call to stacked preprocessor name failed

Stacked preprocessor returned an error

Unable to open a heap

16.5.2 Main Processing Errors

These error messages might be produced while processing source.

Internal stack full - contact technical support

File error - contact technical support

Copybook filename not found

Nested REPLACING is not allowed.

Undefined internal error - contact technical support

Copyright © 2000 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names used herein are protected by international law.
