This chapter describes the Integrated Preprocessor Interface, which is an extension to the COBOL Compiler. You should read it if you are creating a preprocessor for the first time or if you are migrating a preprocessor from another environment.
Language preprocessors (also known as precompilers) exist in order to convert non-standard COBOL, or non-COBOL code embedded in COBOL, into a form that the Compiler will process.
Non-integrated preprocessors take a source file as input, read and parse it, and produce a modified source file, which is then passed as input to the COBOL Compiler. This has the following disadvantages:
These factors can significantly lengthen the development cycle.
The Integrated Preprocessor Interface overcomes these problems by enabling the preprocessor to mark the relationship between the original source code and the modified form. Although the Compiler will actually process the modified COBOL, it will only ever show the original code. The Integrated Preprocessor Interface enables a preprocessor to tightly integrate with the Compiler so that you are virtually unaware it is present.
The preprocessor model remains unchanged when using the Integrated Preprocessor Interface - the preprocessor reads the source file(s) and passes modified source lines to the Compiler. This makes the interface completely general purpose. The disadvantage of this approach is that the preprocessor has to handle COBOL constructs such as continuation, copyfile expansion and the effects of REPLACE and REPLACING itself. These latter two issues are resolved by using the CP Preprocessor (see later in the chapter) although this does require that COPY statements conform to normal COBOL syntax rules. If, by comparison, the Compiler read source files and passed single tokens to the preprocessor, restrictions such as the format of a token and the syntax of a COPY (or equivalent) statement would be imposed.
The underlying code that is actually compiled may be significantly different from the original. The following effects may therefore be noticed:
A preprocessor can, of course, generate lines to switch directive settings, but this might cause problems, especially if your code uses names for data items that conflict with reserved words in the language selected by the preprocessor.
The preprocessor is invoked by the Compiler, which is directed to do so by using the PREPROCESS directive.
The command for invoking a preprocessor is:
cob filename -C "[directives] preprocess(preproc) [preproc-params]"
Where the parameters are:
filename | The name of the source file |
directives | Any additional Compiler directives you want to use |
preproc | The name of the preprocessor |
preproc-params | One or more of the optional preprocessor parameters described in the section Preprocessor Parameters |
All directives on the command line following the PREPROCESS directive and up to the end of the line, or the ENDP directive, are passed to the preprocessor without examination.
In a similar manner to other Compiler directives, the PREPROCESS directive can be either placed in a directives file or included in a $SET statement within the source code. It should not be specified in more than one place. If it is in a $SET statement, it must be the first line in the source file.
Multiple preprocessors are invoked by passing a preprocess directive to the preprocessor so that it calls the next preprocessor. It is the responsibility of the first preprocessor to call the second; this might then call a third, and so on. Preprocessors that are not written to allow this stack method can only be used as the last preprocessor in the stack. See the section Multiple Preprocessors later in this chapter for details.
The command line for invoking several preprocessors is:
cob filename -C "[directives] preprocess(preproc1) [preproc1-params] [preprocess(preproc2) [preproc2-params]] ..."
where the parameters are:
filename | The name of the source file |
directives | Any additional Compiler directives you want to use |
preproc1 | The name of the preprocessor invoked by the Compiler |
preproc1-params | One or more of the optional preprocessor parameters for preproc1 described in the section Preprocessor Parameters |
preproc2 | The name of a preprocessor invoked by preproc1 |
preproc2-params | One or more of the optional preprocessor parameters for preproc2 described in the section Preprocessor Parameters |
You can only debug a preprocessor using the CBL_DEBUGBREAK routine. Insert a call to this routine in your preprocessor source code and then compile it:
cob preproc
When your program calls CBL_DEBUGBREAK, Animator starts debugging your preprocessor from that point in your code.
To animate your preprocessor, enter on the command line:
cob filename.cbl -C "[directives] preprocess(preproc) [preproc-params]"
and then enter:
anim filename
where the parameters are:
filename | The name of the source file |
directives | One or more Compiler directives |
preproc | The name of the preprocessor |
preproc-params | One or more of the optional preprocessor parameters described in the section Preprocessor Parameters |
For more information on the CBL_DEBUGBREAK routine, see the chapter Starting Animator in your Debugging Handbook.
Note: Invoking the compiler in this way does not allow the use of Compiler directives for the generate phase.
This section explains how to write an integrated preprocessor and describes the interface used to pass information between a preprocessor and the Compiler.
Although a preprocessor could be written in a language other than COBOL, the following description assumes that it is written in COBOL.
The Integrated Preprocessor Interface works on the simple concept that preprocessing is a form of editing. The preprocessor marks each line of the source code as unchanged, inserted (that is, new lines) or modified (that is, old lines that are not to be compiled). When compiling a COBOL program, the Compiler calls the preprocessor instead of directly reading the source file and receives the code line by line from the preprocessor.
The operation of Animator depends upon a mapping of each line of object code on to each line of source code. The marking of source lines described above allows this mapping to be valid even though the object code does not match the source code.
This section describes the interface between the Compiler and a preprocessor.
Three parameters are passed across the interface:
mode-flag
buffer
response
mode-flag is used to pass control information, buffer is used for text information (source lines and filenames) and response is used to indicate the type of source line in the buffer.
Use the following data structure to pass these parameters:
 01 mode-flag                pic 9(2) comp-x.
 01 buffer                   pic x(n).
 01 response.
    03 response-status       pic 9(2) comp-x.
    03 response-code-1       pic 9(4) comp-x.
    03 filler redefines response-code-1.
       05 filler             pic x.
       05 resp-main          pic 9(2) comp-x.
    03 response-code-2       pic 9(4) comp-x.
    03 filler redefines response-code-2.
       05 filler             pic x.
       05 resp-more          pic 9(2) comp-x.
Note: See your Language Reference for details on the data type COMP-X.
The initial call is made to the preprocessor at the point where the Compiler would normally open the source file. The mode-flag parameter is set to 0 and the name of the source file is placed in buffer.
If the Compiler was able to locate the source file, the name contains the full path and file extension. If it was not able to do so, the name is as specified on the command line.
In addition, this call initiates a handshaking process so that the Compiler and preprocessor can determine their respective levels of support. This enables new functionality to be built into the preprocessor interface while ensuring that it is not used unless both the Compiler and the preprocessor support it. The Compiler will not pass any information to the preprocessor unless it has been told that the preprocessor can process it; similarly, the preprocessor should not make any requests of the Compiler unless it has been told that it can do so. The process has been designed so that older Compilers and preprocessors continue to work even though they do not take account of the handshaking process.
When the Compiler calls the preprocessor, response-code-2 is used to inform the preprocessor of its support level. When the preprocessor returns to the Compiler, it uses the same parameter for its support level. As features are added the support levels are incremented; each level includes the support in the levels before it. To ensure that old Compilers are supported, the value 8224 is a special case - it indicates that the parameter was not initialized and represents the "base level". Similarly, an old preprocessor that does not set a support level will not change the parameter value, so all values of 32767 or lower are treated as the "base level".
Features not in the base levels, which are explained in the following sections, are headed "Not base level". In summary, the Compiler support levels are:
response-code-2 | Feature |
---|---|
8224 | base level |
0 | response-code-1 contains length of buffer |
1 | resp-main of 14 supported |
2 | Compiler understands preprocessor support level; will tell preprocessor to abort |
while the preprocessor support levels are:
response-code-2 | Feature |
---|---|
32767 or less | base level |
32768 | Compiler may tell the preprocessor to abort |
Not base level:
The Compiler puts the length of buffer in response-code-1. The preprocessor may return source lines up to this length. If response-code-2 is 8224 then the preprocessor should assume the length is 80 bytes.
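The handshake rules above can be summarised in a short model. The following Python sketch is illustrative only and is not part of the interface: the function names and dictionary keys are hypothetical, and it simply encodes the two support-level tables and the 80-byte default described above.

```python
# Illustrative model of the support-level handshake; all names are hypothetical.

# 8224 is 0x2020, the numeric value of two ASCII space characters.
UNINITIALIZED = 8224


def compiler_features(response_code_1, response_code_2):
    """Interpret the values the Compiler passes on the initial call."""
    if response_code_2 == UNINITIALIZED:
        # Base level: no length is supplied, so assume 80-byte source lines.
        return {"buffer_length": 80, "include_marker": False, "sends_abort": False}
    return {
        "buffer_length": response_code_1,        # level 0 and above
        "include_marker": response_code_2 >= 1,  # resp-main of 14 supported
        "sends_abort": response_code_2 >= 2,     # Compiler will tell the preprocessor to abort
    }


def preprocessor_accepts_abort(returned_code_2):
    """Interpret the value a preprocessor leaves in response-code-2 on return."""
    return returned_code_2 >= 32768  # 32767 or lower is the base level
```

Because each level includes the ones before it, the checks use comparisons rather than equality, so a later, higher level would still enable the earlier features.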
The preprocessor should open the file and return zero in response-status to indicate success, or 255 to indicate failure.
The operating system command line contains any directives to the preprocessor, terminated by spaces. These directives are specified in the Compiler command line, directives files or $SET statements and follow the PREPROCESS directive itself. See the section Invoking a Preprocessor for further information. For details on how to read the operating system command line, see COMMAND-LINE in your Language Reference. The directives are used to pass information from the command line to the integrated preprocessor and are defined by the designer of the preprocessor. The preprocessor should not expect the directives it receives to be separated by only one space character. The command line format matches that of the PREPROCESS directive and each preprocessor directive (including the first) can be preceded by one or more spaces.
Subsequent calls request a line of source code until the preprocessor indicates that the last line has been reached.
In these calls, the Compiler sets mode-flag to 1 and response-status to 0, except as documented in the section Handling COPY Statements later in this chapter. The preprocessor returns information in buffer, resp-main and resp-more as defined in the next section. If there is an error, response-status should be set to a nonzero value (the remaining fields can be left undefined).
The first bytes of response-code-1 and response-code-2 are reserved for future use and must always be set to zero on return. The simplest way to achieve this is to set response-code-1 and response-code-2 to zero before setting resp-main and resp-more.
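To make the byte layout of the response parameter concrete, the following Python sketch packs and unpacks a 5-byte record. It is a model for illustration, not code a preprocessor would use: the function names are hypothetical, and it assumes that COMP-X items are unsigned binary values stored most significant byte first, so that resp-main and resp-more occupy the second byte of their two-byte fields.

```python
import struct

# response-status (1 byte), response-code-1 (2 bytes), response-code-2 (2 bytes).
# Assumption: COMP-X items are unsigned binary, most significant byte first.
RESPONSE_LAYOUT = ">BHH"


def pack_response(status, resp_main, resp_more):
    """Build the 5-byte response record with both reserved bytes set to zero."""
    for value in (status, resp_main, resp_more):
        if not 0 <= value <= 255:
            raise ValueError("each field must fit in one byte")
    # Storing a value below 256 in a two-byte field leaves its first byte zero.
    return struct.pack(RESPONSE_LAYOUT, status, resp_main, resp_more)


def unpack_response(record):
    """Split a response record into its fields and check the reserved bytes."""
    status, code_1, code_2 = struct.unpack(RESPONSE_LAYOUT, record)
    return {
        "status": status,
        "resp_main": code_1 & 0xFF,
        "resp_more": code_2 & 0xFF,
        "reserved_clear": (code_1 >> 8) == 0 and (code_2 >> 8) == 0,
    }
```

For example, `pack_response(0, 11, 20)` gives the bytes `00 00 0B 00 14`, whereas a record whose code fields were left as spaces would be reported with `reserved_clear` set to `False`.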
If you wish to modify the source code, you should note that the original source code lines should always be passed back before their replacement line(s). In addition, lines that are continued over several lines of source code must be treated as one block; it is not possible to modify part of the logical line.
Not base level:
If the Compiler sets mode-flag to 2, it is about to terminate. On this call, the preprocessor should perform any clean-up operations, such as deleting temporary work files. Also, if the preprocessor is stacked, it should invoke the next preprocessor in the same way before cancelling that preprocessor. See the section Multiple Preprocessors.
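The three kinds of call (mode-flag 0, 1 and 2) can be pictured as a simple dispatch. The Python sketch below models a preprocessor that passes every line through unchanged. It is illustrative only: the class, its `call` method and the `read_lines` argument are hypothetical stand-ins for the COBOL entry point and file handling, and each return value is a `(response-status, resp-main, buffer)` tuple.

```python
# Illustrative model of the call sequence; class and method names are hypothetical.


def read_source(filename):
    """Default reader: return the lines of a source file without line endings."""
    with open(filename) as handle:
        return handle.read().splitlines()


class PassThroughPreprocessor:
    def __init__(self, read_lines=read_source):
        self._read_lines = read_lines
        self._lines = None

    def call(self, mode_flag, buffer=""):
        """Return (response_status, resp_main, buffer) for one call."""
        if mode_flag == 0:                      # initial call: open the source file
            try:
                self._lines = iter(self._read_lines(buffer))
            except OSError:
                return (255, 0, "")             # 255 reports failure to open
            return (0, 0, "")
        if mode_flag == 1 and self._lines is not None:
            line = next(self._lines, None)
            if line is None:
                return (0, 0, "")               # resp-main 0: no further input
            return (0, 32, line)                # resp-main 32: unmodified line
        if mode_flag == 2:                      # Compiler is about to terminate
            self._lines = None                  # clean-up would go here
            return (0, 0, "")
        return (255, 0, "")                     # unexpected call
```

Passing the reader in as an argument keeps the sketch testable without touching the file system; a real preprocessor would open the file named in buffer directly.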
After the initial call to the preprocessor, subsequent calls return with either source lines, which can be marked as unchanged, old (to be treated as commented out), new or COPY statements; or with other requests such as increment an error count or return a directive setting.
The value in resp-main is used to signify what is being returned; additional information may be placed in resp-more and/or buffer.
The values of resp-main are:
Value | Description |
---|---|
0 | The source file has been completely processed and there is no further input. buffer and resp-more are ignored. |
1 | buffer contains a new line added by the preprocessor which was not in the original source code. resp-more optionally contains the position of the verb in the original source line that is being replaced. The line added by the preprocessor must not be a comment. |
2 | buffer contains a line in the original source code which is to be ignored by the Compiler. resp-more is ignored. |
3 | buffer contains a line in the original source code which contains the start of a COPY statement that is about to be expanded by the preprocessor. resp-more contains the position of the statement on the line. |
4 | buffer contains a line in the original source code which contains the continuation of a COPY statement. resp-more is ignored. |
5 | buffer contains a warning message inserted by the preprocessor. This must have the format of a comment line (that is, the value "*" in the indicator area of the source line). resp-more is ignored. |
6 | An unrecoverable error has occurred; this forces the Compiler to abort and enter the COBOL Editor. In such a case, a message of up to 70 characters might be written to buffer and this is displayed on the bottom line of the Editor. resp-more is ignored. |
7 | An error has occurred; this forces the Compiler to increment its internal error count. All error classes can be specified by using resp-more (see the section Generating Error Messages below). The contents of buffer are ignored. |
8 | This value is generated by the CP preprocessor. See the section The CP Preprocessor for details. |
9 | This value is used when the CP preprocessor is in use. See the section The CP Preprocessor for details. |
10 | Identical to 11, below. |
11 | buffer contains a new line added by the preprocessor which contains the start of a COPY statement that is about to be expanded by the preprocessor. It is used when the COPY statement is not unique on a line or the original text was not a COBOL COPY statement. resp-more contains the position of the COPY statement on the line. |
12 | buffer contains a new line added by the preprocessor which contains the continuation of a COPY statement. resp-more is ignored. |
13 | Causes the Compiler to return information about its directive settings. The required directive might optionally be placed in buffer; resp-more is ignored. |
14 | Not base level: This value is similar to 10 and 11 above, except that it indicates that the original source contained -INC or ++INCLUDE. |
32 | buffer contains a line from the original source code which has not been modified by the preprocessor. |
33 - 64 | These values are generated by the CP preprocessor. See the section The CP Preprocessor for details. |
128 | The end of a copyfile has been reached. buffer must be empty. |
When resp-main contains the value 1, resp-more is used to indicate the position in the original source of the replaced non-COBOL verb as follows:
Value | Description |
---|---|
0 | No verb replacement is taking place. |
nn | The number of the column containing the first character of the non-COBOL verb being replaced by the current line. The line(s) containing the non-COBOL verb would have previously been marked by returning the value of 2 in resp-main; if there were more than one line, the verb is assumed to be on the first of them.
For example, if the original source contains: exec abc do something useful end-exec and these three lines are replaced by: call abc_something_useful then the value of |
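The marking scheme for a replaced block can be sketched as a sequence of responses. The Python model below is illustrative only: `toy_preprocess` is a hypothetical function that handles just the made-up EXEC ABC ... END-EXEC construct, ignores continuation lines and COPY statements, and yields `(resp-main, resp-more, buffer)` tuples in the order a preprocessor would return them. The original lines are returned first, marked 2, followed by the replacement line marked 1 with resp-more set to the column of the replaced verb.

```python
# Illustrative only: a toy preprocessor for a made-up EXEC ... END-EXEC block.
UNCHANGED, NEW, OLD, END_OF_SOURCE = 32, 1, 2, 0


def toy_preprocess(source_lines):
    """Yield (resp_main, resp_more, buffer) for each line request."""
    block = []
    for line in source_lines:
        text = line.strip().lower()
        if block or text.startswith("exec "):
            block.append(line)
            if text.endswith("end-exec"):
                # Column of the verb on the first original line, counting from 1.
                column = block[0].lower().index("exec") + 1
                for old_line in block:
                    yield (OLD, 0, old_line)       # originals go back first
                replacement = " " * (column - 1) + "call abc_something_useful"
                yield (NEW, column, replacement)   # then the replacement line
                block = []
        else:
            yield (UNCHANGED, 0, line)
    for line in block:
        # Unterminated block: pass it through and let the Compiler report it.
        yield (UNCHANGED, 0, line)
    yield (END_OF_SOURCE, 0, "")
```

Collecting a whole block before returning anything keeps the rule that original lines always precede their replacements.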
If the preprocessor encounters an error when processing the source code, it can communicate this to the Compiler so that the error is treated as a syntax error. There are two ways to do this:
Set resp-main to the value 5 and place a comment line in buffer; the comment will be inserted in the list file.
Set resp-main to the value 6; the Compiler will terminate. The value in resp-more specifies the column number in which the error was found. It is used when positioning the cursor on return to the Editor.
It is also possible to force the Compiler to increment its internal error counts in conjunction with one of the two operations described above. This is done by setting resp-main to the value 7 and specifying which error count is to be increased in resp-more.
Possible values for resp-more are:
Value | Severity |
---|---|
1 | Unrecoverable error |
2 | Severe error |
3 | Error |
4 | Warning |
5 | Informational |
6 | Flag count |
Increasing the unrecoverable error count causes the Compiler to abort immediately. The contents of buffer are ignored.
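As a sketch of how these two steps combine, the Python model below builds the pair of responses for one error: a comment line for the list file (resp-main 5) followed by an error-count increment (resp-main 7). The function name and the severity keys are hypothetical, and the model assumes fixed-format source in which the indicator area is column 7. Each response is a `(resp-main, resp-more, buffer)` tuple.

```python
# Illustrative only: names are hypothetical; values come from the severity table.
SEVERITY = {
    "unrecoverable": 1,
    "severe": 2,
    "error": 3,
    "warning": 4,
    "informational": 5,
    "flag": 6,
}


def report_error(message, severity):
    """Return the two responses for one error, in the order they are sent."""
    # Assumption: fixed-format source, so "*" belongs in column 7.
    comment_line = " " * 6 + "* " + message
    return [
        (5, 0, comment_line),          # comment inserted in the list file
        (7, SEVERITY[severity], ""),   # increment the matching error count
    ]
```

The preprocessor would still display the message to the user itself before returning these responses.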
It is the responsibility of the preprocessor to output error messages to the user before forcing the Compiler to either abort or increment the error counts. These messages should not be confused with the message that can be inserted in the list file, which is for informational purposes only.
When resp-main contains the value 13, buffer must contain either spaces or the name of a specific directive setting that is required. When the preprocessor is next called, buffer contains the value of a directive setting. When the value 13 is first returned, the Compiler builds up a list of all of its directive settings and returns one value from it. Subsequent returns of this value return further values from this list. If a value other than 13 is returned at any point, further returns of value 13 cause the list to be generated again. The list of directives is generated in alphabetical order; once all directive settings have been returned, further calls yield spaces. If buffer contains spaces the first, or next, directive setting is returned; if buffer contains the name of a directive, the Compiler searches through its list to the specified directive. Only Checker directives are returned in this way.
If the preprocessor is not interested in the contents of a copyfile, it can pass the COPY statement unmodified through to the Compiler, where it is expanded.
Only valid COBOL constructs can be passed in this way. Any non-COBOL constructs can be commented out and replaced by a valid COPY statement which the Compiler then expands.
In both of these cases, the preprocessor has no opportunity to read and process the copyfile itself.
The Compiler expands all forms of COPY statements that it would expand with no preprocessor present, whether passed back as unchanged or modified lines. The following are supported:
copy-filename
OF/IN library-name
If the preprocessor does want to examine the copyfile contents, it must either expand the copyfile itself, or use the CP preprocessor documented later in this chapter. The contents of the copyfile itself are returned to the Compiler in the same manner as the lines in the main source file; however, the COPY statement itself receives special handling.
In the simplest case, when the COPY statement is the only statement on one or more lines, and specifies the full filename (including the extension and path if necessary), the first of these lines is passed to the Compiler marked with resp-main set to 3 and resp-more set to the column number of the start of the COPY, and all subsequent lines are passed with resp-main set to 4.
If the location of the file specified in the COPY statement is resolved by adding a filename extension or path, or if the COPY statement is not unique on the line(s), or is not a regular COBOL COPY statement, it is necessary to comment out (mark with resp-main set to 2) all lines containing the COPY statement and then pass through all other lines as inserted. Inserted COPY statements that the preprocessor is expanding must conform to the normal syntax rules for COBOL, and be marked with resp-main set to 11 or 14, resp-more set to nn (where nn is the position of the start of the COPY statement) for the first line and resp-main set to 12 for all subsequent lines.
For example, if the source contains:
01 ITEM-A. COPY "CPY-FIL.CPY".
This is first returned with resp-main set to 2 to indicate that this is a line that is about to be replaced. On the next call, the preprocessor returns:
01 ITEM-A.
with resp-main set to 1 to indicate that this is a replacement line. On the next call, the preprocessor returns:
COPY "CPY-FIL.CPY".
This time, resp-main is set to 11 to indicate this is a replacement line containing the COPY statement alone. resp-more is set to 20, the position of the word COPY on the original source line.
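The sequence in this example, followed by the copyfile contents and the end-of-copyfile marker, can be modelled as below. This Python sketch is illustrative only: `expand_inline_copy` is a hypothetical name, the search for the word COPY is deliberately naive, and the copyfile lines are supplied by the caller rather than read from disk. Each response is a `(resp-main, resp-more, buffer)` tuple.

```python
# Illustrative only: a naive model of expanding a COPY that shares its line.


def expand_inline_copy(original_line, copyfile_lines):
    """Yield the responses for one line holding other text plus a COPY statement."""
    position = original_line.upper().index("COPY ")  # naive search for the verb
    column = position + 1                            # columns count from 1
    yield (2, 0, original_line)                      # original line, to be ignored
    yield (1, 0, original_line[:position].rstrip())  # text that preceded the COPY
    # The COPY statement alone, kept in its original column.
    yield (11, column, " " * position + original_line[position:].rstrip())
    for line in copyfile_lines:
        yield (32, 0, line)                          # copyfile contents, unchanged
    yield (128, 0, "")                               # end of copyfile, empty buffer
```

With the source line spaced so that COPY begins in column 20, the third response carries resp-more set to 20, matching the example above.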
Note that when a COPY statement is returned to the Compiler, whether to be expanded by the preprocessor or Compiler, the Compiler parses it and checks for REPLACING. This REPLACING affects all lines in this and nested copyfiles. The name specified in this COPY statement is read and stored for use by Animator. We recommend that the name be enclosed in quotes so that no unwanted case folding occurs when reading the name as, on platforms where case is significant, Animator might not be able to locate the file.
Not base level: resp-main=14 is used when the original word was not a COBOL COPY statement, for example ++INCLUDE or -INC. Some Compiler directives may be specified in $SET statements before any COBOL source but not after. Such statements may follow ++INCLUDE but not COPY. Use of resp-main=14 allows a ++INCLUDE to be expanded while at the same time allowing such directives to be set subsequently.
Value 128 should only be used for resp-main when the preprocessor has finished expanding a copyfile. If it is not, the Compiler aborts the compilation. The value 0 should be used at the end of the main source file.
COPY statements in Identification Division comment entries are not always expanded. If a preprocessor signals that it is about to expand a copyfile (resp-main set to 3 or 11) at a point where this is not valid, the Compiler sets response-status to non-zero on the next call to the preprocessor. The preprocessor should abandon the copyfile immediately and send the end-of-copyfile marker (resp-main set to 128) as though the copyfile were empty.
The CP preprocessor (documented later in this chapter) expands copyfiles so that other preprocessors do not have to do so. It generates lines as documented above.
A number of COBOL commands exist to modify a source file. The following statements are not supported by the Compiler when a preprocessor is active and must be handled by the preprocessor:
The CP preprocessor returns information to other preprocessors about the effect of REPLACE.
Several preprocessors can be active simultaneously on the same source program. They are arranged in a stack so that the Compiler calls the top preprocessor in the stack, this preprocessor calls the next preprocessor and so on to the preprocessor at the bottom of the stack which actually reads the source code. Each line of source is then passed through every preprocessor in turn until it reaches the top of the stack and is passed to the Compiler. In order for this to work, the preprocessor must obey some additional rules.
As described in the section Invoking a Preprocessor, the Compiler writes directives to the command line when it makes the initial call to a preprocessor. The preprocessor reads this command line and, if it finds a PREPROCESS directive, invokes the preprocessor named in it. It also passes any parameters following the directive to the invoked preprocessor by writing these in turn to the command line, and continues the handshaking process described in this section.
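The splitting of the command line between a preprocessor and the one stacked beneath it can be sketched as follows. This Python model is illustrative only: `split_for_stack` is a hypothetical name, it recognises only the `preprocess(name)` form shown in this chapter, it assumes that directive names are not case sensitive, and it assumes that no directive contains embedded spaces.

```python
import re

# Assumption: only the parenthesised form is recognised, in any letter case.
_PREPROCESS = re.compile(r"preprocess\(([^)]+)\)", re.IGNORECASE)


def split_for_stack(command_line):
    """Return (own_directives, next_preprocessor, remaining_command_line)."""
    # split() with no argument copes with leading and repeated spaces.
    tokens = command_line.split()
    for index, token in enumerate(tokens):
        match = _PREPROCESS.fullmatch(token)
        if match:
            own = tokens[:index]
            rest = " ".join(tokens[index + 1:])
            return own, match.group(1), rest
    return tokens, None, ""
```

Everything after the first PREPROCESS directive is handed on untouched, so a further PREPROCESS directive in the remainder is left for the next preprocessor in the stack to act on.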
As preprocessor directives to the Compiler are terminated by the ENDP directive or the end of the line, multiple PREPROCESS directives must all appear on the same line in a directives file.
The interface between two preprocessors is identical to that between the Compiler and preprocessor specified above, thus it is possible to use a stackable preprocessor in both stacked and unstacked situations. A preprocessor which has not been designed to be stackable must be stacked at the end of the stack where it is directly reading the source code.
In most cases, preprocessors act on discrete sets of syntax and all produce valid COBOL syntax, so it is unlikely that more than one preprocessor will want to process a particular source line. However, it is possible that preprocessors in a stack represent language levels so that a source line is edited several times by the different preprocessors on its route through the stack to the Compiler. In this case, care must be taken to read the information in the response field so that the relationship between the source code and the code that finally reaches the Compiler is maintained. The order in which the stacking takes place should also be chosen carefully if code altered by one preprocessor is also to be successfully modified by a second preprocessor.
Any change made to source code might conflict with either the user selected COBOL dialect or data names. With care, any potential problems can easily be avoided.
If the user code conforms to a dialect that does not support features required by the preprocessor there are two issues to be overcome. Firstly, non-conforming code would be flagged; secondly the required reserved words might not be in the dictionary.
To avoid the problem of flagging, no lines that are inserted by a preprocessor are ever flagged by the Compiler.
To make all required reserved words available, a preprocessor could generate a line that sets, for example, the ANS85 directive. However, this has some unwelcome side-effects: the behavior of some statements is changed, and valid data-names are rejected as they are now reserved words instead.
A better technique is to determine which words are actually needed and selectively add only these using the ADDRSV directive. For example, if you only need the reserved word FUNCTION, you can add it using the ADDRSV(FUNCTION) directive. This, however, can still cause problems if the user code declares FUNCTION as a data name.
The best solution of all is to use the ADDSYN directive, and choose an alternative word to FUNCTION that is unlikely to be used by the user. For example, use the directive ADDSYN "FUNCTION" = "PREPGEN--FUNCTION" and thereafter use PREPGEN--FUNCTION instead of FUNCTION when generating code.
Note: Using the ADDSYN technique might confuse any preprocessors that subsequently process the source.
If a preprocessor needs to create data for its own use, it should choose names that are unlikely to conflict with user-chosen names. As in the section above, selecting a name like PREPGEN--USERID is unlikely to conflict with a user-selected name, but USERID alone might well cause problems.
The CP preprocessor is designed to be used as a stacked preprocessor. It reads and expands source files (including copyfiles), plus it returns additional information to the other preprocessors in the stack about the effect of REPLACE and REPLACING. It thus removes these quite complex functions from other preprocessors.
The CP preprocessor has the following restrictions:
When the CP preprocessor encounters a COPY statement it marks the source lines as documented earlier in this chapter, and reads through the copyfile. It locates the copyfiles using the same pathnames and file extensions as the Compiler would.
If a copyfile cannot be located, the CP preprocessor comments out the COPY statement and generates appropriate error messages. If there is an error in parsing the statement, the preprocessor ignores it and carries on parsing tokens from the point of the error. If other preprocessors receive an unmodified COPY statement they can assume that the CP preprocessor was unable to handle it; they should leave it and let the Compiler generate an error message.
To enable other preprocessors to parse the source files as the Compiler would after the effect of REPLACE and REPLACING, the preprocessor returns additional lines as documented here. However, the Compiler still performs this REPLACING itself except as noted below. The additional lines are passed for information only.
If the preprocessor ever detects that a line is to be modified, it adds 32 to the value of resp-main that it returns. The line or lines thus marked are followed by one or more lines marked with the value of 8 in resp-main, containing what would be in the line after it was modified. As these changes are for information only, other preprocessors receiving the lines should pass these values back out again for any other preprocessors in the stack; when the Compiler receives lines marked with values in the range 33 to 64 it subtracts 32, and it ignores lines marked with an 8.
If a preprocessor needs to modify a line affected in this way, it should do so as normal; however, it should mark the new line with a 9 rather than a 1 to inform the Compiler that this new, inserted, line does not need to be tested for the effect of REPLACE or REPLACING.
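The effect of these markers on what the Compiler finally sees can be modelled in a few lines. The Python sketch below is illustrative only: `compiler_view` is a hypothetical name, and the function simply applies the two rules stated above to a stream of `(resp-main, resp-more, buffer)` tuples, leaving every other value, including 9, untouched.

```python
# Illustrative only: models how the informational CP markers are discarded.


def compiler_view(responses):
    """Yield the responses as the Compiler would treat them."""
    for resp_main, resp_more, buffer in responses:
        if resp_main == 8:
            continue                  # informational line: ignored
        if 33 <= resp_main <= 64:
            resp_main -= 32           # strip the "will be modified" marker
        yield (resp_main, resp_more, buffer)
```

A stacked preprocessor, by contrast, would pass the values 8 and 33 to 64 back out unchanged so that the preprocessors above it receive the same information.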
The preprocessor accepts a number of directives. In order to keep the command line short, these exist in an abbreviated form in addition to the full name. After looking at the command line in the usual way, it also looks at the environment variable CPDIR for directives.
Specifies whether to create a trace file, and optionally specifies the name of the file to be used. If the filename is omitted, it is created as progname.cpt, where progname is the basename of the main source file.
>>--.---TRACE----.-"filename"--.--.--><
    |            +-------------+  |
    +--NO--TRACE------------------+
NOTRACE
Specifies whether directives are to be shown on the screen if accepted.
>>-----.----.----CONFIRM---><
       +-NO-+
NOCONFIRM
Stacks another preprocessor (that is, specifies that the source input to this preprocessor is to come from another preprocessor rather than the source file itself).
>>--.--PREPROCESS(prepnam)-.----------------.--.--><
    |                      +-DIRECTIVE-NAME-+  |
    +-NO-PREPROCESS----------------------------+
NOPREPROCESS
Specifies whether warnings about directive settings are to be shown on screen.
>>-----.----.----WARNING---><
       +-NO-+
WARNING
Specifies whether EXEC SQL INCLUDE is to be treated as a COPY statement.
>>-----.----.----SQL-------><
       +-NO-+
SQL
All messages have the format:
*CP nnn-X ** description
where the variables are:
nnn | The error number |
X | The severity level. This can be: |
nnn values and the associated descriptions are shown below, along with causes of the error and action you can take.
The following initialization error messages might occur.
These error messages might be produced while processing source.
Copyright © 2000 MERANT International Limited. All rights reserved.
This document and the proprietary marks and names
used herein are protected by international law.