XML PARSE Statement

The XML PARSE statement parses an XML document so that it can be processed by the COBOL program. It is an implementation of the IBM Enterprise COBOL verb of the same name and is provided to simplify IBM migrations; however, any customer wishing to read XML data can use this verb.

XML PARSE is similar to C$XML in that both parse (read) XML data so that you can move it into the appropriate Working-Storage items. The difference is that with C$XML, if you know that the data lies in a certain element or attribute, you can retrieve that element or attribute directly. With XML PARSE, you set up a processing procedure that receives control each time the parser encounters a new element or attribute, and in that procedure you specify how and where to store the data.

See Working with Non-Vision Data in A Guide to Interoperating with ACUCOBOL-GT for additional information on working with XML data.

General Format

XML PARSE identifier-1 PROCESSING PROCEDURE [IS]
  procedure-name-1 [{THROUGH} procedure-name-2]
                    {THRU   }
  [[ON] EXCEPTION imperative-statement-1]
  [NOT [ON] EXCEPTION imperative-statement-2]
  [END-XML]

Syntax Rules

  1. identifier-1 is an alphanumeric data item that contains the XML document character stream.
  2. The PROCESSING PROCEDURE phrase specifies the name of a procedure to handle the various events that the XML parser generates.
  3. procedure-name-1 and procedure-name-2 each name a section or paragraph in the Procedure Division. Neither procedure-name-1 nor procedure-name-2 may name a procedure in the declaratives section.
  4. procedure-name-1 specifies the first (or only) section or paragraph in the processing procedure.
  5. procedure-name-2 specifies the last section or paragraph in the processing procedure.
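
The following minimal sketch shows how the phrases in the general format fit together. The program name, data names, paragraph names, and the XML document itself are hypothetical and chosen only for illustration; the processing procedure simply displays the name of each event it receives.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLDEMO1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * identifier-1: an alphanumeric item holding the XML document.
      * The item is sized to the document (42 characters).
       01  XML-DOC          PIC X(42) VALUE
           "<order id='42'><item>Widget</item></order>".
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE IS HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY "Parse failed, XML-CODE = " XML-CODE
               NOT ON EXCEPTION
                   DISPLAY "Parse completed"
           END-XML.
           STOP RUN.

      * procedure-name-1: receives control once per XML event.
      * Control returns to the parser after the last statement
      * of this paragraph.
       HANDLE-EVENT.
           DISPLAY XML-EVENT.
```

Because procedure-name-2 is omitted here, the processing procedure consists of the single paragraph HANDLE-EVENT.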

General Rules

  1. For each XML event, the parser transfers control to the first statement of the procedure named procedure-name-1. Control is always returned from the processing procedure to the XML parser. The point from which control is returned is determined as follows:
    • If procedure-name-1 is a paragraph name and procedure-name-2 is not specified, the return is made after the last statement of the procedure-name-1 paragraph is executed.
    • If procedure-name-1 is a section name and procedure-name-2 is not specified, the return is made after the last statement of the last paragraph in the procedure-name-1 section is executed.
    • If procedure-name-2 is specified and it is a paragraph name, the return is made after the last statement of the procedure-name-2 paragraph is executed.
    • If procedure-name-2 is specified and it is a section name, the return is made after the last statement of the last paragraph in the procedure-name-2 section is executed.

    This behavior is the same as if the COBOL program executed a PERFORM statement on the same paragraphs or sections.

  2. procedure-name-1 and procedure-name-2 must define a consecutive sequence of operations to execute, beginning at the procedure named by procedure-name-1 and ending with the execution of the procedure named by procedure-name-2.

    If there are two or more logical paths to the return point, then procedure-name-2 can name a paragraph that consists of only an EXIT statement; all the paths to the return point must then lead to this paragraph.

  3. The processing procedure consists of all the statements at which XML events are handled. The range of the processing procedure includes all statements executed by CALL, EXIT, GO TO, GOBACK, MERGE, PERFORM, and SORT statements that are in the range of the processing procedure, as well as all statements in declarative procedures that are executed as a result of the execution of statements in the range of the processing procedure.

    The range of the processing procedure must not cause any GOBACK or EXIT PROGRAM statement to be executed, except to return control from a program to which control was passed by a CALL statement that is executed in the range of the processing procedure.

    The range of the processing procedure must not cause an XML PARSE statement to be executed, unless the XML PARSE statement is executed in an outermost program to which control was passed by a CALL statement that is executed in the range of the processing procedure.

    A program executing on multiple threads can execute the same XML statement or different XML statements simultaneously. However, the compiler generates a LOCK THREAD statement immediately before the XML PARSE statement and an UNLOCK THREAD statement immediately after it, so effectively only a single thread is executing during the entire execution of the XML PARSE.

    The processing procedure can terminate the run unit with a STOP RUN statement.

    For more details about the processing procedure, see Control Flow.

  4. ON EXCEPTION phrase. The ON EXCEPTION phrase specifies imperative statements to be executed when an exception condition is raised by XML PARSE.

    An exception condition exists when the XML parser detects an error while processing an XML document. The parser first signals the exception by passing control to the processing procedure with special register XML-EVENT containing 'EXCEPTION'. The parser also provides a numeric error code in special register XML-CODE. Error codes are listed under XML-CODE in "Special Registers" below.

    An exception condition also exists when the processing procedure sets XML-CODE to -1 before returning to the parser for a normal XML event. You do this to terminate parsing deliberately. In this case, the parser does not signal an XML exception event. If the ON EXCEPTION phrase is specified, control is transferred to imperative-statement-1. If the ON EXCEPTION phrase is not specified, any NOT ON EXCEPTION phrase is ignored, and control is transferred to the end of the XML PARSE statement. After execution of the XML PARSE statement, special register XML-CODE contains either the numeric error code for the XML exception or -1.

    If the processing procedure handles the XML exception event and sets XML-CODE to zero before returning control to the parser, the exception condition no longer exists. If no other unhandled exceptions occur prior to the termination of the parser, control is transferred to imperative-statement-2 of the NOT ON EXCEPTION phrase, if specified.

  5. NOT ON EXCEPTION phrase. The NOT ON EXCEPTION phrase specifies imperative statements to be executed when no exception conditions exist at the conclusion of XML PARSE processing.

    When no exception conditions exist, control is transferred to imperative-statement-2, if specified, or to the end of the XML PARSE statement. If an ON EXCEPTION phrase is specified, it is ignored. Special register XML-CODE contains a zero after the XML PARSE statement has finished executing.

  6. END-XML phrase. The END-XML phrase is an explicit scope terminator that delimits the scope of both XML GENERATE and XML PARSE statements. With END-XML, conditional XML GENERATE or XML PARSE statements can be nested in other conditional statements. Conditional XML GENERATE or XML PARSE statements specify the ON EXCEPTION or NOT ON EXCEPTION phrase.

    The scope of a conditional XML PARSE statement is terminated by:

    • An END-XML phrase at the same level of nesting
    • A separator period

  7. Parsing XML documents one segment at a time. You can parse XML documents by passing one segment (or record) of XML text at a time. Two major applications are processing very large documents and processing XML documents that reside in a file.

    To use this feature, compile your program with the XMLPARSE(XMLSS) compiler option in effect.

    To parse an XML document a segment at a time, initialize the parse data item to the first segment of the XML document, and then execute the XML PARSE statement. The parser processes the XML text and returns XML events to your processing procedure as usual.

    At the end of the text segment, the parser signals an END-OF-INPUT XML event, with XML-CODE set to zero. If there is another segment of the document to process, in your processing procedure move the next segment of XML data to the parse data item, set XML-CODE to one, and return to the parser. To signal the end of XML segments to the parser, return to the parser with XML-CODE still set to zero.

    The length of the parse data item is evaluated for each segment, and determines the segment length.

  8. Variable-length segments. If the XML document segments are variable length, specify a variable-length item for the parse data item. For example, for variable-length XML segments, you can define the parse data item as one of the following items:

    • A variable-length group item that contains an OCCURS DEPENDING ON clause.
    • A reference-modified item.
    • An FD record that specifies the RECORD IS VARYING DEPENDING ON clause, where the depending-on data item is used as the length in a reference modifier or ODO object for the FD record.
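
The segment-at-a-time technique described in rules 7 and 8 can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: it assumes that segment parsing is enabled as described in rule 7, that the hypothetical file orders.xml holds the document as line sequential records of 1 to 255 characters, and that no record is empty. All data names and paragraph names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLSEG.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * "orders.xml" is a hypothetical file name.
           SELECT XML-FILE ASSIGN TO "orders.xml"
               ORGANIZATION IS LINE SEQUENTIAL.
       DATA DIVISION.
       FILE SECTION.
       FD  XML-FILE
           RECORD IS VARYING IN SIZE FROM 1 TO 255 CHARACTERS
           DEPENDING ON SEG-LEN.
       01  XML-REC          PIC X(255).
       WORKING-STORAGE SECTION.
       01  SEG-LEN          PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           OPEN INPUT XML-FILE.
      * The first segment must be in place before XML PARSE runs.
           READ XML-FILE
               AT END
                   DISPLAY "No XML data"
                   CLOSE XML-FILE
                   STOP RUN
           END-READ.
      * The reference modifier makes the parse data item variable
      * length: SEG-LEN gives the length of each segment.
           XML PARSE XML-REC(1:SEG-LEN)
               PROCESSING PROCEDURE IS HANDLE-EVENT
               ON EXCEPTION
                   DISPLAY "XML error " XML-CODE
               NOT ON EXCEPTION
                   DISPLAY "All segments parsed"
           END-XML.
           CLOSE XML-FILE.
           STOP RUN.

       HANDLE-EVENT.
      * At END-OF-INPUT, a successful READ supplies the next
      * segment and XML-CODE = 1 tells the parser to continue.
      * At end of file XML-CODE stays zero: no more segments.
           IF XML-EVENT = "END-OF-INPUT"
               READ XML-FILE
                   AT END
                       CONTINUE
                   NOT AT END
                       MOVE 1 TO XML-CODE
               END-READ
           ELSE
               DISPLAY XML-EVENT
           END-IF.
```

Reading the next record inside the processing procedure keeps the file handling in one place: the main paragraph only primes the first segment, and every later segment is supplied in response to an END-OF-INPUT event.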

Nested XML PARSE Statements

When a given XML PARSE statement appears as imperative-statement-1 or imperative-statement-2, or as part of imperative-statement-1 or imperative-statement-2 of another XML PARSE statement, that given XML PARSE statement is a nested XML PARSE statement.

Nested XML PARSE statements are considered to be matched XML PARSE and END-XML combinations proceeding from left to right. For this reason, when END-XML phrases are encountered, they are matched with the nearest preceding XML PARSE statements that have not already been terminated.
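
As an illustration of this matching, the sketch below nests one XML PARSE statement inside the NOT ON EXCEPTION phrase of another. The program name, data names, and documents are hypothetical. The inner statement runs only after the outer parse has finished, so it is not executed within the range of the outer processing procedure.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLNEST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  HEADER-DOC       PIC X(19) VALUE "<header>v1</header>".
       01  BODY-DOC         PIC X(17) VALUE "<body>data</body>".
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE HEADER-DOC
               PROCESSING PROCEDURE IS SHOW-EVENT
               ON EXCEPTION
                   DISPLAY "Header error " XML-CODE
               NOT ON EXCEPTION
      * Nested statement: part of imperative-statement-2.
                   XML PARSE BODY-DOC
                       PROCESSING PROCEDURE IS SHOW-EVENT
                       ON EXCEPTION
                           DISPLAY "Body error " XML-CODE
      * The first END-XML ends the inner (nearest unterminated)
      * XML PARSE; the second ends the outer one.
                   END-XML
           END-XML.
           STOP RUN.

       SHOW-EVENT.
           DISPLAY XML-EVENT.
```

Without the inner END-XML, the single remaining END-XML would be matched with the inner statement, and the outer statement would be terminated only by the separator period.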

Control Flow

When the XML parser receives control from an XML PARSE statement, it analyzes the XML document and transfers control to procedure-name-1 at the following points:

  • At the start of the parsing process
  • When a document fragment is found
  • When the parser detects an error in parsing the XML document
  • At the end of processing the XML document

Control returns to the XML parser when the end of the processing procedure is reached.

The exchange of control between the parser and the processing procedure continues until one of the following occurs:

  • The entire XML document has been parsed, ending with the END-OF-DOCUMENT event.
  • The parser detects an exception and the processing procedure does not reset special register XML-CODE to zero prior to returning to the parser.
  • The processing procedure terminates parsing deliberately by setting XML-CODE to -1 prior to returning to the parser.

Then, the parser terminates and returns control to the XML PARSE statement with the XML-CODE special register containing the most recent value set by the parser or the processing procedure.

The XML-CODE, XML-EVENT, and XML-TEXT special registers contain information about each XML event passed to the processing procedure. The content of XML-CODE is defined during and after execution of an XML PARSE statement. The contents of all other XML special registers are undefined outside the range of the processing procedure.

For normal XML events, XML-CODE contains zero when the processing procedure receives control. For exception events, XML-CODE contains one of the exception codes listed under XML-CODE in "Special Registers" below. XML-EVENT is set to the event name, such as START-OF-DOCUMENT. XML-TEXT contains the piece of the document corresponding to the event, as described for each event in Table 1 at the end of this topic. For more information about the XML special registers, see "Special Registers" below.

For all kinds of XML events, if XML-CODE is not zero when the processing procedure returns control to the parser, the parser terminates without a further EXCEPTION event. Setting XML-CODE to -1 before returning to the parser for an event other than EXCEPTION forces the parser to terminate with a user-initiated exception condition. For some EXCEPTION events, the processing procedure can handle the event, then set XML-CODE to zero to force the parser to continue, although subsequent results are unpredictable. When XML-CODE is zero, parsing continues until the entire XML document has been parsed or an exception condition occurs.
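
The control flow described above can be sketched with a processing procedure that spans two paragraphs and ends parsing deliberately. All names and the document are hypothetical. The procedure remembers the most recent element name, stores the content of the item element, and then sets XML-CODE to -1 so that the rest of the document is not parsed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLFLOW.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  XML-DOC          PIC X(45) VALUE
           "<order><id>42</id><item>Widget</item></order>".
       01  CURRENT-ELEMENT  PIC X(30) VALUE SPACES.
       01  ITEM-NAME        PIC X(30) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE IS HANDLE-EVENT THRU HANDLE-EXIT
               ON EXCEPTION
                   IF XML-CODE = -1
                       DISPLAY "Stopped early, item = " ITEM-NAME
                   ELSE
                       DISPLAY "Parser error " XML-CODE
                   END-IF
               NOT ON EXCEPTION
                   DISPLAY "Reached END-OF-DOCUMENT"
           END-XML.
           STOP RUN.

       HANDLE-EVENT.
      * First path to the return point: remember the element name.
           IF XML-EVENT = "START-OF-ELEMENT"
               MOVE XML-TEXT TO CURRENT-ELEMENT
               GO TO HANDLE-EXIT
           END-IF.
      * Second path: store the wanted content, then set XML-CODE
      * to -1 to end parsing without an EXCEPTION event.
           IF XML-EVENT = "CONTENT-CHARACTERS"
              AND CURRENT-ELEMENT = "item"
               MOVE XML-TEXT TO ITEM-NAME
               MOVE -1 TO XML-CODE
           END-IF.
      * Common return point: control goes back to the parser here.
       HANDLE-EXIT.
           EXIT.
```

Because a deliberate termination is an exception condition, the ON EXCEPTION phrase receives control in this sketch, and testing XML-CODE for -1 distinguishes the deliberate stop from a parser error.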

Special Registers

XML-CODE

When used in the XML PARSE statement, the XML-CODE special register is used to communicate status between the XML parser and the processing procedure.

For each event, the XML parser sets XML-CODE before transferring control to the processing procedure. It also does this at parser termination. You can reset XML-CODE before returning control to the parser.

The XML-CODE special register has the implicit definition:

01  XML-CODE PICTURE S9(9) USAGE BINARY VALUE 0.

When the XML parser encounters an XML event, it sets XML-CODE and then passes control to the processing procedure. For all events except EXCEPTION, XML-CODE contains zero when the processing procedure receives control.

For an EXCEPTION event, the parser sets XML-CODE to an exception code that indicates the nature of the exception. Exception codes are listed below. Note that these differ from IBM COBOL's exception codes.

XML PARSE Exception Code  Description
------------------------  -----------
101                       Out of memory
102                       Syntax error in XML
103                       No elements
104                       Invalid token
105                       Unclosed token
106                       Partial character
107                       Tag mismatch
108                       Duplicate attribute
109                       Junk after the document element
110                       Error in the parameter entity reference
111                       Undefined entity
112                       Recursive entity reference
113                       Asynchronous entity
114                       Bad character reference
115                       Binary entity reference
116                       Attribute external entity reference
117                       Misplaced XML processing instruction
118                       Unknown encoding
119                       Incorrect encoding
120                       Unclosed CDATA section
121                       External entity handling required
122                       Not standalone
123                       Unexpected error
124                       Entity declared in wrong place

If you want the parser to terminate after a normal event without signaling an EXCEPTION event, set XML-CODE to -1 before returning control to the parser. If you set XML-CODE to any other nonzero value, results are undefined. IBM customers should note that ACUCOBOL-GT ignores an XML-CODE of zero that the processing procedure sets in response to an EXCEPTION event. Unlike the IBM COBOL parser, the ACUCOBOL-GT XML parser has no exceptions that allow parsing to continue: it cannot continue once it has detected an error.

In ACUCOBOL-GT, once an exception has been signaled, no further events are returned from the parser. Control is passed to the statement that you specify in the ON EXCEPTION phrase, or to the end of the XML PARSE statement if you did not code an ON EXCEPTION phrase.

When the parser returns control to the XML PARSE statement, XML-CODE contains the most recent value set either by the parser or by the processing procedure.
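
As a sketch of how the final value of XML-CODE might be examined, the hypothetical program below parses a deliberately malformed document without coding an ON EXCEPTION phrase and then tests XML-CODE after the statement. The program name, data names, and document are assumptions made for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLCODE1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Deliberately malformed: <b> is never closed.
       01  BAD-DOC          PIC X(10) VALUE "<a><b></a>".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * No ON EXCEPTION phrase is coded, so after an exception
      * control passes to the end of the XML PARSE statement.
           XML PARSE BAD-DOC
               PROCESSING PROCEDURE IS HANDLE-EVENT.
      * XML-CODE holds the most recent value set by the parser
      * or by the processing procedure.
           EVALUATE XML-CODE
               WHEN 0
                   DISPLAY "Document parsed without exception"
               WHEN -1
                   DISPLAY "Parsing stopped by the procedure"
               WHEN OTHER
                   DISPLAY "Parser exception code " XML-CODE
           END-EVALUATE.
           STOP RUN.

       HANDLE-EVENT.
           IF XML-EVENT = "EXCEPTION"
               DISPLAY "EXCEPTION event, XML-CODE = " XML-CODE
           END-IF.
```

The code reported in the WHEN OTHER branch can be looked up in the exception code table above.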

XML-EVENT

The XML parser uses the XML-EVENT special register to communicate event information to the processing procedure. The information that is communicated is identified in the XML PARSE statement. Before passing control to the processing procedure, the XML parser sets XML-EVENT to the name of the XML event, as described in Table 1 at the end of this topic.

XML-EVENT has the implicit definition:

01  XML-EVENT USAGE DISPLAY PICTURE X(30) VALUE SPACE.

XML-EVENT cannot be used as a receiving data item.

XML-TEXT

The XML-TEXT special register is defined during XML parsing to contain document fragments that are of class alphanumeric. XML-TEXT is an elementary alphanumeric data item of the length of the contained XML document fragment. The length of XML-TEXT can vary from 0 through 16,777,215 bytes. There is no equivalent COBOL data description entry.

The parser sets XML-TEXT to the document fragment associated with an event before transferring control to the processing procedure when the operand of the XML PARSE statement is an alphanumeric data item.

Use the LENGTH function for XML-TEXT to determine the number of bytes that XML-TEXT contains.

XML-TEXT cannot be used as a receiving item.
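
The following sketch shows one way to check the length of XML-TEXT before copying it to a fixed-length item. It assumes that the LENGTH function is written as FUNCTION LENGTH(XML-TEXT); the program name, data names, and document are hypothetical. XML-TEXT appears only as a sending item.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XMLTEXT1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  XML-DOC          PIC X(25) VALUE
           "<name>Ada Lovelace</name>".
       01  TEXT-LEN         PIC 9(8)  VALUE 0.
       01  NAME-BUFFER      PIC X(20) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOC
               PROCESSING PROCEDURE IS HANDLE-EVENT.
           DISPLAY "Name: " NAME-BUFFER.
           STOP RUN.

       HANDLE-EVENT.
           IF XML-EVENT = "CONTENT-CHARACTERS"
      * Assumed syntax for the LENGTH function on XML-TEXT.
               COMPUTE TEXT-LEN = FUNCTION LENGTH(XML-TEXT)
      * Warn when the fragment will not fit in the 20-byte buffer.
               IF TEXT-LEN > 20
                   DISPLAY "Fragment truncated, length " TEXT-LEN
               END-IF
      * MOVE pads with spaces or truncates on the right.
               MOVE XML-TEXT TO NAME-BUFFER
           END-IF.
```

Checking the length first matters because XML-TEXT varies in size from event to event, while the receiving item has a fixed size.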

Table 1. XML events and the corresponding content of XML-TEXT

XML event (content of XML-EVENT)  Content of XML-TEXT
--------------------------------  -------------------
ATTRIBUTE-CHARACTERS              The value within quotes or apostrophes. If the value includes an entity reference, this can be a substring of the attribute value.
ATTRIBUTE-NAME                    The attribute name; the string to the left of "=".
COMMENT                           The text of the comment between the opening character sequence "<!--" and the closing character sequence "-->".
CONTENT-CHARACTER                 The single character corresponding with the predefined entity reference in the element content.
CONTENT-CHARACTERS                The element content between start and end tags. This can be a substring of the element content if the content contains an entity reference or another element.
DOCUMENT-TYPE-DECLARATION         The entire document type declaration including the opening and closing character sequences, "<!DOCTYPE" and ">".
ENCODING-DECLARATION              The value, between quotes or apostrophes, of the encoding declaration in the XML declaration.
END-OF-CDATA-SECTION              Always contains the string "]]>".
END-OF-DOCUMENT                   Null, zero-length.
END-OF-ELEMENT                    The name of the end element tag or empty element tag.
EXCEPTION                         The part of the document successfully scanned, up to and including the point at which the exception was detected. Special register XML-CODE contains the unique error code identifying the exception.
PROCESSING-INSTRUCTION-DATA       The rest of the processing instruction, not including the closing sequence, "?>", but including trailing, not leading, white space characters.
PROCESSING-INSTRUCTION-TARGET     The processing instruction target name that occurs immediately after the processing instruction opening sequence, "<?".
STANDALONE-DECLARATION            The value, between quotes or apostrophes, of the stand-alone declaration in the XML declaration.
START-OF-CDATA-SECTION            Always contains the string "<![CDATA[".
START-OF-DOCUMENT                 The entire document.
START-OF-ELEMENT                  The name of the start element tag or empty element tag, also known as the element type.
VERSION-INFORMATION               The value, between quotes or apostrophes, of the version declaration in the XML declaration.