

SAX event structure descriptions for PLISAXC

The event structure has 19 ENTRY variables. These variables point to functions invoked by the parser for various events.

Descriptions of each event in this topic refer to this XML document example.
xmlDocument =
   '<?xml version="1.0" standalone="yes"?>'
|| '<!--This document is just an example-->'
|| '<sandwich>'
|| '<bread type="baker&quot;s best"/>'
|| '<?spread please use real mayonnaise ?>'
|| '<meat>Ham &amp; turkey</meat>'
|| '<filling>Cheese, lettuce, tomato, etc.</filling>'
|| '<![CDATA[We should add a <relish> element in future!]]>'
|| '</sandwich>'
|| 'junk';

In the descriptions that follow, the term XML text refers to the string defined by the pointer and length passed to the event. The parser may recognize the following events, listed here in their order of appearance in the structure.
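As a language-neutral illustration of the term XML text, the following Python sketch models the address as a byte offset into the document buffer. The helper name xml_text and the offset model are assumptions made for illustration only; a PL/I event function receives a pointer and a length instead.

```python
# Model only: a PL/I event function receives a pointer and a length.
# Here the pointer is approximated by a byte offset into the buffer.
document = b'<sandwich><meat>Ham &amp; turkey</meat></sandwich>'

def xml_text(buffer: bytes, offset: int, length: int) -> str:
    """Return the string designated by an (offset, length) pair."""
    return buffer[offset:offset + length].decode("ascii")

# Offset and length that a start_of_element event for <meat> would designate:
# the element name only, without the surrounding angle brackets.
name_offset = document.index(b"<meat>") + 1   # skip the opening "<"
name_length = len(b"meat")

element_name = xml_text(document, name_offset, name_length)
print(element_name)
```

The same helper applies to every event that passes an address and a length; only the span it designates differs from event to event.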

start_of_document
Occurs once, at the beginning of parsing the document. The parser passes no parameters to this event (except the user token).
version_information
Occurs within the optional XML declaration for the version information. The parser passes the address and length of the text containing the version value ("1.0" in the example above).
encoding_declaration
Occurs within the XML declaration for the optional encoding declaration. The parser passes the address and length of the text containing the encoding value.
standalone_declaration
Occurs within the XML declaration for the optional standalone declaration. The parser passes the address and length of the text containing the standalone value ("yes" in the example).
document_type_declaration
Occurs when the parser finds the document type declaration. A document type declaration begins with the character sequence <!DOCTYPE and ends with a > character. Fairly complicated grammar rules describe the content in between.

The parser passes the address and length of the text containing the entire declaration. This includes the opening and closing character sequences. This is the only event where XML text includes the delimiters. The example above does not have a document type declaration.

end_of_document
Occurs once, when document parsing has completed. The parser passes no parameters to this event (except the user token).
start_of_element
Occurs once for each element start tag or empty element tag. The parser passes the address and length of the text containing the element name as well as any applicable namespace information. The first start_of_element event during parsing in the example contains the string sandwich.
attribute_name
Occurs for each attribute in an element start tag or empty element tag, after recognizing a valid name. The parser passes the address and length of the text containing the attribute name as well as any applicable namespace information. The only attribute name in the example is type.
attribute_characters

Occurs for each fragment of an attribute value. The parser passes the address and length of the text containing the fragment. An attribute value normally consists of a single string only, even if it is split across lines:

<element attribute="This attribute value is
split across two lines"/>
end_of_element
Occurs once for each element end tag or empty element tag, when the parser recognizes the closing angle bracket of the tag. The parser passes the address and length of the text containing the element name as well as any applicable namespace information.
start_of_CDATA_section
Occurs at the start of a CDATA section. CDATA sections begin with the string <![CDATA[ and end with the string ]]>. They are used to escape blocks of text containing characters that would otherwise be recognized as XML markup. The parser passes no parameters to this event (except the user token).

After this event, the parser passes the content of the CDATA section between these delimiters as one or more content_characters events. In the preceding example, the content_characters event is passed the text We should add a <relish> element in future!

end_of_CDATA_section
This event occurs when the parser recognizes the end of a CDATA section. The parser passes no parameters to this event (except the user token).
content_characters

This event represents the main body of an XML document. This is the character data between element start and end tags. The parser passes the address and length of the text containing this data, which usually consists of a single string only, even if it is split across lines:

<element1>This character content is
split across two lines</element1>

The parser also passes a flag byte that indicates whether the next event provides additional characters that form part of the same content. This can be the case when there is a large amount of data between the start and end tags.

The parser also uses the content_characters event to pass the text of CDATA sections to the application.
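The continuation flag described above suggests that an application should accumulate fragments until the flag reports that no more follow. The Python sketch below models that pattern only. The class name ContentCollector, the handler signature, and the Boolean more_follows (standing in for the flag byte, whose actual encoding is not described here) are all assumptions for illustration.

```python
class ContentCollector:
    """Joins content fragments that arrive across several events."""

    def __init__(self):
        self.pending = []      # fragments received so far for the current run
        self.completed = []    # fully assembled content strings

    def content_characters(self, text: str, more_follows: bool) -> None:
        # Hypothetical handler: more_follows models the parser's flag byte.
        self.pending.append(text)
        if not more_follows:
            # Last fragment of this run: assemble and reset.
            self.completed.append("".join(self.pending))
            self.pending = []

collector = ContentCollector()
# Simulated events: one piece of content delivered in two fragments,
# then the text of a CDATA section delivered in a single event.
collector.content_characters("Cheese, lettuce, ", True)
collector.content_characters("tomato, etc.", False)
collector.content_characters("We should add a <relish> element in future!", False)
print(collector.completed)
```

Buffering in the handler keeps the rest of the application independent of how the parser happens to split the data.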

processing_instruction
Processing instructions (PIs) allow XML documents to contain special instructions for applications. This event occurs when the parser recognizes the name following the PI opening character sequence, <?. The event also covers the data following the PI target, up to but not including the PI closing character sequence, ?>. Trailing, but not leading, white space characters in the data are included. The parser passes the address and length of the text containing the target (spread in the example) and the address and length of the text containing the data (please use real mayonnaise in the example).
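The white space rule for the PI data can be checked with a small model. The Python sketch below is not the parser's implementation; the function name split_processing_instruction is hypothetical, and the code only reproduces the rule stated above: leading white space in the data is dropped and trailing white space is kept.

```python
def split_processing_instruction(pi: str):
    """Split '<?target data?>' into (target, data) following the stated rule."""
    if not (pi.startswith("<?") and pi.endswith("?>")):
        raise ValueError("not a processing instruction")
    body = pi[2:-2]               # drop the <? and ?> delimiters
    parts = body.split(None, 1)   # the target ends at the first white space
    target = parts[0]
    # str.split with maxsplit removes the white space before the remainder
    # but leaves any white space at its end untouched.
    data = parts[1] if len(parts) > 1 else ""
    return target, data

target, data = split_processing_instruction("<?spread please use real mayonnaise ?>")
print(repr(target), repr(data))
```

For the example document, the data passed to the event therefore ends with the blank that precedes the closing ?> sequence.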
comment
Occurs for any comments in the XML document. The parser passes the address and length of the text between the opening and closing comment delimiters, <!-- and -->, respectively. The only comment text in the example is This document is just an example.
namespace_declare
Occurs for any namespace declarations in the XML document. The parser passes the address and length of the namespace prefix (if any) as well as the address and length of the namespace URI. If there is no namespace prefix, the passed length is zero and the value of the address should not be used. There is no corresponding event in the PLISAXA and PLISAXB built-ins.
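The zero-length rule for the prefix means a handler has to test the length before touching the address. The Python sketch below models this with byte offsets in place of addresses. The handler signature, the namespaces table, and the sample element (which is not part of the sandwich example) are assumptions for illustration.

```python
namespaces = {}   # prefix -> URI; "" is used here for the default namespace

def namespace_declare(buffer, prefix_offset, prefix_length, uri_offset, uri_length):
    # Hypothetical handler: offsets stand in for the addresses the parser passes.
    if prefix_length == 0:
        prefix = ""   # no prefix: prefix_offset is deliberately never read
    else:
        prefix = buffer[prefix_offset:prefix_offset + prefix_length].decode("ascii")
    uri = buffer[uri_offset:uri_offset + uri_length].decode("ascii")
    namespaces[prefix] = uri

# Hypothetical input with one prefixed and one default namespace declaration.
buf = b'<s:sandwich xmlns:s="urn:example:food" xmlns="urn:example:default"/>'

food = b"urn:example:food"
default = b"urn:example:default"

# Prefixed declaration: the prefix "s" has length 1.
namespace_declare(buf, buf.index(b"xmlns:s") + len(b"xmlns:"), 1,
                  buf.index(food), len(food))
# Default declaration: length 0, so the offset value (-1 here) is meaningless.
namespace_declare(buf, -1, 0, buf.index(default), len(default))

print(namespaces)
```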
end_of_input
This event occurs whenever the parser reaches the end of the current input buffer. The parser passes (along with the BYVALUE user token) two BYADDR parameters: the address and length of the next buffer for it to process. This event and the content character events are the only events that have any BYADDR parameters, but this is the only event that has parameters that the called event should change. There is no corresponding event in the PLISAXA and PLISAXB built-ins. It is this event that allows PLISAXC to parse an XML document of arbitrary size.
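The buffer hand-off can be pictured with a toy driver loop. The Python sketch below is a model, not PLISAXC itself: BufferRef stands in for the two BYADDR parameters, drive stands in for the parser, and treating a zero length as "no more input" is an assumption of this model rather than documented behavior.

```python
from dataclasses import dataclass

@dataclass
class BufferRef:
    """Stands in for the two BYADDR parameters: next buffer address and length."""
    data: bytes = b""
    length: int = 0

# The document arrives in pieces, for example as successive file reads.
chunks = [b"<sandwich><meat>Ham</meat>", b"</sandwich>"]
remaining = iter(chunks[1:])

def end_of_input(next_buffer: BufferRef) -> None:
    # Hypothetical handler: update the by-reference parameters in place.
    chunk = next(remaining, b"")
    next_buffer.data = chunk
    next_buffer.length = len(chunk)   # zero means "no more input" in this model

def drive(first: bytes) -> bytes:
    """Toy stand-in for the parser: consume buffers until none is supplied."""
    seen = bytearray()
    current = BufferRef(first, len(first))
    while current.length > 0:
        seen += current.data[:current.length]
        end_of_input(current)         # buffer exhausted: ask for the next one
    return bytes(seen)

consumed = drive(chunks[0])
print(consumed)
```

Because the handler only ever holds one chunk at a time, the size of the whole document never has to fit in a single buffer.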
unresolved_reference
This event occurs for any unresolved references in the XML document. The parser passes the address and length of the unresolved reference.
unknown_attribute_reference
Occurs within attribute values for entity references other than the pre-defined entity references, listed for the event attribute_predefined_reference. The parser passes the address and length of the text containing the entity name.
exception
The parser generates this event when it detects an error processing the XML document.