In the previous cases, you saw how structured COBOL data could be coded as an XML document. In this section, you will examine how an arbitrary XML document can be represented as a COBOL structure. This requires that you look at some other aspects of the XML information model that are not needed to represent COBOL structures, but might be present in XML, nonetheless.
So far, you have seen that XML has elements and text. Although these are the primary means of representing data in XML documents, there are other ways of representing and structuring data in XML. Suppose you have the following XML document:
<contact type="student">
  <firstname>Betty</firstname>
  <lastname>Smith</lastname>
  <address form="US">
    <streetaddresses>
      <streetaddress>Knox College</streetaddress>
      <streetaddress>Campus Box 9999</streetaddress>
      <streetaddress>2 E. South St.</streetaddress>
    </streetaddresses>
    <city>Galesburg</city>
    <state>IL</state>
    <postalcode zipplus4="N">61401</postalcode>
  </address>
  <email>email@example.com</email>
</contact>
The example document contains a new kind of data, known in XML as an "attribute". Notice that the <contact> start tag has what appears to be a parameter named "type". This is, in fact, an attribute whose value is the text string "student". The <address> and <postalcode> elements carry attributes as well ("form" and "zipplus4"). In XML, attributes are another way of attaching data to an element, but in a way that does not affect the text content of the element itself. In other words, attributes are "out-of-band" data associated with an element. This concept has no parallel in standard COBOL, where all data associated with a data item is part of the COBOL record content. This means that if you are to capture all of the content of an XML document, you must have a way to capture and store attributes.
You do this with the help of an important XML tool called an external XSLT stylesheet. For now, assume that an XSLT stylesheet can transform an XML document into any desired alternative XML document. If this is true (and it is), you can recode each incoming attribute as something that has a direct COBOL counterpart: a data item, which is represented in XML as an element containing text.
The example document, after external XSLT stylesheet transformation, might look like this:
<contact>
  <email>email@example.com</email>
  <attr-type>student</attr-type>
  <firstname>Betty</firstname>
  <lastname>Smith</lastname>
  <address>
    <attr-form>US</attr-form>
    <city>Galesburg</city>
    <state>IL</state>
    <postalcodegroup>
      <attr-zipplus4>N</attr-zipplus4>
      <postalcode>61401</postalcode>
    </postalcodegroup>
    <streetaddresslines>3</streetaddresslines>
    <streetaddresses>
      <streetaddress>Knox College</streetaddress>
      <streetaddress>Campus Box 9999</streetaddress>
      <streetaddress>2 E. South St.</streetaddress>
    </streetaddresses>
  </address>
</contact>
Several things have changed. Each attribute has been turned into an element whose name is prefixed with "attr-". A new element, <streetaddresslines>, has been added; it contains a count of the <streetaddress> elements. In the case of <postalcode>, a new element, <postalcodegroup>, has been added to wrap both the real <postalcode> value and the element that replaces its attribute. Some elements have also been reordered: <email> now comes first and <streetaddresses> comes last, so the variable-length table is the last item in the COBOL record. All of these changes are very easy to make using a simple XSLT stylesheet, and you now have a document with a direct equivalent in COBOL:
01 contact.
   10 email                    pic x(20).
   10 attr-type                pic x(7).
   10 firstname                pic x(10).
   10 lastname                 pic x(10).
   10 address.
      20 attr-form             pic xx.
      20 city                  pic x(20).
      20 state                 pic x(2).
      20 postalcodegroup.
         30 attr-zipplus4      pic x.
         30 postalcode         pic 9(5).
      20 streetaddresslines    pic 9.
      20 streetaddresses.
         30 streetaddress      occurs 1 to 9 times
                               depending on streetaddresslines
                               pic x(20).
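To make the transformation concrete, here is a minimal sketch of an XSLT 1.0 stylesheet that would turn the first example document into the second. It is an illustration only, not the stylesheet shipped with any product. It assumes the input uses no XML namespaces and has exactly the element names shown above. The template layout is one of several reasonable designs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>
  <!-- Ignore whitespace-only text between elements in the input. -->
  <xsl:strip-space elements="*"/>

  <!-- Default rule: copy an element, emit its attributes as child
       elements first, then process its children in document order. -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Every attribute becomes an element named attr-<attribute name>. -->
  <xsl:template match="@*">
    <xsl:element name="attr-{local-name()}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <!-- contact: move <email> to the front so that <address>, which ends
       with the variable-length table, is the last child. -->
  <xsl:template match="contact">
    <contact>
      <xsl:apply-templates select="email"/>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*[not(self::email)]"/>
    </contact>
  </xsl:template>

  <!-- address: fixed-size children first, then the line count,
       then the repeating <streetaddress> elements. -->
  <xsl:template match="address">
    <address>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="*[not(self::streetaddresses)]"/>
      <streetaddresslines>
        <xsl:value-of select="count(streetaddresses/streetaddress)"/>
      </streetaddresslines>
      <xsl:apply-templates select="streetaddresses"/>
    </address>
  </xsl:template>

  <!-- postalcode: wrap the attribute element and the value in a group. -->
  <xsl:template match="postalcode">
    <postalcodegroup>
      <xsl:apply-templates select="@*"/>
      <postalcode>
        <xsl:value-of select="."/>
      </postalcode>
    </postalcodegroup>
  </xsl:template>

</xsl:stylesheet>
```

The named templates for contact, address, and postalcode take precedence over the generic match="*" rule because a plain element name has a higher default priority than the wildcard. To try the sketch outside of COBOL, any XSLT 1.0 processor should do; for example, with the hypothetical file names contact.xsl and contact.xml, `xsltproc contact.xsl contact.xml` prints the transformed document.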