The following chart provides a summary of the ingestion process.
Documents are submitted to Connector Framework Server (CFS) through the ingest action. If the document has metadata only, CFS runs any processing tasks that have been configured, and the document is then ready for indexing. If the document has an associated file, the ingestion process depends on the file format.
XML files. Many systems export information in XML format and CFS has features to help you convert XML into IDOL documents.
CFS can run a transformation on an ingested XML file. This step is optional, but it is useful when your XML files do not resemble IDOL documents, or when you process XML from many sources and the files have different schemas. You can configure any number of transformations, and CFS runs the first transformation where the ingested XML matches the specified schema. You can also configure a default transformation that CFS runs when an XML file does not match any of your schemas. When a transformation is configured but is not successful, CFS adds the document to the import queue so that the XML is processed by KeyView.
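The selection rule described above (first matching schema wins, otherwise the default) can be sketched as follows. This is only an illustration of the rule, not CFS code or configuration: the function name `select_transformation`, the `(matches, name)` pairs, and the use of simple predicates in place of real schema validation are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation: each configured transformation is a pair of
# (predicate that stands in for "the XML matches this schema", transformation name).
Transformation = Tuple[Callable[[str], bool], str]


def select_transformation(
    xml: str,
    transformations: Sequence[Transformation],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the name of the transformation to run, or None if there is none."""
    # Transformations are checked in configured order; the first match wins.
    for matches, name in transformations:
        if matches(xml):
            return name
    # No schema matched: fall back to the default transformation, if one exists.
    return default


# Example with two hypothetical transformations and a default.
configured = [
    (lambda xml: "<invoice" in xml, "invoice_transform"),
    (lambda xml: "<order" in xml, "order_transform"),
]
chosen = select_transformation("<order id='1'/>", configured, default="fallback_transform")
```

In this example `chosen` is the second transformation, because the first predicate does not match. With no match and no default, the function returns `None`, which corresponds to the case where no transformation is run.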
After a successful XML transformation, or when no transformation is configured, CFS attempts to convert the XML into IDOL documents. The conversion is performed by mapping elements in the XML to IDOL documents and document fields. If the conversion is successful, the resulting documents are returned to the ingest queue as metadata-only documents. If the conversion does not result in any IDOL documents but the XML was transformed after matching a schema, CFS does not consider this a failure and does not index any documents. Otherwise, CFS adds the document to the import queue so that the XML is processed by KeyView.
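The outcomes described for XML files can be summarized as a small decision function. This is a minimal sketch of the documented behavior, not the CFS implementation: `Outcome`, `route_xml`, and its parameters are hypothetical names introduced for the example.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    INGEST_QUEUE = "returned to ingest queue as metadata-only documents"
    NOTHING_INDEXED = "no failure, no documents indexed"
    IMPORT_QUEUE = "added to import queue for processing by KeyView"


def route_xml(
    transform_succeeded: Optional[bool],
    matched_schema: bool,
    converted_docs: int,
) -> Outcome:
    """Decide what happens to an ingested XML file.

    transform_succeeded: None if no transformation ran (none configured),
        otherwise True or False.
    matched_schema: True if the transformation was chosen by a schema match
        (as opposed to the default transformation).
    converted_docs: number of IDOL documents produced by the conversion.
    """
    # A configured transformation that fails sends the file to KeyView.
    if transform_succeeded is False:
        return Outcome.IMPORT_QUEUE
    # Conversion produced documents: they go back to the ingest queue.
    if converted_docs > 0:
        return Outcome.INGEST_QUEUE
    # No documents, but the XML was transformed after matching a schema:
    # this is not treated as a failure and nothing is indexed.
    if transform_succeeded and matched_schema:
        return Outcome.NOTHING_INDEXED
    # Any other case with no documents falls back to KeyView.
    return Outcome.IMPORT_QUEUE
```

For instance, `route_xml(True, False, 0)` models a file that was transformed by the default transformation and yielded no documents; under the rules above, it is passed to the import queue.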
Parsing an XML file is usually preferable to processing it with KeyView because, although KeyView can extract the text, it does not preserve the structure information (the XML tags are discarded).