Connector Framework Server (CFS) processes the information that is retrieved by connectors, and then indexes the information into IDOL Server.
CFS reads the content and metadata from files and records that are retrieved by connectors, and writes this information to documents for indexing into IDOL Server. When connectors send documents to CFS, they contain only metadata extracted from the repository, such as the location of a file or record that the connector has retrieved. CFS extracts the file-specific metadata and file content from the file and adds it to the document. This allows IDOL Server to search and extract meaning from the information contained in the repository, without needing to process the information in its native format.
CFS also provides features to manipulate and enrich documents before they are indexed. This means that you can manipulate the data that is indexed into IDOL Server, and improve the quality of the information. CFS includes customizable import tasks that you can run, and supports the Lua scripting language so that you can write your own tasks and develop custom processing rules. For example, you can manipulate the fields and field values in each document.
A single CFS can process information from any number of connectors. For example, a CFS might process files retrieved by a File System Connector, web pages retrieved by an HTTP Connector, and e-mail messages retrieved by an Exchange Connector.
CFS uses KeyView to extract meaningful information from the files or records retrieved by connectors. KeyView can extract the file content, metadata, and subfiles from over 1,000 different file types.
File content is the main content of a file, for example the body of an e-mail message.
Metadata is information about a file itself, for example the sender of an e-mail message or the date and time when it was received.
Subfiles are files that are contained within the main file. For example, an e-mail message might contain embedded images or attachments that you want to index.
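The relationship between file content, metadata, and subfiles can be pictured as a small nested data structure. The following Python sketch is illustrative only: the ExtractedFile class and the walk function are hypothetical names chosen for this example, and they do not represent the KeyView API or the document format that CFS uses internally.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ExtractedFile:
    """Hypothetical model of what is extracted from one file."""
    name: str
    content: str = ""                                    # main content, e.g. an e-mail body
    metadata: Dict[str, str] = field(default_factory=dict)   # information about the file
    subfiles: List["ExtractedFile"] = field(default_factory=list)  # attachments, embedded files


def walk(item: ExtractedFile) -> Iterator[ExtractedFile]:
    """Yield a file and then, recursively, every subfile it contains."""
    yield item
    for sub in item.subfiles:
        yield from walk(sub)


# An e-mail message with one attachment.
message = ExtractedFile(
    name="message.msg",
    content="Please see the attached report.",
    metadata={"sender": "alice@example.com"},
    subfiles=[ExtractedFile(name="report.pdf", content="Quarterly figures")],
)

names = [f.name for f in walk(message)]
print(names)
```

Because subfiles can themselves contain subfiles (for example, an archive attached to an e-mail message), the sketch visits them recursively.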
CFS can run Field Standardization, which renames document fields so that they follow a standard naming scheme. You can use field standardization so that documents indexed into IDOL from different connectors use the same fields to store the same type of information. Your CFS installation includes a file named dictionary.xml, which lists the fields renamed during field standardization, and the standardized names.
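The idea behind field standardization can be sketched in a few lines of Python. This is a conceptual illustration, not the way CFS implements the feature: the field names and the RENAME_RULES mapping are invented for the example, and the actual rules are defined in dictionary.xml, whose format is not shown here.

```python
from typing import Dict, List

# A document is modeled as a mapping from field name to a list of values.
FieldMap = Dict[str, List[str]]

# Hypothetical rename rules: source field name -> standardized field name.
RENAME_RULES: Dict[str, str] = {
    "From": "SENDER",
    "Sender": "SENDER",
    "DateReceived": "RECEIVED_DATE",
}


def standardize_fields(fields: FieldMap, rules: Dict[str, str]) -> FieldMap:
    """Return a copy of the fields with names rewritten according to the rules.

    Fields without a rule keep their original name. If two source fields map
    to the same standardized name, their values are combined.
    """
    result: FieldMap = {}
    for name, values in fields.items():
        target = rules.get(name, name)
        result.setdefault(target, []).extend(values)
    return result


# Documents from two different connectors use different names for the sender.
first_doc = {"From": ["alice@example.com"], "Subject": ["Hello"]}
second_doc = {"Sender": ["bob@example.com"], "Subject": ["Hi"]}

print(standardize_fields(first_doc, RENAME_RULES))
print(standardize_fields(second_doc, RENAME_RULES))
```

After standardization, both documents store the sender in the same field, so a single query against that field matches documents from either source.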
CFS provides features to manipulate and enrich the documents that are indexed into IDOL Server. This means that you can add information to the documents, and improve the quality of the information, before the documents are indexed. You can manipulate documents with Import Tasks (predefined processing tasks that are provided by CFS), and with Lua scripts.
You can use CFS Import Tasks to:
Add additional fields or manipulate existing fields.
Write a copy of documents to an IDX or XML file on disk. This allows you to confirm that files are being processed as expected, and identify whether additional fields or data need to be added to the documents before they are indexed.
Perform Eduction on document fields.
Perform Optical Character Recognition (OCR) on images and add the text to the document content.
Extract speech from audio and video, and add the transcription to the document content.
Extract content from HTML pages, discarding irrelevant content such as headers, sidebars, advertisements, and scripts.
Split long documents into multiple sections.
Check that documents contain content in a specific language, and discard those that contain binary or symbolic content.
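Conceptually, a set of processing tasks behaves like a pipeline: each task receives a document, modifies it or discards it, and passes the result to the next task. The following Python sketch illustrates that idea under simplifying assumptions. The function names (add_field, discard_if_empty, run_tasks) are hypothetical, a document is reduced to a flat dictionary, and nothing here reflects the actual CFS configuration syntax or the Lua functions that CFS provides.

```python
from typing import Callable, Dict, List, Optional

Document = Dict[str, str]
# A task returns the (possibly modified) document, or None to discard it.
Task = Callable[[Document], Optional[Document]]


def add_field(name: str, value: str) -> Task:
    """Build a task that adds or overwrites one field."""
    def task(doc: Document) -> Optional[Document]:
        updated = dict(doc)  # copy, so the caller's document is not modified
        updated[name] = value
        return updated
    return task


def discard_if_empty(doc: Document) -> Optional[Document]:
    """Discard documents that have no usable content."""
    return doc if doc.get("content", "").strip() else None


def run_tasks(doc: Document, tasks: List[Task]) -> Optional[Document]:
    """Apply the tasks in order, stopping as soon as one discards the document."""
    current: Optional[Document] = doc
    for task in tasks:
        current = task(current)
        if current is None:
            return None
    return current


tasks = [discard_if_empty, add_field("source", "filesystem")]
kept = run_tasks({"content": "Quarterly report"}, tasks)
dropped = run_tasks({"content": "   "}, tasks)
print(kept)
print(dropped)
```

The order of the tasks matters in a design like this: placing the discard check first avoids spending effort enriching a document that is later thrown away.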
After CFS finishes processing documents, it can automatically index them into an IDOL Server, or send them to a Distributed Index Handler (DIH) so that they can be distributed across multiple IDOL servers.