The following image shows the completed ingestion pipeline, which is described in the sections below.
The pipeline includes the following steps:
Input Port. The input port receives data, in the form of NiFi FlowFiles, from an IDOL Connector. For example, if you use a File System Connector, the connector sends a FlowFile to represent each file that is retrieved from the file system. Depending on how the connector is configured, each FlowFile contains either the path to the associated file or the binary content.
KeyView Extraction. This step extracts files from containers. For example, if a FlowFile represents a zip archive, KeyView extracts the contents of the archive.
KeyView Filtering. Filtering extracts the text from a file and adds it to the document content. The text can then be indexed into IDOL, which means that IDOL does not need to process the data in its original format.
Remove Document Part. This step removes the binary content or file reference from a FlowFile. Removing file references allows NiFi to delete temporary files.
Indexing. This step indexes the documents into an IDOL Content component.
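
The order of the steps above can be illustrated with a small, self-contained Python model. This is only a conceptual sketch: it does not use the NiFi or KeyView APIs, and every name in it (`FlowFile`, `extract`, `filter_text`, `remove_document_part`, `index`, `run_pipeline`, `make_zip`) is hypothetical. It assumes that the connector sends binary content, that zip is the only container type, that only the extracted members are passed on, and that "filtering" is plain UTF-8 decoding; the real processors handle far more than this.

```python
import io
import zipfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi FlowFile (hypothetical, not the NiFi API)."""
    name: str
    binary: Optional[bytes] = None  # binary content, if the connector sent it
    path: Optional[str] = None      # file reference, if the connector sent a path
    text: str = ""                  # document content, filled in by filtering


def extract(flowfile):
    """Extraction: expand a zip container into one FlowFile per member."""
    data = flowfile.binary
    if data is None or not zipfile.is_zipfile(io.BytesIO(data)):
        return [flowfile]  # not a container, pass through unchanged
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Simplification: only the extracted members are passed on.
        return [FlowFile(name=info.filename, binary=archive.read(info))
                for info in archive.infolist() if not info.is_dir()]


def filter_text(flowfile):
    """Filtering: add the file's text to the document content (UTF-8 only here)."""
    if flowfile.binary is not None:
        flowfile.text = flowfile.binary.decode("utf-8", errors="replace")
    return flowfile


def remove_document_part(flowfile):
    """Drop the binary content or file reference once the text is available."""
    flowfile.binary = None
    flowfile.path = None
    return flowfile


def index(flowfile, content_index):
    """Indexing: a dict stands in for the IDOL Content component."""
    content_index[flowfile.name] = flowfile.text


def run_pipeline(incoming, content_index):
    """Apply the steps in pipeline order to each incoming FlowFile."""
    for flowfile in incoming:
        for child in extract(flowfile):
            index(remove_document_part(filter_text(child)), content_index)


def make_zip(members):
    """Build an in-memory zip archive from a {name: text} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in members.items():
            archive.writestr(name, data)
    return buffer.getvalue()


if __name__ == "__main__":
    store = {}
    run_pipeline(
        [FlowFile("docs.zip", binary=make_zip({"a.txt": "alpha", "b.txt": "beta"})),
         FlowFile("note.txt", binary=b"hello")],
        store,
    )
    print(store)
```

In this model, the documents that reach the index carry only text. The binary content and file reference have already been removed, which mirrors the reason the Remove Document Part step comes before Indexing.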