IngestAsPlainText

By default, Web Connector creates an IDOL document for each web page that you have chosen to ingest. Each document is ingested with an HTML file that represents the web page after it has been processed (the connector can clip irrelevant content, remove scripts, convert hyperlinks to absolute URLs, and so on). Connector Framework Server (CFS) populates the DRECONTENT field of each document by extracting text from its associated file.

Alternatively, setting IngestAsPlainText=TRUE configures the Web Connector to ingest content as plain text. In this case Web Connector downloads each page you have chosen to ingest, processes it (performing clipping, removing scripts, and so on), and then extracts the text. The text is added to the DRECONTENT document field. The connector sends metadata-only documents to CFS (the documents do not have associated HTML files).

Using the connector to extract text is likely to produce better results than relying on text extraction in CFS (KeyView), because the connector processes HTML using the HTML Document Object Model (DOM).
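
The following is a minimal sketch of how the parameter might be set for a single fetch task. The task name MyTask and the URL are hypothetical placeholders, and the layout assumes the usual connector convention of listing tasks in the FetchTasks section; adapt the names to your own configuration.

```ini
[FetchTasks]
Number=1
0=MyTask

[MyTask]
# Hypothetical start URL, for illustration only
Url=http://www.example.com
# Extract text in the connector and send metadata-only documents to CFS
IngestAsPlainText=TRUE
```

Setting the parameter in the FetchTasks or Default section instead would apply it more widely than to one task.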

Type: Boolean
Default: False
Required: No
Configuration Section: TaskName or FetchTasks or Default
Example: IngestAsPlainText=TRUE