Document Tracking

Document tracking is a feature present in IDOL components that are involved in indexing. It reports upon the progress of documents as they pass through an index chain. Every time a document reaches a certain stage in the indexing process, the component commits event data to a back end, which stores the events.

The back end can be a log file, or an SQL database (this option requires an additional library, which is available in the IDOL Server installers). You retrieve the events by using an appropriate interface for your chosen back end, for example an SQL client for a database.

Uses

You can use document tracking to detect problems with the indexing process.

For example, document tracking might flag when a batch of documents is rejected because there are misconfigured fields, or when an existing document is deleted.

The stored event data includes the references of documents that you index, and other metadata, so that you can quickly find the exact source of problems. Similarly, the data can point to issues with individual components in the indexing system.

For example, if one Content component commits event data at a lower rate than expected, it might indicate a bottleneck in your indexing process in or near that component.

Document tracking provides an overview of what happens to a document as it moves through the index process. It is useful for troubleshooting.

You might expect a document to pass through a certain indexing stage, but the tracking data shows that it did not reach this stage. This might indicate that there is an issue with the missing stage, or that the document somehow does not qualify for it. In either case, you can use this information to diagnose and solve the problem.
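The following Python sketch shows one way to find documents that never reached an expected stage, after you have retrieved the events from your back end. The (reference, event name) record layout and the function name are hypothetical; the real layout depends on the back end that you configure.

```python
from collections import defaultdict


def documents_missing_stage(events, expected_stage):
    """Return the references of documents that have events but never
    reached expected_stage.

    events is an iterable of (reference, event_name) pairs. This layout
    is an assumption made for illustration only.
    """
    stages_by_reference = defaultdict(set)
    for reference, event_name in events:
        stages_by_reference[reference].add(event_name)
    return sorted(
        reference
        for reference, stages in stages_by_reference.items()
        if expected_stage not in stages
    )


# Hypothetical events, as they might look after retrieval from the back end.
events = [
    ("doc-1", "Added"),
    ("doc-1", "Indexed"),
    ("doc-1", "Committed"),
    ("doc-2", "Added"),
    ("doc-2", "Indexed"),
]

missing = documents_missing_stage(events, "Committed")
print(missing)  # ['doc-2']
```

In this sample, doc-2 was indexed but has no Committed event, so it is the document to investigate.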

Operation

For full details of how to configure document tracking, refer to the IDOL Server Administration Guide and IDOL Server Reference.

The document tracking components commit event data to the back end in batches for efficiency. Each component writes data to an intermediate log file, which it stores in the directory specified by the DocTrackDir configuration parameter. The component periodically processes this file, and the frequency of processing depends on the values of the TimeoutSeconds and MaxEventsPerFile configuration parameters. When either the specified period elapses, or the file contains the specified maximum number of events, the component commits that data to the back end.
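The commit rule can be modeled with a short Python sketch. This is an illustration of the "timeout or maximum events, whichever comes first" logic only, not the component's implementation. The class and method names are hypothetical, and it is an assumption that the timeout is measured from the first event in the current batch.

```python
class EventBatcher:
    """Toy model of batching governed by TimeoutSeconds and MaxEventsPerFile."""

    def __init__(self, timeout_seconds, max_events_per_file):
        self.timeout_seconds = timeout_seconds
        self.max_events_per_file = max_events_per_file
        self.pending = []            # stands in for the intermediate log file
        self.batch_started_at = None
        self.committed_batches = []  # stands in for the back end

    def add_event(self, event, now):
        # Assumption: the timer starts when the first event of a batch arrives.
        if not self.pending:
            self.batch_started_at = now
        self.pending.append(event)
        self.flush_if_due(now)

    def flush_if_due(self, now):
        """Commit the pending batch if either limit has been reached."""
        if not self.pending:
            return False
        timed_out = now - self.batch_started_at >= self.timeout_seconds
        full = len(self.pending) >= self.max_events_per_file
        if timed_out or full:
            self.committed_batches.append(self.pending)
            self.pending = []
            self.batch_started_at = None
            return True
        return False


batcher = EventBatcher(timeout_seconds=10, max_events_per_file=3)
batcher.add_event("a", now=0)
batcher.add_event("b", now=1)
batcher.add_event("c", now=2)   # third event fills the batch, so it commits
batcher.add_event("d", now=3)
batcher.flush_if_due(now=13)    # 10 seconds have elapsed, so "d" commits alone
print(batcher.committed_batches)  # [['a', 'b', 'c'], ['d']]
```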

Generally, set TimeoutSeconds to a small value, so that the back end is an up-to-date model of the document status in your system, and set MaxEventsPerFile to a large value, because sending larger batches to the back end is more efficient when the system is under heavy indexing load.

CAUTION:

Do not modify or delete files in the DocTrackDir during operation. You can clear the directory during downtime if you want to discard events.

Each component must have its own DocTrackDir, which must not be shared.

CAUTION:

In a unified IDOL Server configuration under the IDOL Proxy component, the DocTrackDir might be shared by multiple components. In this case, use a relative path for the DocTrackDir to create a subdirectory, so that each component creates its own directory. Do not use a relative path that traverses back up the file tree.
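If you generate or review configuration files with a script, a check such as the following Python sketch can catch values that break this rule. The function name is hypothetical, and the sketch assumes POSIX-style path separators. It validates only the text of the value and does not reproduce how the component resolves the path.

```python
from pathlib import PurePosixPath


def is_safe_doc_track_dir(value):
    """Return True if value is a relative path that names a subdirectory
    and never traverses back up the file tree."""
    path = PurePosixPath(value)
    return (
        bool(path.parts)              # rejects "" and ".", which name no subdirectory
        and not path.is_absolute()    # rejects absolute paths
        and ".." not in path.parts    # rejects upward traversal
    )


print(is_safe_doc_track_dir("doctrack/content"))  # True
print(is_safe_doc_track_dir("../doctrack"))       # False
print(is_safe_doc_track_dir("/var/doctrack"))     # False
```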

If the document tracking component cannot contact the back-end server, the component keeps the events and reattempts committing the file.

You can optionally configure the maximum size of the DocTrackDir by using the DocTrackDirMaxSizeKB configuration parameter. In most cases, the default value is appropriate. If you change it, Micro Focus recommends using a large value, because the component discards events when the directory reaches the maximum size.

Data Generation

Each indexing component generates different events.

A Content component creates an Indexed event when the document is indexed, and a Committed event when the document is available for querying.

Most connectors generate events that describe the creation of tasks. The HTTP and File System Connectors create an Added event when they find a new file or Web page for indexing and create a document to represent it. They create an Updated event when a file or Web page has changed (which the Content component will eventually process as a DREREPLACE index action).

CFS creates events representing the continuations of events that other connectors generate. It creates an Import:Queue event when a document reaches CFS and is in the index queue. It creates an Update Received event when a connector sends an update operation, which CFS converts to a DREREPLACE index action to send to the Content component.

Each event also has a source string, which has the format COMPONENTNAME_IPADDRESS_SERVICEPORTNUMBER.

For example, a Content component might have the source string content_10.2.106.5_5502.

This source string allows you to easily see the workflow for each document as it passes through the indexing system. Each component has a unique source string, so you can quickly identify any problems with a component, even in a complex indexing setup.
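When you process retrieved events in a script, you can split the source string into its parts. The following Python sketch assumes that the IP address and port contain no underscores, and splits from the right so that a component name containing underscores stays in one piece. The EventSource type and parse_source function are hypothetical names.

```python
from typing import NamedTuple


class EventSource(NamedTuple):
    component: str
    ip_address: str
    service_port: int


def parse_source(source):
    """Split COMPONENTNAME_IPADDRESS_SERVICEPORTNUMBER into its three parts."""
    # rsplit keeps any underscores in the component name intact.
    component, ip_address, port = source.rsplit("_", 2)
    return EventSource(component, ip_address, int(port))


parsed = parse_source("content_10.2.106.5_5502")
print(parsed.component, parsed.ip_address, parsed.service_port)
# content 10.2.106.5 5502
```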

You can use the IndexUID parameter in DREADD and DREADDDATA index actions to track the progress of a batch of documents. This parameter adds the value you specify as an identification tag to each document in the batch. You can query the back end for this tag to track the progress of each document in the batch.

Considerations

When you build and configure your document tracking solution, consider the following factors:

 

