Index Cache

The index cache is a portion of the machine's memory allocated to caching the data of distinct terms sent to the Content component during index actions. Using the index cache speeds up indexing, because writing to memory is quicker than writing to disk. However, IDOL Server must commit the cache to disk to permanently save the data, and for the content to become searchable.

Set the Size of the IndexCache

You can set the size of the index cache by changing the IndexCacheMaxSize configuration parameter in the [IndexCache] section of the IDOL Server configuration file. This parameter is the maximum amount of data, in KB, that IDOL Server stores in the index cache before it flushes the data to disk. IDOL Server allocates this amount of memory on startup. The default value is 102400 (100 MB).
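As a rough illustration of the setting and its units, the following Python sketch parses a configuration fragment and converts the value from KB to MB. The fragment uses only the section name, parameter name, and default value given above. Reading it with Python's configparser is an assumption made for illustration: a real IDOL Server configuration file might contain syntax that configparser does not accept.

```python
import configparser

# Fragment of a configuration file, using the section, parameter, and
# default value described above.
CONFIG_TEXT = """
[IndexCache]
IndexCacheMaxSize=102400
"""


def index_cache_size_kb(config_text: str) -> int:
    """Return the IndexCacheMaxSize value (in KB) from configuration text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.getint("IndexCache", "IndexCacheMaxSize")


size_kb = index_cache_size_kb(CONFIG_TEXT)
size_mb = size_kb / 1024  # the parameter is expressed in KB
print(f"Index cache: {size_kb} KB = {size_mb:.0f} MB")
```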

NOTE:

The value of IndexCacheMaxSize is capped at 2 GB on a 32-bit machine.

As an alternative to setting the size of the index cache in the configuration file, you can resize the cache on the Caches tab of the Status page in IDOL Admin.

Choose a Size

When the index cache is full, IDOL Server flushes it to disk. Each flush takes time, so the overall indexing speed decreases as the number of flushes increases. If your index cache is too small, indexing slows down because of the increased number of disk writes.

However, because all the IndexCacheMaxSize memory is allocated on startup, if your index cache size is too large it might result in memory allocation errors. These errors might occur in Content, or in other processes running on the same machine. Allocating too much of the system memory in one place can also increase the time it takes to make memory allocations elsewhere in the system, which can slow down processing for Content and other applications.

When setting the index cache size, you must balance memory usage and indexing speed. To find the optimum value, it helps to know the amount of data you want to index, the data rate that you want to achieve when indexing, and the specification of the machine that you run IDOL Server on.

On a 64-bit system, each term has 72 + (4 * (number of documents term appears in + number of total occurrences)) bytes of data stored in the index cache.

For example, the word giraffe appears twice in one document and once in another document, so it appears in two documents, three times in total. The index cache space used is then 72 + (4 * (2 + 3)) = 92 bytes.
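The per-term formula can be written as a small Python function. This is a minimal sketch of the arithmetic only. The function name term_cache_bytes is hypothetical and is not part of IDOL Server.

```python
def term_cache_bytes(documents: int, occurrences: int) -> int:
    """Lower-bound index cache usage, in bytes, for one term (64-bit system).

    documents   -- number of documents the term appears in
    occurrences -- total number of occurrences of the term
    """
    return 72 + 4 * (documents + occurrences)


# "giraffe": two documents, three occurrences in total.
giraffe_bytes = term_cache_bytes(documents=2, occurrences=3)
print(giraffe_bytes)  # 92
```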

NOTE:

This value represents a lower bound on memory usage because of how IDOL Server allocates memory from the index cache for the growing part of the term data (that is, the total occurrences and number of document occurrences). When this part requires more memory, it gets twice as much as it already has from the index cache. In a hypothetical example, if this portion of memory is currently at 16 bytes, but adding the next occurrence requires 20 bytes, it gets 32 bytes from the cache. All 32 bytes are marked as taken from the index cache, even though only 20 are currently used. The advantage of this method is that it needs fewer memory allocations from the cache, which matters because each allocation is relatively slow.
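The doubling behavior in this note can be sketched in Python as follows. The function name grow_allocation is hypothetical. Repeating the doubling until the request fits is an assumption for cases where a single doubling is not enough, because the note describes only a single doubling.

```python
def grow_allocation(current_bytes: int, required_bytes: int) -> int:
    """Return the bytes reserved after growing by doubling.

    Doubles the current allocation until it can hold required_bytes.
    (Repeated doubling is an assumption; the note describes one doubling.)
    """
    if current_bytes <= 0:
        raise ValueError("current_bytes must be positive")
    allocated = current_bytes
    while allocated < required_bytes:
        allocated *= 2
    return allocated


# Hypothetical example from the note: 16 bytes held, 20 bytes needed.
allocated = grow_allocation(current_bytes=16, required_bytes=20)
unused = allocated - 20  # reserved from the cache but not yet used
print(allocated, unused)  # 32 12
```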

A simple analysis of 50,000 terms randomly sampled from a development Content server indexing news data, with around 500,000 documents, gives average figures of 169 document occurrences and 302 total occurrences for these terms. This data gives an average index cache usage of 1,956 bytes per term. A 100 MB index cache can hold (at most) around 53,600 terms like these before requiring a flush.
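The figures in this paragraph can be reproduced with a short Python calculation. This is a sketch of the arithmetic only, using the lower-bound formula for a 64-bit system. The function names are hypothetical.

```python
BYTES_PER_KB = 1024


def term_cache_bytes(documents: int, occurrences: int) -> int:
    """Lower-bound index cache usage, in bytes, for one term (64-bit system)."""
    return 72 + 4 * (documents + occurrences)


def max_terms(cache_size_kb: int, bytes_per_term: int) -> int:
    """Largest whole number of terms that fit in the cache before a flush."""
    return (cache_size_kb * BYTES_PER_KB) // bytes_per_term


# Average figures from the sample: 169 documents, 302 occurrences per term.
average_bytes = term_cache_bytes(documents=169, occurrences=302)
capacity = max_terms(cache_size_kb=102400, bytes_per_term=average_bytes)
print(average_bytes)  # 1956
print(capacity)       # 53608
```

Because the formula is a lower bound on usage for each term, the capacity is an upper bound on the number of terms.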

NOTE:

The scaling factor of index cache usage for each term is higher for certain settings. For example, if AdvancedPlus is set to True, the upper bound on the disk usage is: 72 + (4 * (number of documents term appears in + (4 * number of total occurrences))).
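To see how much the higher scaling factor changes the estimate, the following Python sketch evaluates both formulas for the same term. The function names are hypothetical, and the code only evaluates the formulas as they are written in this section.

```python
def term_cache_bytes(documents: int, occurrences: int) -> int:
    """Lower-bound usage for one term, in bytes (64-bit system)."""
    return 72 + 4 * (documents + occurrences)


def advanced_plus_upper_bound_bytes(documents: int, occurrences: int) -> int:
    """Upper bound for one term, in bytes, when AdvancedPlus is True."""
    return 72 + 4 * (documents + 4 * occurrences)


# Average term from the news sample: 169 documents, 302 occurrences.
standard = term_cache_bytes(169, 302)
advanced = advanced_plus_upper_bound_bytes(169, 302)
print(standard, advanced)  # 1956 5580
```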

NOTE:

The MaxIndexTermsPerDocument configuration parameter might also affect your choice of cache size. This parameter sets the maximum number of distinct terms to index on any document section. By default there is no limit.

Dynamically Change the Size of the IndexCache

You can change the size of the index cache dynamically while Content is running, by using the DRERESIZEINDEXCACHE index action. This option allows you to tune your approach to sizing the index cache. For example, you might allocate a small index cache for normal usage (such as querying and light indexing). When you have a large indexing job scheduled, you can then increase the size of the index cache to speed up that job. After the job is finished, you can release the extra index cache space back to the system.
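The following Python sketch builds a URL for such a request without sending it. Only the action name DRERESIZEINDEXCACHE comes from this section. The host name, the index port (9101), and the use of IndexCacheMaxSize as the action parameter name are assumptions made for illustration, so check the index action reference for your version before you use them.

```python
from urllib.parse import urlencode


def resize_index_cache_url(host: str, index_port: int, new_size_kb: int) -> str:
    """Build a DRERESIZEINDEXCACHE request URL (the request is not sent).

    The parameter name IndexCacheMaxSize is assumed to mirror the
    configuration parameter of the same name.
    """
    query = urlencode({"IndexCacheMaxSize": new_size_kb})
    return f"http://{host}:{index_port}/DRERESIZEINDEXCACHE?{query}"


# Hypothetical host and index port; 204800 KB is twice the default size.
url = resize_index_cache_url("localhost", 9101, 204800)
print(url)
```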

CAUTION:

When resizing the index cache to use more space, be mindful of the memory requirements of other applications running (or due to run) on the machine.

Index Cache Flush to Disk

When the index cache starts flushing to disk during a DREADD type index action, IDOL writes the message "Index cache is full. Merging with disk." to the index log. After the flush is complete, it writes "Merging with disk completed." to the log. If your indexing performance is slow, and these messages occur frequently in your index log, then it is likely you can improve performance by increasing the size of the index cache.
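One way to judge how often flushes occur is to count these two messages in the index log. The following Python sketch does this with a plain substring match. The sample lines, including their timestamps, are hypothetical and do not reflect the real index log layout. Only the two message strings come from this section.

```python
from typing import Iterable, Tuple

FLUSH_START = "Index cache is full. Merging with disk."
FLUSH_END = "Merging with disk completed."


def count_flushes(log_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (flushes started, flushes completed) found in the log lines."""
    started = 0
    completed = 0
    for line in log_lines:
        if FLUSH_START in line:
            started += 1
        elif FLUSH_END in line:
            completed += 1
    return started, completed


# Hypothetical log excerpt; the real log format may differ.
sample_log = [
    "10:00:01 Index cache is full. Merging with disk.",
    "10:00:09 Merging with disk completed.",
    "10:02:30 Index cache is full. Merging with disk.",
    "10:02:41 Merging with disk completed.",
    "10:03:00 (other log output)",
]
started, completed = count_flushes(sample_log)
print(started, completed)  # 2 2
```

To scan a real log, pass an open file object instead of the sample list, because a file object iterates over its lines.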

At other times, IDOL logs alternative index log messages:

Search for Flushed Content

Terms flushed from the index cache during an index action do not become searchable immediately. You can only search for the terms after the index syncs. When DelayedSync is False, the index sync occurs when the index action finishes. Otherwise, either the MaxSyncDelay time period must pass, or you must send a DRESYNC index action.

