Although IDOL Server does not have a database schema, it is possible to optimize the storage of certain types of field. Depending on the types of fields that you configure, you will observe different files and folders in your IDOL directory.
The IDOL Server index is divided into several subindexes, which store content from different types of field. The following sections describe the different subindexes, and the memory requirements and impact of each one.
The IndexCache (in the IndexTemp directory) stores an intermediate form of information used to build up the IDOL dynterm index. The IndexCache structure is non-persistent, but IDOL Server eventually converts the data into its persistent form when it flushes the IndexCache to generate the dynterm structure. The IndexTemp directory contains data only for the duration of an index flush.
IndexCache memory usage is configured by the IndexCacheMaxSize configuration parameter. Increasing the size of the index cache means that IDOL Server can process more documents before it must flush to disk.
The index cache stores the unique terms that IDOL has processed since the last flush, plus the new document IDs, positions, and other occurrence information for those terms.
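To illustrate why a larger cache means fewer flushes, here is a toy Python model. It assumes a simplified size accounting (the length of each new unique term, plus a hypothetical fixed OCCURRENCE_COST per occurrence) that is not how IDOL Server actually measures IndexCacheMaxSize; the class and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cost per (doc ID, position) entry; NOT IDOL's real accounting.
OCCURRENCE_COST = 8


class ToyIndexCache:
    """Accumulates terms and occurrences until a size limit forces a flush."""

    def __init__(self, max_size):
        self.max_size = max_size  # plays the role of IndexCacheMaxSize
        self.postings = defaultdict(list)  # term -> [(doc_id, position)]
        self.size = 0
        self.flushes = 0

    def add_document(self, doc_id, text):
        for position, term in enumerate(text.lower().split()):
            if term not in self.postings:
                self.size += len(term)  # cost of a new unique term
            self.postings[term].append((doc_id, position))
            self.size += OCCURRENCE_COST
        if self.size >= self.max_size:
            self.flush()

    def flush(self):
        # In IDOL, this is where cached data becomes the persistent dynterm
        # structure. Here we only count the flush and empty the cache.
        if self.postings:
            self.flushes += 1
        self.postings.clear()
        self.size = 0


def count_flushes(max_size, docs):
    cache = ToyIndexCache(max_size)
    for doc_id, text in enumerate(docs):
        cache.add_document(doc_id, text)
    cache.flush()  # final flush when indexing finishes
    return cache.flushes


docs = ["the quick brown fox"] * 100
print(count_flushes(100, docs), count_flushes(10_000, docs))
```

With 100 copies of a short document, a limit of 100 toy bytes flushes 34 times, while a limit of 10,000 flushes only once, at the end.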
The dynterm structure is the persistent version of the IndexCache information, storing all the indexed terms and their document occurrence information. It consists primarily of a dictionary (the unique terms and metadata about each term) and postings (the IDs of the documents that each term occurs in, plus information about the occurrences of that term).
Each unique term uses a fixed amount of disk space in the dictionary file (the record_size in bytes, as reported by the GetStatus action and controlled by the TermSize configuration parameter). The postings information uses (depending on configuration) between 4 and 16 bytes of disk space per occurrence of a term.
The dynterm structure itself does not consume a significant amount of memory at rest.
There are a number of ways to configure and control the terms that get indexed. Minimizing the amount of noise and unwanted terms in the index improves performance. You can perform some of these improvements in components such as CFS, which process the data earlier in the indexing pipeline. In the Content component, the IndexNumbers (and related settings) and ProperNames configuration parameters are the most common options for adjusting the terms that you index. You can use the TermGetAll action to help analyze which terms are in the IDOL index. As an alternative to the action, you can view information about terms in the indexed data on the Terms tab of the Performance page, in the Monitor section of IDOL Admin.
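Once you have a list of terms (for example, extracted from a TermGetAll response; parsing that response is not shown, and the flat list input is an assumption), a short script can estimate how much of the vocabulary is purely numeric or alphanumeric, which is the kind of term that settings such as IndexNumbers help you control. The term classes below are defined for this illustration only and may not match IDOL's own definitions.

```python
import re
from collections import Counter

# Illustrative term classes (ASCII only); IDOL's own definitions may differ.
NUMERIC = re.compile(r"\d+")
ALPHANUMERIC = re.compile(r"(?=.*\d)(?=.*[a-z])[a-z0-9]+", re.IGNORECASE)


def classify_terms(terms):
    """Count terms as numeric, alphanumeric, alphabetic, or other."""
    counts = Counter()
    for term in terms:
        if NUMERIC.fullmatch(term):
            counts["numeric"] += 1
        elif ALPHANUMERIC.fullmatch(term):
            counts["alphanumeric"] += 1
        elif term.isalpha():
            counts["alphabetic"] += 1
        else:
            counts["other"] += 1
    return counts


terms = ["2021", "a1b2", "invoice", "x-ray", "00417"]  # hypothetical sample
counts = classify_terms(terms)
share = (counts["numeric"] + counts["alphanumeric"]) / len(terms)
print(dict(counts), f"numeric-like share: {share:.0%}")
```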
The dynterm directory also contains the unstemmed structure, which stores the full unstemmed version of terms in a memory-mapped structure. Its memory and disk usage is therefore of the same order as the total size of the unique unstemmed terms, although the structure is optimized for prefix matching and so stores common prefixes more efficiently. IDOL Server uses the unstemmed structure for wildcard searches, spelling correction, and fuzzy expansion.
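A prefix tree (trie) is a common way to get the two properties described above: shared prefixes are stored once, and a wildcard such as index* becomes a walk down to one node. This sketch only illustrates the idea; it is not IDOL's actual unstemmed format, and PrefixTrie is a hypothetical name.

```python
_END = object()  # sentinel key marking the end of a stored term


class PrefixTrie:
    """Toy prefix tree illustrating prefix sharing; not IDOL's real format."""

    def __init__(self):
        self.root = {}
        self.node_count = 0  # one node per character after prefix sharing

    def insert(self, term):
        node = self.root
        for ch in term:
            if ch not in node:
                node[ch] = {}
                self.node_count += 1
            node = node[ch]
        node[_END] = True

    def expand(self, prefix):
        """Return every stored term that starts with prefix (like prefix*)."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        results, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            for key, child in current.items():
                if key is _END:
                    results.append(text)
                else:
                    stack.append((child, text + key))
        return sorted(results)


terms = ["index", "indexes", "indexing", "indigo"]
trie = PrefixTrie()
for t in terms:
    trie.insert(t)
print(trie.node_count, sum(len(t) for t in terms))
print(trie.expand("index"))
```

The four terms need 13 nodes rather than 26 characters, because they share the prefixes ind and index.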
As with the dynterm structure, minimizing the amount of data in the unstemmed index can improve performance. The UnstemmedMinDocOccs parameter allows you to filter out very rare terms from the unstemmed index (although for some applications, such as legal search, this might not be appropriate). The UnstemmedIndexNumbers and related settings can also prevent the unstemmed index from filling up with purely numeric or alphanumeric terms.
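A sketch of the filtering idea behind UnstemmedMinDocOccs, under the assumption (not stated in the text above) that a term is kept only if it occurs in at least the configured number of documents. The function name is hypothetical.

```python
from collections import Counter


def unstemmed_terms(docs, min_doc_occs):
    """Terms occurring in at least min_doc_occs documents (assumed rule)."""
    doc_freq = Counter()
    for text in docs:
        # Count each term at most once per document.
        doc_freq.update(set(text.lower().split()))
    return {term for term, df in doc_freq.items() if df >= min_doc_occs}


docs = ["contract breach notice", "contract renewal", "breach of contrat"]
print(sorted(unstemmed_terms(docs, 2)))
```

At a threshold of 2, the rare misspelling contrat is dropped, so a wildcard such as contra* would no longer expand to it. This is the kind of trade-off that can make aggressive filtering a poor fit for applications such as legal search.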
Index fields generate data for the dynterm and unstemmed structures. You should configure fields as Index fields if they contain data that requires conceptual searching, such as the body of an e-mail. If you need to search a highly structured field (such as a document date), it is typically better to store it as one of the optimized types and to use a FieldText or metadata search parameter. For example, for a document date, you can use the NumericDateType field property.
For more information about the dynterm storage mode, see Repository Storage Mode.
TermCache memory is transient: it represents the memory that server threads use as they load dynterm information during query processing. Its size depends on the terms that you query for.
The nodetable structure and directory store both metadata about each document section and the physical representation of each document. IDOL Server stores the metadata (for example, date, database, or fieldcheck) in a memory-mapped structure, with each section consuming 64 bytes. The physical representation is approximately equal in size to the original IDX or XML data.
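Because the metadata cost is fixed per section, it can be estimated independently of document size. A minimal sketch using the 64-byte figure above; the section count and source size in the usage line are hypothetical.

```python
METADATA_BYTES_PER_SECTION = 64  # figure quoted in the text above


def estimate_nodetable(sections, source_bytes):
    """Rough nodetable footprint: memory-mapped metadata plus stored content."""
    return {
        "metadata_memory_mapped": sections * METADATA_BYTES_PER_SECTION,
        # The physical representation is roughly the size of the IDX/XML input.
        "physical_on_disk_approx": source_bytes,
    }


# Hypothetical: 10 million sections built from 150 GB of IDX.
print(estimate_nodetable(10_000_000, 150 * 10**9))
```

For 10 million sections, the memory-mapped metadata comes to 640,000,000 bytes, however large or small each section is.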
The nodetable metadata is primarily used for fast filtering checks, such as
IDOL Server primarily uses the physical representation to print results. It can also be used to perform unoptimized FieldText matching.
IDOL Server uses the nodetable data to perform a MATCH FieldText search on a field that is not configured as any other type. Micro Focus generally recommends that you avoid this approach, because it usually means that IDOL must perform a large number of loads from disk to find the documents that you want. Equally, do not make every field that contains a number NumericType if you use it only when printing the document content.
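The cost difference comes from how many documents must be loaded. In this sketch, each load from a hypothetical CountingStore stands in for a read from the nodetable on disk. An unoptimized match has to load every document, whereas an index built once at indexing time answers from a lookup. The AUTHOR field and all names here are illustrative assumptions, and real IDOL caching behaves differently.

```python
class CountingStore:
    """Hypothetical stand-in for stored documents; each load is one 'disk read'."""

    def __init__(self, documents):
        self.documents = documents  # list of {field: [values]} dicts
        self.loads = 0

    def load(self, doc_id):
        self.loads += 1
        return self.documents[doc_id]


def match_by_scan(store, field, value):
    """Unoptimized MATCH: load every document and check the field."""
    hits = []
    for doc_id in range(len(store.documents)):
        if value in store.load(doc_id).get(field, []):
            hits.append(doc_id)
    return hits


def build_value_index(documents, field):
    """Built once at index time: value -> list of document IDs."""
    index = {}
    for doc_id, doc in enumerate(documents):
        for value in set(doc.get(field, [])):
            index.setdefault(value, []).append(doc_id)
    return index


docs = [{"AUTHOR": ["alice"]}, {"AUTHOR": ["bob"]}, {"AUTHOR": ["alice"]}, {}]
store = CountingStore(docs)
print(match_by_scan(store, "AUTHOR", "alice"), "loads:", store.loads)
print(build_value_index(docs, "AUTHOR").get("alice", []))
```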
The NodeTableStoreContent parameter and the StoredType property allow you to choose which fields to store in the index. You must store the content for some functionality, such as AQG. You can still use highlighting and summarization even if the content is not stored locally in the index, by sending the data to highlight or summarize back to the server. You can use the Regenerate settings only for fields that are
You can use the NodeTableCompression configuration parameter to compress the documents in the nodetable on disk. IDOL Server then compresses data in the nodetable directory before storing it, which reduces the IDOL Server disk footprint.
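The text does not say which compression algorithm IDOL Server uses, so this sketch uses Python's standard zlib module purely as a stand-in. It shows the general idea: repetitive, markup-heavy content shrinks substantially, and decompression restores it exactly. The sample document is made up.

```python
import zlib


def compression_ratio(data, level=6):
    """Compressed size as a fraction of the original (zlib as a stand-in)."""
    return len(zlib.compress(data, level)) / len(data)


# Made-up, highly repetitive sample; not IDOL's actual stored format.
sample = b"<doc><title>Quarterly report</title><body>Sales rose.</body></doc>" * 1000
assert zlib.decompress(zlib.compress(sample)) == sample  # lossless round trip
print(f"compressed to {compression_ratio(sample):.1%} of original size")
```

Real documents are far less repetitive than this sample, so expect a smaller gain in practice.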
The numeric structure gives a fast lookup from numeric value to document ID. Each numeric value stored uses approximately 16 bytes. A numeric field is normally wholly memory mapped, but you can limit the memory by using the NumericNormalMaxMem property for a field that is also NumericType. Making a field
NumericType greatly speeds up the FieldText operators
BIAS, and the geospatial FieldText operators
BIASDISTSPHERICAL. Sorting on a numeric field is also optimized.
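One way to picture why range operators and sorting become fast is a list of (value, document ID) pairs kept in value order: a range becomes two binary searches, and sorting becomes a walk in order. This is a hypothetical illustration, not IDOL's on-disk layout; only the 16-bytes-per-value figure comes from the text.

```python
import bisect

BYTES_PER_VALUE = 16  # approximate figure from the text above


class ToyNumericIndex:
    """Sorted (value, doc_id) pairs: an illustration, not IDOL's layout."""

    def __init__(self, pairs):
        self.entries = sorted(pairs)

    def doc_ids_in_range(self, low, high):
        """Document IDs whose value lies in [low, high], via binary search."""
        start = bisect.bisect_left(self.entries, (low, float("-inf")))
        end = bisect.bisect_right(self.entries, (high, float("inf")))
        return sorted({doc_id for _, doc_id in self.entries[start:end]})

    def doc_ids_sorted_by_value(self):
        """Entries are already in value order, so sorting is a simple walk."""
        return [doc_id for _, doc_id in self.entries]

    def approx_bytes(self):
        return len(self.entries) * BYTES_PER_VALUE


index = ToyNumericIndex([(5.0, 1), (2.0, 2), (9.5, 3), (5.0, 4)])
print(index.doc_ids_in_range(2, 5), index.doc_ids_sorted_by_value())
```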
NumericDateType fields build an optimized index, which converts a date into the internal numeric autndate format and stores it in the numeric structure. Its memory usage is identical to that of a numeric field. This optimizes the RANGE FieldText specifiers, and sorting on that field.
The Match structure is a wrapper around the numeric structure, used for MatchType fields. Each unique value is mapped to an integer, which is then indexed into a numeric structure.
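The wrapping described above can be sketched as a dictionary from each unique string to an integer, layered over a sorted list of (integer, document ID) pairs. ToyMatchIndex is a hypothetical name, and the sketch assumes one value per add call; it illustrates the idea rather than IDOL's implementation.

```python
import bisect


class ToyMatchIndex:
    """A value mapping layered over a numeric-style index (illustrative)."""

    def __init__(self):
        self.value_ids = {}  # value mapping: each unique string -> integer
        self.numeric = []    # sorted (value_id, doc_id) pairs

    def add(self, doc_id, value):
        value_id = self.value_ids.setdefault(value, len(self.value_ids))
        bisect.insort(self.numeric, (value_id, doc_id))

    def match(self, value):
        """Exact match: map the string, then binary-search the integers."""
        value_id = self.value_ids.get(value)
        if value_id is None:
            return []
        lo = bisect.bisect_left(self.numeric, (value_id, float("-inf")))
        hi = bisect.bisect_left(self.numeric, (value_id + 1, float("-inf")))
        return [doc_id for _, doc_id in self.numeric[lo:hi]]


idx = ToyMatchIndex()
for doc_id, country in [(1, "UK"), (2, "FR"), (3, "UK")]:
    idx.add(doc_id, country)
print(idx.match("UK"), idx.match("DE"), idx.value_ids)
```

The value mapping holds each unique string once, which is why its size tracks the total size of the unique values rather than the number of documents.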
The memory requirement for Match is equal to that of Numeric, plus a value mapping. The size of the value mapping is proportional to the size of all unique match values that have been indexed. You can limit the memory usage for a MatchType field by also using the
The parametric structure stores an optimized lookup from document ID to the values in that document. It includes a value mapping and an explicit document file.
The value mapping contains details of the values that the parametric fields in your index contain. The value mapping is wholly memory mapped, and its size is proportional to the size of all unique parametric values that have been indexed into the server.
The explicit document file contains information about the parametric fields and values that occur in each document. You can use the ParametricMemoryMaxSize parameter to limit the memory that the explicit document file uses; otherwise, it is mapped into memory for performance. Each parametric value in a document uses up to 8 bytes of storage in the explicit document file.
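The two parts can be sketched as a shared mapping from (field, value) to an integer ID, plus a per-document list of those IDs, which gives the document-to-values lookup. This is a hypothetical illustration; only the "up to 8 bytes per value" bound comes from the text, and the field names are made up.

```python
class ToyParametricStore:
    """Value mapping plus a per-document list of value IDs (illustrative)."""

    BYTES_PER_VALUE_MAX = 8  # "up to 8 bytes" per value, from the text above

    def __init__(self):
        self.value_ids = {}   # value mapping: (field, value) -> integer ID
        self.doc_values = {}  # explicit document file: doc_id -> [IDs]

    def add(self, doc_id, field_values):
        ids = [self.value_ids.setdefault(fv, len(self.value_ids))
               for fv in field_values]
        self.doc_values.setdefault(doc_id, []).extend(ids)

    def values_for(self, doc_id):
        """Document ID -> its (field, value) pairs, via the value mapping."""
        by_id = {i: fv for fv, i in self.value_ids.items()}
        return [by_id[i] for i in self.doc_values.get(doc_id, [])]

    def explicit_file_max_bytes(self):
        total_values = sum(len(ids) for ids in self.doc_values.values())
        return self.BYTES_PER_VALUE_MAX * total_values


store = ToyParametricStore()
store.add(1, [("COLOUR", "red"), ("SIZE", "large")])
store.add(2, [("COLOUR", "red")])
print(store.values_for(1), store.explicit_file_max_bytes())
```

The value "red" is stored once in the mapping but referenced from both documents, so the explicit document file grows with the number of values per document, while the mapping grows only with the number of unique values.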
The sort structure stores an index for each configured SortType field to optimize sorting on those values. A SortType field uses slightly more than SortFieldStorageLength bytes per document, irrespective of whether the document has a value or whether the value is shorter than the configured storage length. Micro Focus recommends that you use SortType fields if:
Most of your documents have a value in the field.
Most of the values have a common prefix that you can remove by using
The values (after removing any common prefix) all fit in the configured storage length.
In other cases, if you commonly use the field for sorting, it is typically better to make it a MatchType field.
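The storage rule and the guidance above can be turned into a rough pre-check for a candidate field. This is a minimal sketch under stated assumptions: min_coverage is a hypothetical stand-in for "most documents", the common prefix is computed across all values (a simplification of "most of the values"), and the function names are illustrative. The text does not name the parameter that removes the prefix, so the sketch only reports it.

```python
import os


def sort_field_min_bytes(num_docs, storage_length):
    """Lower bound: every document costs a bit more than storage_length
    bytes, whether or not it has a value."""
    return num_docs * storage_length


def sort_type_check(values, num_docs, storage_length, min_coverage=0.5):
    """Rough check of the SortType guidance for one field.

    values: field values of the documents that have one.
    min_coverage: hypothetical threshold standing in for "most documents".
    Returns (looks_suitable, removable_common_prefix).
    """
    if num_docs == 0 or not values:
        return False, ""
    prefix = os.path.commonprefix(values)  # simplification: across all values
    coverage = len(values) / num_docs
    fits = all(len(v) - len(prefix) <= storage_length for v in values)
    return coverage >= min_coverage and fits, prefix


invoices = ["INV-2023-0001", "INV-2023-0002", "INV-2023-0150"]
print(sort_type_check(invoices, num_docs=4, storage_length=4))
print(sort_field_min_bytes(1_000_000, 8))
```

Here the shared prefix INV-2023-0 leaves three characters per value, so a storage length of 4 suffices, while a storage length of 2 would not.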