FieldText Optimizations

This section describes some of the main field properties in IDOL, and explains the memory usage and performance implications of using each one in FieldText queries.

It is not always obvious which field property to use, and it is often worth making compromises, depending on your limitations. For example, you might not need to optimize a particular field if users do not often search for information in that field, or if slightly slower performance is acceptable when they do. At other times, it might be acceptable to have a large number of unique indexed terms because the additional memory and disk usage is worth the improved query performance.

This page provides some general information, but as with all functionality, you must decide the best approach for your application, and test accordingly.

NOTE:

The following operations are listed approximately in speed order. However, the speed of any approach depends on the data, and how you choose your fields.

The most important consideration is that the IDOL Content component processes the Text part of the query before the FieldText. That is, the Text part of the query reduces the number of documents that remain, and therefore the number of FieldText operations that IDOL must perform.

IDOL has some special optimizations for FieldText-only queries with NumericType and MatchType fields. However, in general, if most of your queries use Text=* with a FieldText operation, you are probably not getting the best possible performance from the server.
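As a minimal sketch of the difference between a Text=* query and a query where the Text part narrows the results first, the following Python builds the two request strings. The field name CITY, its value, and the search terms are hypothetical; only the action, Text, and FieldText parameter names and the MATCH operator come from this page.

```python
from urllib.parse import urlencode


def build_query(text, field_text=None):
    """Return the parameter string for a Query action (illustrative only)."""
    params = {"action": "Query", "Text": text}
    if field_text:
        params["FieldText"] = field_text
    # Leave FieldText punctuation unescaped so the result stays readable.
    return urlencode(params, safe="*{}:,")


# FieldText must be evaluated against every document.
wildcard_query = build_query("*", "MATCH{London}:CITY")

# The Text part narrows the candidate documents before FieldText runs.
narrowed_query = build_query("flood insurance", "MATCH{London}:CITY")
```

The second form is the one that the ordering described above favors, because the FieldText restriction only applies to documents that already match the text.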

TIP:

You can use the PerformanceAnalysis parameter for the Query action to check the performance of a particular query. For more information, see Improve Query Performance.

Index Fields

The Index field property is intended for fields that contain prose, writing, and general human-generated content. Index fields contain information that you frequently query, and that you want to query by using operations such as exact phrase search, wildcard search, and Boolean expressions.

The performance of this type of field depends on the number of unique terms. In practice, index fields generally have the best performance, as long as your indexed data falls within reasonable limitations. For English language content, when you index numbers, you generally expect the number of unique terms to stop increasing after a few million (depending on the number of alphanumeric values in your data).

Memory: At-rest memory usage is proportional to the number of unique terms. During queries, the memory usage also depends on the size of the query terms.
Disk: Disk usage is proportional to the number of terms (that is, the number of unique terms and the number of times they occur).
Indexing: Indexing performance decreases a little as the number of unique terms gets very large.
Query: Query performance is more or less independent of the number of unique terms. Querying for a large number of terms generally has the most negative effect on performance, but querying for very common terms can also have a negative impact. Each term in the query requires two disk reads: one to get the meta information for the term, and one to get the occurrence information.
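The two-reads-per-term rule can be turned into a rough estimate. This is a back-of-the-envelope sketch only: it assumes the query has already been split into terms, and it ignores caching and any term expansion that the server might perform.

```python
READS_PER_TERM = 2  # one read for term meta information, one for occurrences


def estimated_disk_reads(query_terms):
    """Estimate disk reads for a query, given its list of terms."""
    return READS_PER_TERM * len(query_terms)


short_query = estimated_disk_reads(["flood", "insurance"])
long_query = estimated_disk_reads(["term%d" % i for i in range(500)])
```

A two-term query costs about four reads under this model, while a 500-term query costs about a thousand, which is why a large number of query terms has the biggest effect on performance.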

FieldCheckType Fields

FieldCheckType fields are designed to be a very fast match for very frequent restrictions. You can use this type for only one field in each document, and you can match against only one value (you cannot use OR).

Memory: Memory usage is not affected by this setting.
Disk: Disk usage is not affected by this setting.
Indexing: Indexing performance is not affected by this setting.
Query: Query performance is extremely fast, and is largely unaffected by the data distribution. However, field check works better as a fast filter in conjunction with other restrictions than as your only query restriction. For example, action=Query&Text=*&FieldCheck=value is not as quick as action=Query&FieldText=MATCH{value}:FieldCheckField if FieldCheckField is also MatchType.

NumericType

The NumericType property is for fields that exclusively contain numbers. This property optimizes the EQUAL, GREATER, LESS, and NRANGE FieldText operators. It also optimizes sorting by the values in this field.

Memory: Memory usage is proportional to the number of fields, and the number of unique values in each field. You can limit the memory usage for each field by using the NumericNormalMaxMem field property.
Disk: Disk usage is proportional to the number of fields, and the number of unique values.
Indexing: Indexing performance generally decreases proportionally to the number of NumericType fields.
Query: Query performance is typically very good for an even distribution of numeric values.
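For illustration, the following sketch formats FieldText expressions for the four operators that NumericType optimizes. The field name PRICE is hypothetical, and the value-count check is a simplification made for this sketch (one value for EQUAL, GREATER, and LESS, two for NRANGE), not a statement of what the server accepts.

```python
# Number of values this sketch expects for each operator (an assumption).
EXPECTED_VALUES = {"EQUAL": 1, "GREATER": 1, "LESS": 1, "NRANGE": 2}


def numeric_field_text(operator, field, *values):
    """Format OPERATOR{values}:FIELD for a numeric FieldText operator."""
    op = operator.upper()
    if op not in EXPECTED_VALUES:
        raise ValueError("not a numeric operator: " + operator)
    if len(values) != EXPECTED_VALUES[op]:
        raise ValueError(op + " expects %d value(s)" % EXPECTED_VALUES[op])
    joined = ",".join(str(v) for v in values)
    return "%s{%s}:%s" % (op, joined, field)


in_range = numeric_field_text("NRANGE", "PRICE", 10, 50)
above = numeric_field_text("GREATER", "PRICE", 100)
```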

MatchType

The MatchType property is for fields that you query for in their entirety. This property optimizes the MATCH FieldText operator, and improves performance for the WILD and STRING operators. It also optimizes sorting by the values in this field.

The performance of this field type is highest when the values that you match against do not occur in a large number of the documents. For example, if you search for an exact match that occurs in 90% of documents, the search is not particularly fast.

Memory: Memory usage is proportional to the number of fields and the number of unique values in each field. The MatchType property is based on NumericType, which means that you can limit the memory usage for each field by using the NumericNormalMaxMem field property. For MatchType fields, this property limits the number of values that IDOL stores in memory. However, you cannot limit the memory usage for the mappings.
Disk: Disk usage is proportional to the number of fields and the number of unique values.
Indexing: Indexing performance generally decreases proportionally to the number of MatchType fields.
Query: Query performance is generally fast when the values that you match against do not occur in a large proportion of the documents that you are searching.
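One way to judge whether a field suits MatchType is to measure how selective its values are in a sample of your data. The sketch below is an offline check, not part of IDOL: the sample values and the idea of inspecting a sample are assumptions for illustration.

```python
from collections import Counter


def value_selectivity(field_values):
    """Map each value to the fraction of sampled documents that contain it."""
    total = len(field_values)
    if total == 0:
        return {}
    counts = Counter(field_values)
    return {value: count / total for value, count in counts.items()}


# Hypothetical sample: one value per document for a LANGUAGE field.
sample = ["en"] * 9 + ["fr"]
selectivity = value_selectivity(sample)
```

In this sample, a match on "en" hits 90% of documents, so it is the kind of restriction that this page describes as not particularly fast, whereas a match on "fr" is selective.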

No Type

Any fields that do not have a specific type are ordinary fields. You can leave a field without a type when it is not frequently queried, or when its query performance does not matter.

By default, you can use any FieldText operation against a field without a type. The performance of non-optimized FieldText queries depends on the operation that you request. You can also prevent FieldText operations against fields that are not optimized for that operation by using the DisallowNonOptimizedFieldText configuration parameter.

Memory: Memory usage is not affected.
Disk: Disk usage is not affected, except for the storage of the field value.
Indexing: Indexing performance is not affected by this setting.
Query: Query performance for non-optimized FieldText operations depends on the operation. If you set DisallowNonOptimizedFieldText to True, you cannot perform FieldText operations for these fields.
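The operator and field type pairings described on this page can be summarized in a small lookup, which might be useful as a client-side check before sending a query. This is a hypothetical helper, not how DisallowNonOptimizedFieldText is implemented. It covers only the operators named on this page, and the "improved" category for WILD and STRING is this sketch's own label.

```python
# Operators that this page says each field property optimizes.
OPTIMIZED = {
    "NumericType": {"EQUAL", "GREATER", "LESS", "NRANGE"},
    "MatchType": {"MATCH"},
}
# Operators that this page says perform better, without calling them optimized.
IMPROVED = {"MatchType": {"WILD", "STRING"}}


def classify(field_type, operator):
    """Return 'optimized', 'improved', or 'non-optimized' for a pairing."""
    op = operator.upper()
    if op in OPTIMIZED.get(field_type, set()):
        return "optimized"
    if op in IMPROVED.get(field_type, set()):
        return "improved"
    return "non-optimized"


untyped_match = classify(None, "MATCH")  # field with no type
numeric_range = classify("NumericType", "NRANGE")
```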
