You can use the results of the TermGetAll action to diagnose query performance problems. For example:
As an alternative to using actions, you can also use the Terms tab on the Performance page in the Monitor section of IDOL Admin to view information about the frequency and type of terms in your data index.
If a query for a single term is slow, it might indicate that the term occurs in a large number of documents. Use TermGetAll to find the most common terms in IDOL Server.
If your term is high in the list of document occurrences, or total occurrences, consider how useful the term is for queries. You can add terms that are not useful for queries to your stop list.
The default stop list contains language terms that occur so frequently that they do not convey much meaning. For example, in English the words "the" and "and" can be removed without losing meaning.
Your data might include other terms that occur so commonly that they are not very useful for queries. For example, if you have indexed a database of local business telephone numbers, you might add the area code to the stop list, because it occurs in all the documents and is not useful for querying.
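As a rough illustration of how you might shortlist such terms, the following sketch flags terms that occur in most of your documents. It assumes you have already extracted per-term document counts (for example, from TermGetAll results) into a dictionary. The function name, the sample counts, and the 90% threshold are hypothetical, not part of IDOL Server.

```python
def stop_list_candidates(doc_occs, total_docs, min_fraction=0.9):
    """Return terms that occur in at least min_fraction of all documents.

    doc_occs maps each term to the number of documents that contain it.
    The 0.9 default is an arbitrary illustration, not an IDOL setting.
    """
    threshold = min_fraction * total_docs
    hits = [term for term, count in doc_occs.items() if count >= threshold]
    # Most widespread terms first; ties are broken alphabetically.
    return sorted(hits, key=lambda term: (-doc_occs[term], term))


# Hypothetical counts for a telephone directory index of 1000 documents.
doc_occs = {"0161": 1000, "plumber": 42, "the": 990, "bakery": 17}
candidates = stop_list_candidates(doc_occs, total_docs=1000)
print(candidates)
```

Review the candidates manually before you add them to the stop list, because a frequent term can still be useful in some queries.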
Alternatively, a slow query for a single term might indicate that a field is configured as an Index type field when it does not need to be. If you query only the whole value of the field, you can configure the field as a MatchType field instead.
In a wildcard query, IDOL Server first finds all the terms that match the wildcard value that you specify. It then finds the documents that contain these terms.
Wildcard queries can be slow because the wildcard term expands to a large number of values. Along with useful expansions, it might include:
spelling errors or other terms that occur only once or twice in your data.
numeric or mixed alphanumeric terms.
terms from fields that are inappropriate for this type of searching.
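The two-step process described above can be sketched with a toy inverted index. This is only an illustration of why expansion size matters, not how IDOL Server is implemented. The index contents and function names are hypothetical, and Python's fnmatch module stands in for the wildcard matching.

```python
from fnmatch import fnmatchcase

# Toy inverted index: term -> set of document IDs that contain it.
index = {
    "report": {1, 2, 3},
    "reporting": {2},
    "reprot": {4},    # spelling error that occurs in one document
    "rep1234": {5},   # alphanumeric identifier
    "summary": {1, 6},
}


def expand_wildcard(index, pattern):
    """Step 1: find every indexed term that matches the wildcard."""
    return sorted(term for term in index if fnmatchcase(term, pattern))


def wildcard_query(index, pattern):
    """Step 2: collect the documents that contain any matching term."""
    docs = set()
    for term in expand_wildcard(index, pattern):
        docs |= index[term]
    return docs


terms = expand_wildcard(index, "rep*")
docs = wildcard_query(index, "rep*")
print(terms)
print(sorted(docs))
```

In this example, two of the four expanded terms (the misspelling and the identifier) add lookup work without being useful matches.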
To find the terms that a particular wildcard query returns, you can use the TermExpand action, with the Text parameter set to your wildcard value, and Expansion set to Wild.
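A minimal sketch of building such a request follows. The action and parameter names come from the text above. The host, the port 9000, and the helper function name are assumptions for illustration, so substitute the values for your own server.

```python
from urllib.parse import urlencode


def term_expand_url(host, port, wildcard):
    """Build a TermExpand request URL for a wildcard value.

    host and port are placeholders for your own IDOL Server ACI port.
    """
    # urlencode percent-encodes special characters such as "*".
    params = urlencode({"Text": wildcard, "Expansion": "Wild"})
    return f"http://{host}:{port}/action=TermExpand&{params}"


url = term_expand_url("localhost", 9000, "rep*")
print(url)
```

You can paste the resulting URL into a browser, or send it with any HTTP client, and then inspect how many terms the response lists.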
You can use TermAnalysis in the TermGetAll action to find out where you have a large number of terms.
If <autn:dococcs logn="0"> is large, there are a large number of terms that occur only in one document. This number might include rare terms, spelling mistakes, or identification text strings that you do not want to use for querying.
If <autn:numeric> or <autn:alphanumeric> is large, there are a large number of numeric or alphanumeric terms.
If you never want to query numeric terms, consider changing your IndexNumbers configuration parameter to exclude numeric terms from your index.
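The following sketch shows one way to pull these three values out of a response for comparison. The element names come from the text above, but the surrounding response layout, the namespace URI, the sample numbers, and the assumption that each element's text holds a count are all hypothetical. Check them against a real response from your server.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the autn prefix; verify it in a real response.
NS = {"autn": "http://schemas.autonomy.com/aci/"}

# Hypothetical response fragment with made-up counts.
SAMPLE = """<autnresponse xmlns:autn="http://schemas.autonomy.com/aci/">
  <responsedata>
    <autn:termanalysis>
      <autn:dococcs logn="0">120000</autn:dococcs>
      <autn:dococcs logn="1">30000</autn:dococcs>
      <autn:numeric>45000</autn:numeric>
      <autn:alphanumeric>80000</autn:alphanumeric>
    </autn:termanalysis>
  </responsedata>
</autnresponse>"""


def read_count(root, path):
    """Return the integer text of the first matching element, or None."""
    element = root.find(path, NS)
    return int(element.text) if element is not None else None


def summarize(xml_text):
    root = ET.fromstring(xml_text)
    return {
        "single_document_terms": read_count(root, ".//autn:dococcs[@logn='0']"),
        "numeric_terms": read_count(root, ".//autn:numeric"),
        "alphanumeric_terms": read_count(root, ".//autn:alphanumeric"),
    }


summary = summarize(SAMPLE)
print(summary)
```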
To improve the performance of wildcard queries, you can use several configuration parameters to tune how IDOL Server treats terms of different types.
If you have a large number of terms that occur in only one or two documents, you can use the following configuration parameter to improve query performance:
UnstemmedMinDocOccs
If you have a large number of purely numeric or alphanumeric terms, you can use the following configuration parameters to improve performance:
IndexNumbersMaxValue
IndexNumbersNMaxLength
IndexNumbersNTruncateLength
SplitNumbers
During the index process, IDOL Server forms a representation of each document, and the fields and terms it contains. If the indexing speed is too slow, it might be because you have too many terms in your documents.
To improve your indexing performance in these cases, you can reduce the number of terms that IDOL Server indexes.
In addition to the configuration parameters discussed in Wildcard Queries Are Slow, you can also use the following configuration parameter to reduce the number of indexed terms:
IndexNumbers
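To get a feel for how much of your term count comes from numbers, you could run a quick estimate over a sample of your source text before you change any configuration. The sketch below uses a deliberately simple tokenizer that does not reproduce IDOL Server's own term processing. The function name and the sample documents are hypothetical.

```python
import re


def count_terms(documents):
    """Count distinct terms in the sample, grouped by term type."""
    terms = set()
    for text in documents:
        # Simplified tokenizer: lowercase runs of ASCII letters and digits.
        terms.update(re.findall(r"[a-z0-9]+", text.lower()))

    counts = {"alphabetic": 0, "numeric": 0, "alphanumeric": 0}
    for term in terms:
        if term.isdigit():
            counts["numeric"] += 1
        elif term.isalpha():
            counts["alphabetic"] += 1
        else:
            counts["alphanumeric"] += 1
    return counts


sample = [
    "Invoice 10432 for part AB12",
    "Invoice 10433 for part AB13",
    "Call 5550100",
]
counts = count_terms(sample)
print(counts)
```

In this sample, more than half of the distinct terms are numeric or alphanumeric, which is the kind of data where reducing indexed numbers is likely to help.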