When you configure your IDOL Server index, you must consider the terms that you want to search for, and the terms that you want to index. Numeric and alphanumeric terms can take up a large proportion of an index, so it is important to consider what you need to search for, and configure your index appropriately.
You can view information on the number of numeric and alphanumeric terms in your data index on the Terms tab on the Performance page in the Monitor section of IDOL Admin.
The following table outlines some of the configuration parameters that are useful to consider. For more details, refer to the IDOL Server Reference.
||Split numeric and alphanumeric terms and store the chunks as separate terms.|
||The maximum length of purely numeric terms. Longer numeric terms are truncated before indexing.|
||The maximum length of alphanumeric terms. Longer alphanumeric terms are truncated before indexing.|
||The maximum number of characters per chunk when splitting a purely numeric term.|
||The maximum number of characters per chunk when splitting an alphanumeric term.|
||The maximum value of a numeric term that you want to index.|
||Whether to add numeric terms to the unstemmed index.|
||The maximum length of non-numeric values to add to the unstemmed index.|
||The maximum length of purely numeric values to add to the unstemmed index.|
||The maximum length of alphanumeric values to add to the unstemmed index.|
||The method to use to handle numbers during indexing. See IndexNumbers.|
||Restricts the per-language
||A field property that specifies that certain fields must use the per-field
||The maximum length of purely numeric terms to index for fields with this property.|
||The maximum length of alphanumeric terms to index for fields with this property.|
Truncation applies globally across all fields to ensure consistency at query time.
At index time, IDOL Server runs the following process to determine how to index numeric and alphanumeric terms:
IDOL determines whether the term is non-numeric, numeric, or alphanumeric. According to your per-language
IndexNumbers configuration setting, it discards any terms that you do not want to index.
IDOL discards pure numeric terms that are greater than
IndexNumbersMaxValue, and uses the field property settings to discard terms longer than the relevant
IndexNumbersNMaxLength, and terms that do not match the
IndexNumbers setting for the field.
IDOL truncates numeric and alphanumeric terms according to the values of the
SplitNumbers is set to
True, IDOL splits numeric terms into chunks of
NumericTermChunkSize, and alphanumeric terms into chunks of
You can set
-1, which means that it does not split terms for that type. For example, you might want to split numbers only for purely numeric terms, in which case you can set
IDOL adds normal terms to the unstemmed index, and indexes them into the dynterm index. In the dynterm index, it stems the terms and then truncates them to
TermSize. Numeric and alphanumeric terms are never stemmed.
Terms that have been chunked by
SplitNumbers do not get indexed into the unstemmed index (even when they are short enough to be a single chunk).
If you have set
AlphaNumericTermChunkSize to -1, the associated terms are indexed into the unstemmed index.
Terms longer than
UnstemmedIndexNumbersNMaxLength are not added to the unstemmed index.
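The classification, truncation, and unstemmed-index rules in this process can be sketched in Python. This is only an illustrative model, not IDOL Server code: the function names are hypothetical, the type codes 0, 1, and 2 (non-numeric, purely numeric, alphanumeric) are inferred from the numbered parameter names, and treating a truncate length of 0 as "do not truncate" is an assumption. The sketch also ignores IndexNumbersMaxValue, UnstemmedIndexNumbers, and the maximum length settings.

```python
def classify(term: str) -> int:
    """Return 0 for non-numeric, 1 for purely numeric, 2 for alphanumeric."""
    has_digit = any(ch.isdigit() for ch in term)
    has_other = any(not ch.isdigit() for ch in term)
    if has_digit and has_other:
        return 2
    return 1 if has_digit else 0


def truncate(term: str, truncate_length: dict) -> str:
    """Cut a term to the limit configured for its type.

    truncate_length maps a type code to a maximum length.
    Assumption: a limit of 0 (or a missing entry) means no truncation.
    """
    limit = truncate_length.get(classify(term), 0)
    return term[:limit] if limit > 0 else term


def in_unstemmed_index(term: str, split_numbers: bool, chunk_size: dict) -> bool:
    """Model whether a term reaches the unstemmed index.

    chunk_size maps a type code to its chunk size; -1 disables
    splitting for that type, so those terms stay unstemmed-indexed.
    """
    kind = classify(term)
    if kind == 0 or not split_numbers:
        return True
    return chunk_size.get(kind, -1) == -1
```

For example, with truncate lengths of 12 for numeric terms and 0 for alphanumeric terms, truncate("123456789012000000", {1: 12, 2: 0}) returns "123456789012". With SplitNumbers enabled, a numeric chunk size of 7, and an alphanumeric chunk size of -1, in_unstemmed_index reports False for "1234" and True for "a123456789".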
When you run queries for numeric and alphanumeric terms, the results depend on your configuration. In general:
you cannot search for any term that is discarded at index time (similar to stop words). For example, if you set
0, you cannot search for any numeric or alphanumeric terms.
you cannot use wildcards to search for any term that is not added to the unstemmed index. For example, if you set
True, numeric terms are split into chunks, and are not added to the unstemmed index, so you cannot use wildcards to search for them. You can search for the exact term as normal.
The TermGetInfo action returns information about the term chunks that a numeric or alphanumeric term is split into. However, when you send a query, you can search only for the term in its entirety, and not for the individual term chunks.
When you decide how you want to configure your IDOL Server to handle numeric and alphanumeric data, you must consider whether you need to search for these values at all, and whether you want to use wildcards to search for them.
In many cases, you do not need to use wildcards to search for numbers. For example, you might have invoice numbers that do not have any special significance except for the order. You might want to be able to search for a specific invoice, but you usually know the exact number that you want to find. If you never want to use wildcards to search for values, you can use
UnstemmedIndexNumbers to prevent IDOL Server from storing the unstemmed terms.
If you index spreadsheets, IDOL might add many terms to the unstemmed index that you never need to search for with wildcards. In this case, you might want to disable unstemmed indexing of numbers, and you might also want to use
SplitNumbers to reduce the total number of numeric terms that IDOL indexes.
For numbers that you search for regularly, you might want to use Eduction to extract the number to a field. You can then use FieldText operators to search for the numbers. For example, if you want to search for ranges of invoice numbers, you can use the
RANGE operator. You can use the NumericType field property for the field to optimize these operations.
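As a rough illustration of the range search described above, the following Python sketch builds a Query action that uses a RANGE FieldText restriction. The field name INVOICE_NUMBER and the helper function names are hypothetical, and the sketch assumes that Eduction has already written the invoice number to that field. Check the exact FieldText syntax against the IDOL Server Reference for your version.

```python
from urllib.parse import urlencode


def range_fieldtext(field: str, low: int, high: int) -> str:
    """Build a FieldText expression of the form RANGE{low,high}:field."""
    return f"RANGE{{{low},{high}}}:{field}"


def range_query(field: str, low: int, high: int, text: str = "*") -> str:
    """Return a URL-encoded parameter string for a Query action."""
    params = {
        "action": "Query",
        "Text": text,
        "FieldText": range_fieldtext(field, low, high),
    }
    return urlencode(params)


# Hypothetical field populated by Eduction at index time.
print(range_query("INVOICE_NUMBER", 1000, 2000))
```

Keeping the FieldText construction in its own function makes it easy to combine with other restrictions before the string is URL-encoded.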
The following example shows the configuration options for a particular scenario, and the results of indexing various numeric terms into an IDOL Server with this configuration.
[Server]
SplitNumbers=True
IndexNumbers1TruncateLength=12
IndexNumbers2TruncateLength=0
NumericTermChunkSize=7
AlphaNumericTermChunkSize=-1
IndexNumbersMaxValue=0 // (explicit default)
UnstemmedIndexNumbers0MaxLength=-1 // (explicit default)
UnstemmedIndexNumbers1MaxLength=-1 // (explicit default)
UnstemmedIndexNumbers2MaxLength=-1 // (explicit default)

[MyLanguage]
IndexNumbers=1

[ No field-specific overrides ]
a123456789. Indexed without modification, returnable in a wildcard search.
1234. Indexed as a single SplitNumber chunk
12341Z, not returnable in a wildcard search.
12345678. Indexed as two SplitNumber chunks:
23456781, not returnable in a wildcard search.
123456789012. Indexed as two SplitNumber chunks:
67890121, not returnable in a wildcard search.
123456789012000000. Truncated and indexed as two SplitNumber chunks:
67890121, not returnable in a wildcard search.
A123456789B123456789C123456789D12345678. Indexed without modification, returnable in a wildcard search.
A123456789B123456789C123456789D123456789E123456789. Truncated and indexed as
A123456789B123456789C123456789D123456789 (that is, based on
TermSize=40), returnable in some wildcard searches (for example,
A* but not