When you configure your IDOL Server index, you must consider the terms that you want to search for, and the terms that you want to index. Numeric and alphanumeric terms can take up a large proportion of an index, so it is important to consider what you need to search for, and configure your index appropriately.
You can view information on the number of numeric and alphanumeric terms in your data index on the Terms tab on the Performance page in the Monitor section of IDOL Admin.
The following table outlines some of the configuration parameters that are useful to consider. For more details, refer to the IDOL Server Reference.
Parameter | Section | Description |
---|---|---|
SplitNumbers | [Server] | Split numeric and alphanumeric terms and store the chunks as separate terms. |
IndexNumbers1TruncateLength | [Server] | The maximum length of purely numeric terms. Longer numeric terms are truncated before indexing. |
IndexNumbers2TruncateLength | [Server] | The maximum length of alphanumeric terms. Longer alphanumeric terms are truncated before indexing. |
NumericTermChunkSize | [Server] | The maximum number of characters per chunk when splitting a purely numeric term. |
AlphaNumericTermChunkSize | [Server] | The maximum number of characters per chunk when splitting an alphanumeric term. |
IndexNumbersMaxValue | [Server] | The maximum value of a numeric term that you want to index. |
UnstemmedIndexNumbers | [Server] | Whether to add numeric terms to the unstemmed index. |
UnstemmedIndexNumbers0MaxLength | [Server] | The maximum length of non-numeric values to add to the unstemmed index. |
UnstemmedIndexNumbers1MaxLength | [Server] | The maximum length of purely numeric values to add to the unstemmed index. |
UnstemmedIndexNumbers2MaxLength | [Server] | The maximum length of alphanumeric values to add to the unstemmed index. |
IndexNumbers | [MyLanguage] | The method to use to handle numbers during indexing. See IndexNumbers in the IDOL Server Reference. |
IndexNumbers | [MyFieldProperty] | Restricts the per-language IndexNumbers setting for a specific field or fields with the IndexNumbersType property. |
IndexNumbersType | [MyFieldProperty] | A field property that specifies that certain fields must use the per-field IndexNumbers setting. |
IndexNumbers1MaxLength | [MyFieldProperty] | The maximum length of purely numeric terms to index for fields with this property. |
IndexNumbers2MaxLength | [MyFieldProperty] | The maximum length of alphanumeric terms to index for fields with this property. |

Truncation applies globally across all fields to ensure consistency at query time.
At index time, IDOL Server runs the following process to determine how to index numeric and alphanumeric terms:
1. IDOL determines whether the term is non-numeric, numeric, or alphanumeric. According to your per-language IndexNumbers configuration setting, it discards any terms that you do not want to index.
2. IDOL discards purely numeric terms that are greater than IndexNumbersMaxValue. It also uses the field property settings to discard terms that are longer than the relevant IndexNumbersNMaxLength, and terms that do not match the IndexNumbers setting for the field.
3. IDOL truncates numeric and alphanumeric terms according to the values of the IndexNumbersNTruncateLength parameters.
4. If SplitNumbers is set to True, IDOL splits numeric terms into chunks of NumericTermChunkSize, and alphanumeric terms into chunks of AlphaNumericTermChunkSize.

   You can set NumericTermChunkSize or AlphaNumericTermChunkSize to -1, which means that IDOL does not split terms of that type. For example, to split only purely numeric terms, set AlphaNumericTermChunkSize to -1.
5. IDOL adds normal terms to the unstemmed index and indexes them into the dynterm index. In the dynterm index, it stems the terms and then truncates them to TermSize. Numeric and alphanumeric terms are never stemmed.
   - Terms that have been chunked by SplitNumbers are not indexed into the unstemmed index (even when they are short enough to be a single chunk).
   - If you have set NumericTermChunkSize or AlphaNumericTermChunkSize to -1, the associated terms are indexed into the unstemmed index.
   - Terms longer than UnstemmedIndexNumbersNMaxLength are not added to the unstemmed index.

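The classification, truncation, and splitting steps above can be approximated with a short Python sketch. This is an illustrative model only, not IDOL's implementation: the names `NumberConfig`, `classify`, `truncate`, `split_from_right`, and `numeric_chunks` are hypothetical, the right-to-left grouping is an assumption chosen to be consistent with the worked example later in this section, and the position markers that IDOL attaches to stored chunks are not modeled.

```python
from dataclasses import dataclass


@dataclass
class NumberConfig:
    """Hypothetical container for a few [Server] settings named in this section."""
    split_numbers: bool = True          # SplitNumbers
    numeric_truncate_length: int = 12   # IndexNumbers1TruncateLength
    numeric_chunk_size: int = 7         # NumericTermChunkSize


def classify(term: str) -> str:
    """Return 'numeric', 'alphanumeric', or 'non-numeric' for a term."""
    if term.isdecimal():
        return "numeric"
    if any(ch.isdecimal() for ch in term):
        return "alphanumeric"
    return "non-numeric"


def truncate(term: str, limit: int) -> str:
    """Keep the first `limit` characters.

    Assumption: a limit of 0 or less means 'do not truncate'.
    """
    return term[:limit] if limit > 0 else term


def split_from_right(term: str, chunk_size: int) -> list[str]:
    """Group characters into chunks of at most `chunk_size`, counting from the right.

    A chunk size of -1 disables splitting (per the text); other non-positive
    values are treated the same way here, which is an assumption.
    """
    if chunk_size < 1:
        return [term]
    chunks = []
    end = len(term)
    while end > 0:
        start = max(0, end - chunk_size)
        chunks.append(term[start:end])
        end = start
    return chunks[::-1]  # restore left-to-right order


def numeric_chunks(term: str, cfg: NumberConfig) -> list[str]:
    """Truncate a purely numeric term, then split it if SplitNumbers is enabled."""
    term = truncate(term, cfg.numeric_truncate_length)
    if not cfg.split_numbers:
        return [term]
    return split_from_right(term, cfg.numeric_chunk_size)


cfg = NumberConfig()
print(classify("a123456789"))                     # alphanumeric
print(numeric_chunks("12345678", cfg))            # ['1', '2345678']
print(numeric_chunks("123456789012000000", cfg))  # ['12345', '6789012']
```

Truncation runs before splitting, so the 18-digit term in the last call is first cut to 12 digits and then grouped in the same way as a 12-digit term.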
When you run queries for numeric and alphanumeric terms, the results depend on your configuration. In general:
- You cannot search for any term that is discarded at index time (similar to stop words). For example, if you set IndexNumbers to 0, you cannot search for any numeric or alphanumeric terms.
- You cannot use wildcards to search for any term that is not added to the unstemmed index. For example, if you set SplitNumbers to True, numeric terms are split into chunks, and are not added to the unstemmed index, so you cannot use wildcards to search for them. You can search for the exact term as normal.

The TermGetInfo action returns information about the term chunks that an alphanumeric term is split into. However, when you send a query, you can search only for the number in its entirety, and not for the individual term chunks.
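The wildcard rule above can be summarized as a small decision function. The following Python sketch is a simplified model under stated assumptions, not IDOL's actual logic: the function name `in_unstemmed_index` and its parameters are hypothetical, a maximum length of -1 is assumed to mean "no limit", and the UnstemmedIndexNumbers switch is deliberately left out.

```python
def in_unstemmed_index(
    kind: str,
    length: int,
    *,
    split_numbers: bool,
    chunk_size: int,
    unstemmed_max_length: int = -1,
) -> bool:
    """Estimate whether a term reaches the unstemmed index (and so supports wildcards).

    kind                 -- 'numeric', 'alphanumeric', or 'non-numeric'
    length               -- length of the term after truncation
    split_numbers        -- value of SplitNumbers
    chunk_size           -- NumericTermChunkSize or AlphaNumericTermChunkSize for this kind
    unstemmed_max_length -- relevant UnstemmedIndexNumbersNMaxLength
                            (assumption: -1 means no limit)
    """
    # Chunked terms never reach the unstemmed index, even single-chunk ones.
    if kind in ("numeric", "alphanumeric") and split_numbers and chunk_size != -1:
        return False
    # Terms longer than the configured maximum are not added.
    if unstemmed_max_length >= 0 and length > unstemmed_max_length:
        return False
    return True


# Alphanumeric term with splitting disabled for its type: wildcards can match it.
print(in_unstemmed_index("alphanumeric", 10, split_numbers=True, chunk_size=-1))  # True
# Numeric term that is chunked: exact search only.
print(in_unstemmed_index("numeric", 4, split_numbers=True, chunk_size=7))         # False
```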
When you decide how you want to configure your IDOL Server to handle numeric and alphanumeric data, you must consider whether you need to search for these values at all, and whether you want to use wildcards to search for them.
In many cases, you do not need to use wildcards to search for numbers. For example, you might have invoice numbers that do not have any special significance except for the order. You might want to be able to search for a specific invoice, but you usually know the exact number that you want to find. If you never want to use wildcards to search for values, you can use UnstemmedIndexNumbers to prevent IDOL Server from storing the unstemmed terms.
If you index spreadsheets, indexing might add many terms to the unstemmed index that you never need to search for with wildcards. In this case, you might want to disable unstemmed indexing of numbers, and you might also want to use SplitNumbers to reduce the total number of numeric terms that IDOL indexes.
For numbers that you search for regularly, you might want to use Eduction to extract the number to a field. You can then use FieldText operators to search for the numbers. For example, if you want to search for ranges of invoice numbers, you can use the RANGE operator. You can use the NumericType field property for the field to optimize these operations.
The following example shows the configuration options for a particular scenario, and the results of indexing various numeric terms into an IDOL Server with this configuration.
    [Server]
    SplitNumbers=True
    IndexNumbers1TruncateLength=12
    IndexNumbers2TruncateLength=0
    NumericTermChunkSize=7
    AlphaNumericTermChunkSize=-1
    IndexNumbersMaxValue=0 // (explicit default)
    UnstemmedIndexNumbers0MaxLength=-1 // (explicit default)
    UnstemmedIndexNumbers1MaxLength=-1 // (explicit default)
    UnstemmedIndexNumbers2MaxLength=-1 // (explicit default)

    [MyLanguage]
    IndexNumbers=1

    [ No field-specific overrides ]

- a123456789. Indexed without modification, returnable in a wildcard search.
- 1234. Indexed as a single SplitNumber chunk 1234₁Z, not returnable in a wildcard search.
- 12345678. Indexed as two SplitNumber chunks: 1₂Z, 2345678₁, not returnable in a wildcard search.
- 123456789012. Indexed as two SplitNumber chunks: 12345₂Z, 6789012₁, not returnable in a wildcard search.
- 123456789012000000. Truncated and indexed as two SplitNumber chunks: 12345₂Z, 6789012₁, not returnable in a wildcard search.
- A123456789B123456789C123456789D12345678. Indexed without modification, returnable in a wildcard search.
- A123456789B123456789C123456789D123456789E123456789. Truncated and indexed as A123456789B123456789C123456789D123456789 (that is, based on TermSize=40), returnable in some wildcard searches (for example, A* but not *789).