Numeric and Alphanumeric Terms

When you configure your IDOL Server index, you must consider the terms that you want to search for, and the terms that you want to index. Numeric and alphanumeric terms can take up a large proportion of an index, so it is important to consider what you need to search for, and configure your index appropriately.

TIP:

You can view information on the number of numeric and alphanumeric terms in your data index on the Terms tab on the Performance page in the Monitor section of IDOL Admin.

Configuration Parameters

The following table outlines some of the configuration parameters that are useful to consider. For more details, refer to the IDOL Server Reference.

Parameter                        Section            Description
SplitNumbers                     [Server]           Split numeric and alphanumeric terms and store the chunks as separate terms.
IndexNumbers1TruncateLength      [Server]           The maximum length of purely numeric terms. Longer numeric terms are truncated before indexing.
IndexNumbers2TruncateLength      [Server]           The maximum length of alphanumeric terms. Longer alphanumeric terms are truncated before indexing.
NumericTermChunkSize             [Server]           The maximum number of characters per chunk when splitting a purely numeric term.
AlphaNumericTermChunkSize        [Server]           The maximum number of characters per chunk when splitting an alphanumeric term.
IndexNumbersMaxValue             [Server]           The maximum value of a numeric term that you want to index.
UnstemmedIndexNumbers            [Server]           Whether to add numeric terms to the unstemmed index.
UnstemmedIndexNumbers0MaxLength  [Server]           The maximum length of non-numeric values to add to the unstemmed index.
UnstemmedIndexNumbers1MaxLength  [Server]           The maximum length of purely numeric values to add to the unstemmed index.
UnstemmedIndexNumbers2MaxLength  [Server]           The maximum length of alphanumeric values to add to the unstemmed index.
IndexNumbers                     [MyLanguage]       The method to use to handle numbers during indexing. See IndexNumbers.
IndexNumbers                     [MyFieldProperty]  Restricts the per-language IndexNumbers setting for a specific field or fields with the IndexNumbersType property.
IndexNumbersType                 [MyFieldProperty]  A field property that specifies that certain fields must use the per-field IndexNumbers setting.
IndexNumbers1MaxLength           [MyFieldProperty]  The maximum length of purely numeric terms to index for fields with this property.
IndexNumbers2MaxLength           [MyFieldProperty]  The maximum length of alphanumeric terms to index for fields with this property.
NOTE:

Truncation applies globally across all fields to ensure consistency at query time.

Index Process

At index time, IDOL Server runs the following process to determine how to index numeric and alphanumeric terms:

  1. IDOL determines whether the term is non-numeric, numeric, or alphanumeric. According to your per-language IndexNumbers configuration setting, it discards any terms that you do not want to index.

  2. IDOL discards purely numeric terms whose value is greater than IndexNumbersMaxValue. It then uses the field property settings to discard terms that are longer than the relevant IndexNumbersNMaxLength (where N is 1 for purely numeric terms and 2 for alphanumeric terms), and terms that do not match the IndexNumbers setting for the field.

  3. IDOL truncates numeric and alphanumeric terms according to the values of the IndexNumbersNTruncateLength parameters.

  4. If SplitNumbers is set to True, IDOL splits numeric terms into chunks of NumericTermChunkSize, and alphanumeric terms into chunks of AlphaNumericTermChunkSize.

    TIP:

    You can set NumericTermChunkSize or AlphaNumericTermChunkSize to -1, which means that it does not split terms for that type. For example, you might want to split numbers only for purely numeric terms, in which case you can set AlphaNumericTermChunkSize to -1.

  5. IDOL adds normal terms to the unstemmed index, and indexes them into the dynterm index. In the dynterm index, it stems the terms and then truncates them to TermSize. Numeric and alphanumeric terms are never stemmed.
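
The classification, truncation, and splitting steps above can be illustrated with a short Python sketch. This is not IDOL Server code. The NumberSettings class and the function names are hypothetical, and the sketch assumes that truncation keeps the leading characters, that chunks are cut from left to right, and that a truncate length of 0 means no truncation. It omits the discard rules (IndexNumbers, IndexNumbersMaxValue, and the per-field maximum lengths). Check the IDOL Server Reference for the exact behavior.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NumberSettings:
    """Hypothetical container that mirrors some [Server] parameters."""
    split_numbers: bool = False            # SplitNumbers
    numeric_truncate_length: int = 0       # IndexNumbers1TruncateLength
    alphanumeric_truncate_length: int = 0  # IndexNumbers2TruncateLength
    numeric_chunk_size: int = -1           # NumericTermChunkSize
    alphanumeric_chunk_size: int = -1      # AlphaNumericTermChunkSize


def classify(term: str) -> str:
    """Label a term as numeric, alphanumeric, or non-numeric (ASCII digits only)."""
    if term.isascii() and term.isdigit():
        return "numeric"
    if any(ch in "0123456789" for ch in term):
        return "alphanumeric"
    return "non-numeric"


def truncate(term: str, limit: int) -> str:
    """Keep the first `limit` characters. ASSUMPTION: limit <= 0 disables truncation."""
    return term[:limit] if limit > 0 else term


def split_chunks(term: str, size: int) -> List[str]:
    """Cut a term left to right into chunks of at most `size` characters.

    A size of -1 leaves the term whole, as described in the tip above.
    """
    if size <= 0:
        return [term]
    return [term[i:i + size] for i in range(0, len(term), size)]


def index_terms(term: str, cfg: NumberSettings) -> List[str]:
    """Return the terms that this sketch would store for one input term."""
    kind = classify(term)
    if kind == "non-numeric":
        return [term]
    if kind == "numeric":
        limit, size = cfg.numeric_truncate_length, cfg.numeric_chunk_size
    else:
        limit, size = cfg.alphanumeric_truncate_length, cfg.alphanumeric_chunk_size
    term = truncate(term, limit)
    return split_chunks(term, size) if cfg.split_numbers else [term]


cfg = NumberSettings(split_numbers=True, numeric_truncate_length=12,
                     numeric_chunk_size=7, alphanumeric_chunk_size=-1)
print(index_terms("12345678901234567890", cfg))  # ['1234567', '89012']
print(index_terms("AB123456789", cfg))           # ['AB123456789']
```

In this sketch, the 20-digit number is first truncated to 12 characters and then split into chunks of up to 7 characters, while the alphanumeric term is left whole because its chunk size is -1. These outputs describe the sketch only, not verified IDOL Server results.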

Queries

When you run queries for numeric and alphanumeric terms, the results depend on your configuration.

When you decide how you want to configure your IDOL Server to handle numeric and alphanumeric data, you must consider whether you need to search for these values at all, and whether you want to use wildcards to search for them.

In many cases, you do not need to use wildcards to search for numbers. For example, you might have invoice numbers that do not have any special significance except for the order. You might want to be able to search for a specific invoice, but you usually know the exact number that you want to find. If you never want to use wildcards to search for values, you can use UnstemmedIndexNumbers to prevent IDOL Server from storing the unstemmed terms.

If you index spreadsheets, the indexing process might add many numeric terms to the unstemmed index that you never need to search for with wildcards. In this case, you might want to disable unstemmed indexing of numbers, and you might also want to use SplitNumbers to reduce the total number of numeric terms that IDOL indexes.

For numbers that you search for regularly, you might want to use Eduction to extract the number to a field. You can then use FieldText operators to search for the numbers. For example, if you want to search for ranges of invoice numbers, you can use the RANGE operator. You can use the NumericType field property for the field to optimize these operations.
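
As a rough illustration of the range search described above, the following Python sketch builds the parameters for a query that restricts results by a FieldText expression. The field name INVOICE_NUMBER and the function names are hypothetical. The sketch assumes that the field has already been populated (for example by Eduction) and that the expression follows the OPERATOR{values}:FIELD layout. Confirm the exact argument format of the RANGE operator in the IDOL Server Reference before you rely on it.

```python
from urllib.parse import urlencode


def range_fieldtext(field: str, low: int, high: int) -> str:
    """Build a FieldText expression (ASSUMED layout: RANGE{low,high}:FIELD)."""
    return "RANGE{%d,%d}:%s" % (low, high, field)


def query_parameters(field: str, low: int, high: int) -> str:
    """URL-encode the parameters for a Query action that uses the expression."""
    params = {
        "action": "Query",
        "Text": "*",  # match any document, so only FieldText restricts results
        "FieldText": range_fieldtext(field, low, high),
    }
    return urlencode(params)


# INVOICE_NUMBER is a hypothetical field name used only for illustration.
print(range_fieldtext("INVOICE_NUMBER", 1000, 1999))
print(query_parameters("INVOICE_NUMBER", 1000, 1999))
```

Because the search uses the extracted field value rather than the indexed terms, this approach does not depend on how the numeric terms in the document text are truncated or split.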

Example

The following example shows the configuration options for a particular scenario, and the results of indexing various numeric terms into an IDOL Server with this configuration.

[Server]
SplitNumbers=True
IndexNumbers1TruncateLength=12
IndexNumbers2TruncateLength=0
NumericTermChunkSize=7
AlphaNumericTermChunkSize=-1
IndexNumbersMaxValue=0              // (explicit default)
UnstemmedIndexNumbers0MaxLength=-1  // (explicit default)
UnstemmedIndexNumbers1MaxLength=-1  // (explicit default)
UnstemmedIndexNumbers2MaxLength=-1  // (explicit default)

[MyLanguage]
IndexNumbers=1

[ No field-specific overrides ]
