Determine the Tokens to Index

After the text is converted into a series of tokens, another set of rules determines the tokens that are added to the index (that is, the tokens that you can search for).
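The idea of applying a set of rules to the token stream can be sketched as a list of predicates: a token is indexed only if every rule accepts it. This is a minimal illustration, assuming a simple rule-list model. The functions tokens_to_index, min_length, and not_stop_word are hypothetical and do not describe IDOL Server's internal implementation.

```python
from typing import Callable, Iterable, List

# A rule takes a token and returns True if the token should be indexed.
# This rule-list model is an illustrative assumption, not IDOL Server's internals.
Rule = Callable[[str], bool]


def tokens_to_index(tokens: Iterable[str], rules: List[Rule]) -> List[str]:
    """Return the tokens that pass every rule, preserving their order."""
    return [tok for tok in tokens if all(rule(tok) for rule in rules)]


# Hypothetical example rules: drop one-character tokens and a tiny stop list.
def min_length(tok: str) -> bool:
    return len(tok) >= 2


def not_stop_word(tok: str) -> bool:
    return tok.lower() not in {"the", "and"}


tokens = ["The", "cat", "and", "a", "dog"]
print(tokens_to_index(tokens, [min_length, not_stop_word]))  # ['cat', 'dog']
```

Only the tokens that survive every rule become searchable terms. The sections that follow describe the kinds of token that such rules typically act on.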

At this point, you consider the following types of token:

Numeric and Alphanumeric Terms


You can configure IDOL Server to index or discard numeric and alphanumeric terms. For example, you might want to be able to search for phone numbers in your documents. Alternatively, the documents might contain many numeric references that are also stored in other fields, so you do not need to search for them in the body of the document.

The IndexNumbers configuration parameters allow you to configure indexing for numeric and alphanumeric terms.
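The effect of such a setting can be sketched as follows. This is a minimal illustration only. The NumberMode values (NONE, ALPHANUMERIC, ALL) are hypothetical and are not the documented values of the IndexNumbers parameters. The sketch assumes that a token containing both letters and digits counts as alphanumeric and that a token containing digits but no letters counts as numeric.

```python
import enum


class NumberMode(enum.Enum):
    # Hypothetical modes for illustration; NOT the documented IndexNumbers values.
    NONE = "none"                  # discard numeric and alphanumeric terms
    ALPHANUMERIC = "alphanumeric"  # keep mixed terms such as "A320", drop pure numbers
    ALL = "all"                    # keep every numeric and alphanumeric term


def classify(token: str) -> str:
    """Classify a token as 'numeric', 'alphanumeric', or 'text' (no digits)."""
    has_digit = any(ch.isdigit() for ch in token)
    has_alpha = any(ch.isalpha() for ch in token)
    if has_digit and has_alpha:
        return "alphanumeric"
    if has_digit:
        return "numeric"
    return "text"


def keep_token(token: str, mode: NumberMode) -> bool:
    """Decide whether a token is indexed under the given (hypothetical) mode."""
    kind = classify(token)
    if kind == "text":
        return True  # ordinary words are unaffected by this setting
    if mode is NumberMode.ALL:
        return True
    if mode is NumberMode.ALPHANUMERIC:
        return kind == "alphanumeric"
    return False  # NumberMode.NONE discards both numeric and alphanumeric terms


tokens = ["call", "5550123", "about", "A320", "parts"]
print([t for t in tokens if keep_token(t, NumberMode.ALPHANUMERIC)])
# ['call', 'about', 'A320', 'parts']
```

Under NumberMode.ALL, a phone number such as 5550123 remains searchable. Under NumberMode.NONE, it and the part number A320 are both left out of the index.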

Stop Words

There are many extremely common words that add little or no meaning to a sentence. For example, in English the words "the" and "and" often occur several times in a single paragraph.

In IDOL Server, these words are known as stop words. You can define a stop word list, which contains a list of these common words. IDOL Server uses this list, and does not index any of the stop words.
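Stop word filtering amounts to a set-membership test on each token. The following minimal sketch assumes a tiny, hypothetical stop list and case-insensitive matching. The stop lists that ship with IDOL Server are much larger, and this is not IDOL Server's own code.

```python
from typing import FrozenSet, Iterable, List

# A tiny illustrative stop list (hypothetical); real stop lists are much larger.
STOP_WORDS: FrozenSet[str] = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def remove_stop_words(tokens: Iterable[str],
                      stop_words: FrozenSet[str] = STOP_WORDS) -> List[str]:
    """Drop tokens that appear in the stop list (case-insensitive match assumed)."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


print(remove_stop_words(["The", "history", "of", "the", "company"]))
# ['history', 'company']
```

A set (here a frozenset) gives constant-time lookups on average, so checking every token against even a large stop list stays cheap.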

The IDOL Server installation contains stop lists for the most commonly used languages. You can modify these stop lists for your organization. For example, you might have company-specific words that occur in all or most of your documents. If you add these words to the stop list, IDOL Server does not index them.

If you are using a language that does not have a default stop list, you can also create one.
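Extending an existing stop list with company-specific words, or building one for a new language, can be sketched as reading a word list and taking a set union. This sketch makes assumptions about the file format: one word per line, blank lines ignored, and lines starting with "#" treated as comments. These rules are illustrative and are not IDOL Server's documented stop list format. The function load_stop_list and the words "acme" and "confidential" are hypothetical.

```python
from typing import Iterable, Set


def load_stop_list(lines: Iterable[str]) -> Set[str]:
    """Build a stop list from lines of text, one word per line.

    The one-word-per-line format and the '#' comment syntax are assumptions
    for this sketch, not IDOL Server's documented stop list file format.
    """
    words: Set[str] = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


# Start from a (hypothetical) base list and add company-specific terms that
# appear in almost every document, such as the company name.
base = load_stop_list(["the", "and", "# articles and conjunctions", "of", ""])
custom = base | load_stop_list(["acme", "confidential"])

tokens = ["Acme", "quarterly", "report", "CONFIDENTIAL", "and", "forecast"]
print([t for t in tokens if t.lower() not in custom])
# ['quarterly', 'report', 'forecast']
```

Because load_stop_list accepts any iterable of lines, it can also read an open file, for example with open("stoplist.txt", encoding="utf-8") as f: load_stop_list(f), where the file name is hypothetical. The same function would build a stop list for a language that has no default list.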